The speed of MC
Wil Dijkstra
w.dijkstra at scw.vu.nl
Mon Mar 31 11:21:01 EST 2003
How to speed up MC
All those guys who migrated from good old HyperCard to MC were
delighted by the speed enhancement. Nevertheless, in a number of cases,
MC is slow compared to languages like Pascal or C. Scott Raney gives
some good advice about how to speed up your scripts. However, there are
some lesser-known tricks that can speed up your scripts amazingly.
Suppose you want to create a list consisting of a number of lines with
something put into each line. For the sake of simplicity, let us create
a list of 10000 random numbers between 1 and 1000. If your script looks
like this:
script A
repeat with i = 1 to 10000
  put random (1000) into line i of randList
end repeat
and you are eager to know how to write a script that executes more than
ten times as fast, you should read on!
But first some questions. Do you think that the same script, but now
generating random numbers between 1 and 1000000 (one million), takes more
time? If your answer is yes, you're right. It will take about twice as
much time. Do you think that's because generating a random number
between 1 and 1000000 takes more time than generating a random number
between 1 and 1000? If your answer is yes, you're wrong. It takes
about the same amount of time.
To write scripts that execute as fast as possible, you should know
something about how a computer (or MetaCard) works, how data are stored,
etc. I'm going to tell you that in a very, very simplified way. But
first, the fast script:
script B
repeat with i = 1 to 10000
  put random (1000) & cr after randList
end repeat
delete last char of randList
Script B yields exactly the same result as script A, but is more than
ten times as fast. If we wanted to generate random numbers between 1 and
1000000, it is even more than 20 times as fast.
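If you want to check these figures on your own machine, a rough timing sketch along the following lines may help. The handler names timeScriptA and timeScriptB and their parameters are made up for this illustration, and the numbers you get will depend on your hardware and MC version.

```metatalk
-- Hypothetical helpers for illustration; not part of the original scripts.
-- Each function builds pCount random lines and returns the elapsed milliseconds.

function timeScriptA pCount, pMax
  put empty into randList
  put the milliseconds into startTime
  repeat with i = 1 to pCount
    -- script A: address the target by line number
    put random(pMax) into line i of randList
  end repeat
  return the milliseconds - startTime
end timeScriptA

function timeScriptB pCount, pMax
  put empty into randList
  put the milliseconds into startTime
  repeat with i = 1 to pCount
    -- script B: simply append to the end of the variable
    put random(pMax) & cr after randList
  end repeat
  delete last char of randList
  return the milliseconds - startTime
end timeScriptB

on mouseUp
  -- results go to the message box
  put "A:" && timeScriptA(10000, 1000) && "ms, B:" && timeScriptB(10000, 1000) && "ms"
end mouseUp
```

Calling the two functions with 1000000 instead of 1000 as the second parameter lets you compare the effect of longer lines as well.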
Here is the reason why.
Data are stored in memory. Each piece of data has a particular length.
For example, in script A, the 75th call to random (1000) may yield 234,
which is three characters long. This piece of data had to be put after
the CR (carriage return) at the end of line number 74. But where in
memory is that? The computer can only figure that out by counting the
number of CRs from the start of randList. And that takes time. And on
each call to random (1000) the computer (the poor thing) starts counting
the CRs again.
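To make the counting concrete, here is a small model of it written as a script. It is only an illustration of the idea: the engine does this work internally, not in script, and the function name lineStartOffset is invented for this sketch.

```metatalk
-- Illustrative model only (hypothetical function name).
-- Returns the character position at which line pLineNumber of pText starts,
-- found the slow way: by walking the characters and counting CRs.
-- If the text has fewer lines, the result is one past the last character.
function lineStartOffset pText, pLineNumber
  put 1 into currentLine
  put 0 into charsSeen
  repeat for each char thisChar in pText
    if currentLine = pLineNumber then exit repeat
    add 1 to charsSeen
    if thisChar = cr then add 1 to currentLine
  end repeat
  return charsSeen + 1
end lineStartOffset
```

In script A a walk like this starts from the first character again for every value of i, so the total work grows roughly with the square of the number of lines.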
Script B does not have this drawback. The computer does not have to
bother about counting CRs, but can just put the result of random (1000)
after the last char of randList. In script B a CR is placed after this
line, to warrant that each new result of random (1000) appears on a new
line. Finally, the last cr is deleted, to ensure that the results of
script A and B are exactly the same.
To count the number of CRs in script A, the computer has to walk
through all characters of randList and decide for each one whether or
not it is a CR. If we generate random numbers between 1 and 1000000,
the computer has to walk through about twice (actually, less than twice,
because the CRs are characters too) as many characters as when
generating random numbers between 1 and 1000. Hence generating random
numbers between 1 and 1000000 with script A takes about twice as much
time as generating random numbers between 1 and 1000 with script A. In
script B there is no difference, of course.
If you understand the principle, you can easily imagine that the speed
increase will become negligible if the number of characters on each line
is very small, but very large if the mean number of characters on each
line is large.
Moreover, you will also understand why using array variables is so fast.
Array variables are indexed. If the variable AR is an array variable,
AR[345] directly points to the correct position in memory. Hence, a
statement like "get AR[345]" is very fast. On the other hand, the
statement "get line 345 of randList" is much slower: again, the computer
has to count 344 CRs, walking through the data of randList.
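The difference can be measured with a sketch like the one below, which stores the same 10000 numbers both as a list and as an array and then reads every entry back each way. The variable names other than randList and AR are chosen for this example only.

```metatalk
on mouseUp
  -- build the same values as a CR-separated list and as an array
  put empty into randList
  repeat with i = 1 to 10000
    put random(1000) into thisNumber
    put thisNumber & cr after randList
    put thisNumber into AR[i]
  end repeat
  delete last char of randList

  -- read every entry by line number (CRs are counted on each access)
  put the milliseconds into startTime
  repeat with i = 1 to 10000
    get line i of randList
  end repeat
  put the milliseconds - startTime into listTime

  -- read every entry by array index
  put the milliseconds into startTime
  repeat with i = 1 to 10000
    get AR[i]
  end repeat
  put the milliseconds - startTime into arrayTime

  put "line i of randList:" && listTime && "ms; AR[i]:" && arrayTime && "ms"
end mouseUp
```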
Similarly, if you want to perform some action on each number in
randList, the script:
script C
repeat for each line randNumber in randList
  put randNumber into temp
  -- do something with temp
end repeat
is much faster than
script D
repeat with i = 1 to the number of lines of randList
  put line i of randList into temp
  -- do something with temp
end repeat
Why? In script D the computer has to count CRs to get the i-th line. In
script C, the computer remembers where it is while getting lines; it just
proceeds through the lines and doesn't count CRs. Moreover, of course,
the statement "put randNumber into temp" in script C is superfluous.
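As a small example of the faster loop form without the extra copy, the following sketch adds up all the numbers in a list. The function name sumOfList is invented for the illustration.

```metatalk
-- Hypothetical example: sum the numbers in a CR-separated list.
-- The loop variable randNumber is used directly; no copy into temp is needed.
function sumOfList pList
  put 0 into total
  repeat for each line randNumber in pList
    add randNumber to total
  end repeat
  return total
end sumOfList
```

You could call it as "put sumOfList(randList)" to see the result in the message box.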
A strategy that can give good results is:
(1) Put a list into an array, using the "repeat for each" approach. Note
that you can also use items, or words.
(2) Perform actions on the content of the array.
(3) Put it back into the list, e.g. to display it in a field.
This approach is especially useful if you want to change the contents of
the list, because you cannot use "repeat for each" and at the same time
change the contents (you should now understand why).
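The three steps might look like this in practice. This is a minimal sketch with made-up sample data and a trivial action (doubling each number); newList and n are names chosen for the example.

```metatalk
on mouseUp
  -- sample data; in practice the list might come from a field
  put "10" & cr & "20" & cr & "30" into randList

  -- (1) put the list into an array, using "repeat for each"
  put 0 into n
  repeat for each line randNumber in randList
    add 1 to n
    put randNumber into AR[n]
  end repeat

  -- (2) perform actions on the content of the array
  repeat with i = 1 to n
    multiply AR[i] by 2
  end repeat

  -- (3) put it back into a list, appending rather than writing to "line i of"
  put empty into newList
  repeat with i = 1 to n
    put AR[i] & cr after newList
  end repeat
  delete last char of newList

  -- show the result in the message box
  put newList
end mouseUp
```

Note that step (3) builds the new list the script B way, so none of the three steps has to count CRs.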
There are more tricks, but maybe all this stuff is quite familiar to you
and offers nothing new. Please let me know if you appreciate information
like this. If not, that's fine. It will save me time. If yes, I will
tell you next time how to speed up calculations. And I will tell you
about some limits (for example, the largest number MC can handle). And,
for the happy few who can write C or Pascal, how to use externals or how
self-written programs in C or Pascal can be approached using Apple
events. (But don't expect me to contribute daily!) My aim is not to give
you just tricks, but to give you some understanding of the mechanisms.
If you grasp the difference between the A and B scripts, and the C and D
scripts, and when it makes a difference and when not, you can yourself
carefully inspect your scripts and improve them.
Happy programming,
Wil Dijkstra