How to speed up MC, Part 3

Mon Apr 14 05:57:01 EDT 2003

How to speed up MC, Part 3

It is quite interesting to compare the speed of MetaCard with HyperCard
on the one hand and lower level languages like Pascal or C on the other
hand. Consider a simple statement like (assuming that x is appropriately initialized):
  put x + 1 into x
How do the different languages compare? Because you cannot use MetaBench
for this comparison, here are the (rough) results.

Pascal: x := x+1 --> 1
C++: x = x+1 --> 1
MetaCard: add 1 to x --> 80
MetaCard: put x + 1 into x --> 130
HyperCard: add 1 to x --> 2800
HyperCard: put x + 1 into x --> 3600
Empty extension: --> 3500

The relative speeds might be different for different computers or
operating systems of course. Nevertheless, it can easily be seen that
HyperCard is much slower than MetaCard. MetaCard however is much slower
than Pascal or C. This difference may suggest that writing a simple
extension that adds up numbers might considerably enhance the speed of
calculations. Unfortunately this is not true. The overhead of calling an
extension and sending the result back to MC outweighs the gain. In the
table above you can see that calling an "empty" extension (an extension
that just does nothing) would take 3500 time units. Extensions can
enhance the speed, but only if the total amount of calculations
performed by the extension outweighs calling the extension and sending
the result back. It is easy to see that if your extension performs an
adding operation 100 times, you have a significant speed gain (about
3600 versus 8000 units). More about extensions in a forthcoming part.
Hence we need different approaches to enhance the speed of calculations.
Don't expect to learn how to add up numbers faster. But you can expect
how to prevent that scripts manipulating numbers may run 10 or even 50
times (!) or so slower than necessary.
First some fairly simple things to get the most out of calculations.

You most likely know that
  add 1 to x
executes faster then:
  put x + 1 into x

The reason is that in 'add 1 to x', the variable x is accessed only
once, whereas in 'put x + 1 into x' it is accessed two times. The
computer doesn't know that the result of 'x + 1' should be put in the
same place x and has to figure out twice where in memory x resides;
execution of the statement is similar to the execution of 'put x + 1
into y'.
Probably you already preferred to write 'add 1 to x' instead of 'put x +
1 into x': you have less characters to type. Be honest, do you also
always write 'subtract 1 from x' instead of 'put x - 1 into x'? The
speed gain is the same, for the very same reason. And this also holds of
course for the multiply and divide operators; or for x^2 versus x*x.

Accessing variables as least as possible is a general principle to
enhance the speed. Suppose you have a complex calculation:

script A
  put (Ncases*covXY-sumX*sumY)/ sqrt((Ncases*sum2X-\
  (sumX)^2)*(Ncases*sum2Y-(sumY)^2)) into r

The statement (the formula for the pearson correlation, if you really
want to know) is hardly readable and probably you run into trouble in
putting the parentheses at the right places. So the next script looks
much better:

script B
  put Ncases * covXY - sumX * sumY into c
  put Ncases * sum2X - (sumX)^2 into r1
  put Ncases * sum2Y - (sumY)^2 into r2
  put c / sqrt(r1 * r2) into r

Much nicer, but the script is 30% slower because of the overhead of
putting and getting the variables c, r1 and r2. The general rule is:
minimize accessing variables, either to get them, or to put something in them.

The div and mod operators are often used  for particular transformations
of numbers. For example, to transform number 0 to 7 into 7, 8 to 15 into
15, 16 to 23 into  23, etc. you may use:
  get (i div x + 1) * x - 1
(i is the number to transform, x signifies the range, 8 in this case)
To transform 0  to 7 into 0 to 7, 8 to 15 into 0 to 7, 16 to 23 into 0
to 7, etc. you may use:
   (i mod x + 1) - 1
For particular numbers (i.e. 2, 4, 8, 16, etc.) it is faster to use the
bitwise operators:
  i bitor (x-1)
respectively
  i bitand (x-1)
to obtain the same result, but more then 50 % faster.
To determine whether a number is odd or even, you can use:
  (x bitand 2) = 0
(true if even, false if odd)

If you wonder what bitwise operators are (the documentation in MC is
minimal), see the text at the end of this contribution.

----------------------------------

Now for something completely different.
Suppose you have read a real number x, e.g. "0.345678" from a field.
Compare these two scripts:

script C
  get x/(1-x)

script D
  add 0 to x
  get x/(1-x)

Believe it or not, but script D will run considerably faster.
If the content of a field (or a file) is read, MC hasn't the faintest
idea that it's meant to be a number, and stores it as a string of
characters. Thus it has to convert the string into a number, including a
check whether or not x is a number before it actually starts
calculating. In executing script C, this conversion takes place two
times. In script D however, the conversion takes place only once. If
'add 0 to x' is executed, the string x is converted to a number, and the
number is stored into x: now x no longer contains the string "0.345678"
but (a binary representation of) the number 0.345678. Thus it saves a
lot of time in performing calculations on it. The more x's in the
equation, e.g. x*(1+x)/(1-x), the more you gain.
It's a bit tricky to perform a speed test on script D, to compare it
with C. Consider script E:

script E
  put fld "inputFld" into x
  repeat 100000
    add 0 to x
    get x/(1-x)
  end repeat

After the first loop x is converted to a number. Hence the conversion
takes place only once, instead of 100000 times. You should be very well
aware of such peculiarities in performing speed tests. But you can
safely try:

script F
  put fld "inputFld" into x
  add 0 to x -- convert it to a real number
  repeat 100000
    get x/(1-x)
  end repeat

It will run nearly 10 times as fast compared to a script without 'add 0
to x', excluding the overhead of the repeat loop.

Generally, if you store a variable, and MC has become aware that this
variable is a number, it will be stored as a binary number: no
conversion is needed if it is used for further calculations.
You may note that if your are doing speed tests on numbers, and test
numbers are read from a field, results will be completely unreliable, if
you assume that the numbers read, are binary numbers.

If a variable is stored as a binary number, and you use statements like
'get length (x)' or 'get char 3 of x', it has to be converted to a
string, before the statement can be executed. For some reason,
converting a number to a string is much slower then converting a string
to a number. Converting an integer to a string takes about 3 times as
much as the other way around, whereas converting a real number is even
slower, because of the more complex formatting (by the way, MC does not
distinguish internally between integers and real numbers; all numbers
are stored as reals).
Suppose variable x is read from a field, containing "0.345678". Compare:

script G
  add 0 to x -- convert it to a real number
  repeat 100000
    get last char of x
  end repeat

script H
  repeat 100000
    get last char of x
  end repeat

Script G may take about 50 times as long as script H...
If variable x is binary stored, and you are going to perform a number of
string operations on it (like if char 1 of x = "2"; get offSet (".", x);
put length (x) into s), you better copy it into another variable and
convert the copy to a string first, for example:

  put x into s
  put empty after s

You should leave the original variable x untouched. Compare script I and J:

script I

  put 1/3 into x
  put last char of x into last char of x
  put 3*x

script J

  put 1/3 into x
  put 3*x

Script I yields 0.999999 (if the local numberFormat has the default
value), script J yields 1. Because you converted the number to a string,
you lost precision.

To illustrate things, here is a nice example from Xavier, in a reaction
to Part 1 of How to speed up MC. He wrote:

> Another 2 things to consider in your loops:
> - set cursor to busy 
>   will slow down scripts considerably.
>
> - If you have a "progress" bar, dont change it at every loop,
>    do so for each percent change or at every 100th... 
>    this will also save major time...
>
> for example
>   repeat with x = 1 to 100000
>     if char -3 of x is not oldx then
>       set cursor to busy
>       showstatus
>      put char -3 of x into oldx
>   end if

Although his observations are correct, his solution is not (sorry
Xavier, but I love your example because it illustrates that we really
can gain a lot).
The variable x is a binary one, not a string one (it's produced by MC
itself). So each time 'if char -3 of x is not oldx then' is executed, x
is converted to a string, to be able to get character -3: it takes an
awful lot of time. To be precise, x itself remains intact, the converted
copy is put in some internal temporary variable.
The script takes 370 ticks on my computer, neglecting 'showStatus' of
course. Alternatively, take a look at script J:

  put the ticks into myTestTime 
  repeat with x = 1 to 1000000
    if the ticks - myTestTime > 5 then
      put the ticks into myTestTime 
      set cursor to busy
    end if
    -- do something
  end repeat

It takes 160 ticks. Not that much difference? Please note the difference
in the number of repeats: hunderd and thousand versus one million! This
script has some of other advantages too:
- The rotation of the busy cursor is independent of the time 'do
something' takes (unless it takes more then 5 ticks, in which case that
the script would take a day or more).
- You can set a flag, becoming true the first time the ticks -
myTestTime exceeds 5 ticks. Based on the value of x at that moment and
the total number of repeats, you can decide whether or not a progress
bar is shown (assuming that the 'do something's' during the first 5
ticks are representative for the remaining ones). For example, if the
total time is expected to exceed 60 ticks, show a progress bar, else
stick to the busy ball.

For your convenience, here is a more complete scheme.

  put 60 into myDelayPref
-- but better let user decide on whether progress bar will be
-- shown if analysis is expected to take more then 1 second,
-- 2 seconds, 5 seconds or whatever
  put 0 into myProgresStatus
  put the number of lines of myData into nLines
  put the ticks into myTestTime 
  repeat with x = 1 to nLines
    if the ticks - myTestTime > 5 then
      switch myProgresStatus
      case 2 -- analysis takes more then one second
        myAdjustProgressBar -- adjusts the progress bar
        break
      case 1 -- analysis will take less then 1 second
        set cursor to busy
        break
      case 0 -- not checked yet
        if (5*nLines)/x > myDelayPref then -- take 5 ticks as criterion
for updating
          myShowProgressBar -- show a progress bar
          put 2 into myProgresStatus
        else
          put 1 into myProgresStatus
        end if
        break
      end switch
      put the ticks into myTestTime 
    end if
    -- do something
  end repeat

If analyses take more then 10 ticks, case 0 will occur the least, hence
this case is the last one; if the analysis takes less then 10 ticks --
well, why bother?
The five ticks delay before a progress bar appears will not be observed
by the user.
Take the switch part (not the ticks check!) out of the script to create
a function handler, returning myProgresStatus; this will hardly add
time, because it is only called each five ticks.

Happy programming,

Next part about arrays, one of the greatest things in MC (maybe within 2
or 3 weeks)

Wil Dijkstra

----------------------------------
About bitwise operators

Numbers (actually, we are only talking about non-negative integer
values) are eventually stored as ones and zeros in memory. To find out
how a particular number looks like, you can use baseConvert. For example
to get the binary representation (base 2) of the number 9 (base 10):
 get baseConvert (9, 10, 2)
yields 1001.
The binary representation of 12 is obtained with baseConvert (12, 10,
2), yielding 1100.
What does something like '9 bitAnd 12' means?
The first bit of both binary numbers are both 1, hence the result for
the first bit is 1 (like the result of 'true and true' is true). The
second bits are unequal, hence a '0' (true and false is false). The
third one yields '0' (false and false is false) and the final bits yield
'0'. The result is the binary number 1000. Applying baseConvert (1000,
2, 10) yields 8. And this is exactly what the statement:
  get 9 bitAnd 12
produces.

Schematically:
1001 --> 9
1100 --> 12
------
1000 --> 8

Similarly, 9 bitOr 12 yields the binary number 1101, which is 13.
Applying bitXor (exclusive or: either the first, or the second bit
should be 1, but not both) yields 0101, which is 5.