Efficient code for accessing the items (or lines) in a variable.

Alex Tweedly alex at tweedly.net
Sat Feb 19 19:44:08 EST 2011


The other day I mentioned the great efficiency that can be gained by 
using the form
>    repeat for each item tItem in tdata
>       .... tItem ...
>    end repeat

rather than the (perhaps more obvious)

>   repeat with i = 1 to the number of items in tdata
>        .... item i of tData ...
>   end repeat

So what we have is two techniques:

one is very fast, but has restrictions
      - can only be done with one variable at a time, in a simple loop
      - the variable should not be changed during the loop
the other is quite slow, but does not have the restrictions.

I should have mentioned that there is another technique, using char 
offsets. Using "... item i of tData ..." is relatively slow because the 
engine must do a search (scan) from the beginning of the data for the 
numbered item. However, the engine can directly access a numbered char 
offset without any scan (and it can very quickly access the first item, 
because all that is needed is a small scan to find the end of the item). 
Therefore, you can re-write this second form as

>   put 1 into tCurrentItemCharOffset
>   repeat with i = 1 to the number of items in tData
>     ... item 1 of (char tCurrentItemCharOffset to -1 of tData) ...
>     add the number of chars in item 1 of (char tCurrentItemCharOffset 
> to -1 of tData) + 1 to tCurrentItemCharOffset
>   end repeat
This is almost as quick as the first method, and similarly its 
performance is linear wrt the number of items. But it can be done on 
more than one variable at a time, and can be done in more general 
contexts, not simply looping through the variable. And, if done 
carefully, you can change the current item while processing the loop 
(i.e. you must increment the offset after any change to the current 
item, and of course you must not change any part of the variable 
*before* the current item).

so for a simple loop, we have times like

>       10000 items    20000 items
> 1.           2 msec            4 msec
> 2.       419 msec      1604 msec
> 3.           5 msec          10 msec
>
NB doubling the number of items causes  methods 1 and 3 to take double 
the time, but method 2 takes 4x the time, so if you are working with 
large data sets this makes a difference.

-- Alex.










More information about the use-livecode mailing list