Relative Speed of different loop structures. [was Finding non-common elements]

Alex Tweedly alex at tweedly.net
Sat Nov 5 21:48:36 EST 2005


[ You guys probably all know this already, and I had just missed it up 
till now ....]

Any time speed of looping through a data structure is mentioned, the 
first thing anyone says is

Use

>    repeat for each line tLine of tData
>        doSomeThingWith  tLine

because it is much faster than

>   repeat with tIndex = 1 to the number of lines in tData
>        doSomeThingWith  line tIndex of tData


I just realized tonight that you can do

>    put the number of lines in tData into tNumLines
>    split tData by CR
>    repeat with tIndex = 1 to tNumLines
>        doSomeThingWith  tData[tIndex]

(and later combine it again if needed)

While this is nowhere near as fast as the "repeat for", it is better (by 
one or two orders of magnitude) than the "line tIndex of tData" version 
- fast enough that algorithms which I had rejected because of the 
slowness of "line tIndex of tData" can be worth considering again.

In particular - one obvious approach to the "common vs non-common lines" 
problem would be to sort each list, and then step through them 
"together", using an index into each one. This isn't a good approach in 
Rev - because of the high cost of using "line tIndex of tData" - but 
using the split and tData[tIndex] makes it feasible. In fact, I found 
that my first idea of using the two arrays and 'union' is no faster than 
doing  the following
(note "ZZZZZ is assumed to be after all possible lines, and is added to 
ease the loop termination case)

>     put fld "Field" & cr & "ZZZZZZZZZZ" into t1
>     put fld "Field" & cr & "test line" & cr  & "ZZZZZZZZZZ" into t2
>     
>     put the millisecs into tStart
>     put 1 into i2
>     put the number of lines in t2 into limit2
>     
>     sort t1

>     sort t2
>     split t2 by CR
>     put t2[1] into L2
>    
>     repeat for each line L1 in t1
>         repeat while L2 < L1
>             add 1 to i2
>             put t2[i2] into L2
>         end repeat
>         if L2 = L1 then
>             -- put L1 & cr after tBoth
>             add 1 to i2
>             put t2[i2] into L2
>         else
>             -- put L1 & cr after t1only
>         end if
>     end repeat
>     if i2 < limit2 then
>         repeat with i = i2 to limit2-1
>             put t2[i] & cr after t2only
>         end repeat
>     end if
>     put "loop" && the millisecs - tStart & cr after msg         


P.S. I tried hard to break every one of Jerry's recommendation about 
variable naming as described in his excellent tutorial from the 
"Conference" session; if you haven't already downloaded and read that 
stack, you should. It *might* just stop you from writing such ugly code 
as I did above - but my old Fortran habits just keep coming back :-)


-- 
Alex Tweedly       http://www.tweedly.net



-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.362 / Virus Database: 267.12.8/161 - Release Date: 03/11/2005




More information about the use-livecode mailing list