Relative Speed of different loop structures. [was Finding non-common elements]
Alex Tweedly
alex at tweedly.net
Sat Nov 5 21:48:36 EST 2005
[ You guys probably all know this already, and I had just missed it up
till now ....]
Any time speed of looping through a data structure is mentioned, the
first thing anyone says is
Use
> repeat for each line tLine of tData
> doSomeThingWith tLine
because it is much faster than
> repeat with tIndex = 1 to the number of lines in tData
> doSomeThingWith line tIndex of tData
I just realized tonight that you can do
> put the number of lines in tData into tNumLines
> split tData by CR
> repeat with tIndex = 1 to tNumLines
> doSomeThingWith tData[tIndex]
(and later combine it again if needed)
While this is nowhere near as fast as the "repeat for", it is better (by
one or two orders of magnitude) than the "line tIndex of tData" version
- fast enough that algorithms which I had rejected because of the
slowness of "line tIndex of tData" can be worth considering again.
In particular - one obvious approach to the "common vs non-common lines"
problem would be to sort each list, and then step through them
"together", using an index into each one. This isn't a good approach in
Rev - because of the high cost of using "line tIndex of tData" - but
using the split and tData[tIndex] makes it feasible. In fact, I found
that my first idea of using the two arrays and 'union' is no faster than
doing the following
(note "ZZZZZ is assumed to be after all possible lines, and is added to
ease the loop termination case)
> put fld "Field" & cr & "ZZZZZZZZZZ" into t1
> put fld "Field" & cr & "test line" & cr & "ZZZZZZZZZZ" into t2
>
> put the millisecs into tStart
> put 1 into i2
> put the number of lines in t2 into limit2
>
> sort t1
> sort t2
> split t2 by CR
> put t2[1] into L2
>
> repeat for each line L1 in t1
> repeat while L2 < L1
> add 1 to i2
> put t2[i2] into L2
> end repeat
> if L2 = L1 then
> -- put L1 & cr after tBoth
> add 1 to i2
> put t2[i2] into L2
> else
> -- put L1 & cr after t1only
> end if
> end repeat
> if i2 < limit2 then
> repeat with i = i2 to limit2-1
> put t2[i] & cr after t2only
> end repeat
> end if
> put "loop" && the millisecs - tStart & cr after msg
P.S. I tried hard to break every one of Jerry's recommendation about
variable naming as described in his excellent tutorial from the
"Conference" session; if you haven't already downloaded and read that
stack, you should. It *might* just stop you from writing such ugly code
as I did above - but my old Fortran habits just keep coming back :-)
--
Alex Tweedly http://www.tweedly.net
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.362 / Virus Database: 267.12.8/161 - Release Date: 03/11/2005
More information about the use-livecode
mailing list