Speed testing: Fastest search method
Alex Tweedly
alex at tweedly.net
Sun Aug 31 17:04:46 EDT 2014
Hugh,
The condition you've chosen for deciding whether to delete the line is
whether or not the line is empty.
So in method 2, replacing those lines by "" has no effect on the data.
That is, I think, an inadequate benchmark. The primary cost you would
encounter in the real case with method 2 is the data copying caused by
emptying the line (i.e. all subsequent content in the variable has to be
copied down into place). So at the very least it gives very optimistic
times for method 2.
I tried the same code (more or less - added the "filter" for method 2 to
get the same results) on a data set which contains real lines of data to
be deleted.
I also added method4, which tries to get the best of both worlds. It
restricts the additional memory usage (by building up a second variable,
but removing sections of the input variable at the same time), and also
does relatively few deletions (and hence few data copies).
(full script included below)
LC7DP10, MacOSX 10.8.5 Macbook Pro
Source data:
10,000 lines of 100 chars, alternating between two values (non-random
so it would be completely repeatable)
5000 lines begin with char "a", other 5000 being with "b"
Condition for deletion
First char of the line is "a"
(hence 5000 lines are deleted
Method 1:
not timed - was taking too long :-)
Method 2:
11.51 secs
Method 3:
0.04 secs
Method 4:
0.09 secs
Changing the conditional test so that no lines are deleted, the times
are all < 0.1 seconds
Conclusion:
if memory is not an issue, method 3 is best
if memory is an issue, method 4 is best
-- Alex.
on mouseUp
local t1, t2, t3, t4
-- put method1() into t1
put method2() into t2
put method3() into t3
put method4() into t4
put the number of lines in t1 && the number of lines in t2 && the
number of lines in t3 && the number of lines in t4 after fld 1
end mouseup
function make_data
local L, L1, t
put "a" into L
put "b" into item 100 of L
put "b" into L1
put "a" into item 100 of L1
repeat 5000 times
put L & CR & L1 & CR after t
end repeat
return t
end make_data
function method1
local tVar, tStart
set the cursor to watch
put make_data() into tVar
put the long seconds into tStart
repeat with x=the number of lines in tVar down to 1
if char 1 of line x of tVar="a" then
delete line x of tVar
end if
end repeat
put the long seconds - tStart & CR after fld 1
return tVar
end method1
function method2
local tVar, L, tStart, x
set the cursor to watch
put make_data() into tVar
put the long seconds into tStart
repeat for each line L in tVar
add 1 to x
-- put x && L &CR after msg
if char 1 of L="a" then
put "" into line x of tVar
end if
end repeat
filter tVar without empty
put the long seconds - tStart & CR after fld 1
return tVar
end method2
function method3
local tVar, tStart, L, tdout
set the cursor to watch
put make_data() into tVar
put the long seconds into tStart
repeat for each line L in tVar
if char 1 of L<>"a" then
put L &cr after tdout
end if
end repeat
if last char of tdout=cr then delete last char of tdout
put the long seconds - tStart & CR after fld 1
return tdout
end method3
function method4
constant K = 1000 -- probably should be a bigger number
local tVar, tStart, L, tdout, x
set the cursor to watch
put make_data() into tVar
put the long seconds into tStart
repeat forever
if the number of lines in tVar = 0 then exit repeat
put 0 into x
repeat for each line L in tVar
if char 1 of L<>"a" then
put L &cr after tdout
end if
add 1 to x
if x >= K then exit repeat
end repeat
delete line 1 to K of tVar
end repeat
if last char of tdout=cr then delete last char of tdout
put the long seconds - tStart & CR after fld 1
return tdout
end method4
On 31/08/2014 10:53, FlexibleLearning.com wrote:
> Some benchtesting...
>
> Setup:
> LC7DP10, Windows 7
>
> Source data:
> 10,000 lines of random data
> 100 chars per line
> 3,346 empty lines
>
> Task:
> Strip lines where a given condition is met.
>
> Results:
> Method 1
> Operating on a single variable, 'repeat with' + delete line
> 25.586 secs
>
> Method 2
> Operating on a single variable, 'repeat for each' + filter with empty
> 7.755 secs
>
> Method 3
> Using a second variable for output, 'repeat for each' + second variable
> 0.136 secs
>
> Conclusions:
> If memory is an issue, then Method 2 is best
> If memory is not an issue, then Method 3 is best
>
> Scripts applied:
> #1:
> on mouseUp
> set the cursor to watch
> put the long seconds into tStart
> put fld "Data" into tVar
> repeat with x=the number of lines in tVar down to 1
> if line x of tVar="" then
> delete line x of tVar
> end if
> end repeat
> put tVar into fld "output"
> put the long seconds - tStart into fld "timer1"
> end mouseUp
>
> #2:
> on mouseUp
> set the cursor to watch
> put the long seconds into tStart
> put fld "data" into tVar
> repeat for each line L in tVar
> add 1 to x
> if L="" then
> put "" into line x of tVar
> end if
> end repeat
> put tVar into fld "output"
> put the long seconds - tStart into fld "timer2"
> end mouseUp
>
> #3:
> on mouseUp
> set the cursor to watch
> put fld "Data" into tVar
> put the long seconds into tStart
> repeat for each line L in tVar
> if L<>"" then
> put L &cr after stdout
> end if
> end repeat
> if last char of stdout=cr then delete last char of stdout
> put stdout into fld "output"
> put the long seconds - tStart into fld "timer3"
> end mouseUp
>
>
>> On 30/08/2014 08:45, FlexibleLearning.com wrote:
>>> Peter Haworth <pete at lcsql.com> wrote
>>>
>>>> There's another situation where I use repeat with even though it's a
> little
>>>> slower than repeat for and I also alter the contents of the data I'm
>>>> repeating through without any problems.
>>>>
>>>> repeat with x=the number of lines in tVar down to to 1
>>>> if <data condition on line x of tVar> then
>>>> delete line x of tVar
>>>> end if
>>>> end repeat
>
>>> This is an insightful observation. Nice one, Pete!
>>>
>>> My stock method (and presumably the method you allude to above) is...
>>>
>>> repeat for each line L in tVar
>>> add 1 to x
>>> if <data condition on L> then put "" into line x of tVar
>>> end repeat
>>> filter tVar without empty
>>>
>>> Both methods operate on a single data set and avoid putting the output
>>> into a second variable which, for large datasets, involve an unnecessary
>>> memory overhead..
>>>
>>> Hugh Senior
>>> FLCo
>
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
More information about the use-livecode
mailing list