Speed testing: Fastest search method

Alex Tweedly alex at tweedly.net
Sun Aug 31 17:04:46 EDT 2014


Hugh,

The condition you've chosen for deciding whether to delete the line is 
whether or not the line is empty.
So in method 2, replacing those lines by "" has no effect on the data.

That is, I think, an inadequate benchmark. The primary cost you would 
encounter in the real case with method 2 is the data copying caused by 
emptying the line (i.e. all subsequent content in the variable has to be 
copied down into place). So at the very least it gives very optimistic 
times for method 2.

I tried the same code (more or less - added the "filter" for method 2 to 
get the same results) on a data set which contains real lines of data to 
be deleted.

I also added  method4, which tries to get the best of both worlds. It 
restricts the additional memory usage (by building up a second variable, 
but removing sections of the input variable at the same time), and also 
does relatively few deletions (and hence few data copies).

(full script included below)

LC7DP10, MacOSX 10.8.5 Macbook Pro

Source data:
   10,000 lines of 100 chars, alternating between two values (non-random 
so it would be completely repeatable)
   5000 lines begin with char "a", other 5000 being with "b"

Condition for deletion
   First char of the line is "a"
    (hence 5000 lines are deleted

Method 1:
  not timed - was taking too long :-)

Method 2:
   11.51 secs

Method 3:
   0.04 secs

Method 4:
    0.09 secs

Changing the conditional test so that no lines are deleted, the times 
are all < 0.1 seconds


Conclusion:
   if memory is not an issue, method 3 is best
   if memory is an issue, method 4 is best

-- Alex.


on mouseUp
    local t1, t2, t3, t4
    --   put method1() into t1
    put method2() into t2
    put method3() into t3
    put method4() into t4

    put the number of lines in t1 && the number of lines in t2 && the 
number of lines in t3  && the number of lines in t4 after fld 1
end mouseup

function make_data
    local L, L1, t
    put "a" into L
    put "b" into item 100 of L
    put "b" into L1
    put "a" into item 100 of L1
    repeat 5000 times
       put L & CR & L1 & CR after t
    end repeat
    return t
end make_data

function method1
    local tVar, tStart
    set the cursor to watch
    put make_data() into tVar
    put the long seconds into tStart
    repeat with x=the number of lines in tVar down to 1
       if char 1 of line x of tVar="a" then
          delete line x of tVar
       end if
    end repeat
    put the long seconds - tStart & CR after fld 1
    return tVar
end method1

function method2
    local tVar, L, tStart, x
    set the cursor to watch
    put make_data() into tVar
    put the long seconds into tStart
    repeat for each line L in tVar
       add 1 to x
       --      put x && L &CR after msg
       if char 1 of L="a" then
          put "" into line x of tVar
       end if
    end repeat
    filter tVar without empty
    put the long seconds - tStart & CR after fld 1
    return tVar
end  method2

function method3
    local tVar, tStart, L, tdout
    set the cursor to watch
    put make_data() into tVar
    put the long seconds into tStart
    repeat for each line L in tVar
       if char 1 of L<>"a" then
          put L &cr after tdout
       end if
    end repeat
    if last char of tdout=cr then delete last char of tdout
    put the long seconds - tStart & CR after fld 1
    return tdout
end method3

function method4
    constant K = 1000 -- probably should be a bigger number

    local tVar, tStart, L, tdout, x
    set the cursor to watch
    put make_data() into tVar
    put the long seconds into tStart

    repeat forever
       if the number of lines in tVar = 0 then exit repeat
       put 0 into x
       repeat for each line L in tVar
          if char 1 of L<>"a" then
             put L &cr after tdout
          end if
          add 1 to x
          if x >= K then exit repeat
       end repeat
       delete line 1 to K of tVar
    end repeat
    if last char of tdout=cr then delete last char of tdout
    put the long seconds - tStart & CR after fld 1
    return tdout
end method4



On 31/08/2014 10:53, FlexibleLearning.com wrote:
> Some benchtesting...
>
> Setup:
>      LC7DP10, Windows 7
>
> Source data:
>      10,000 lines of random data
>      100 chars per line
>      3,346 empty lines
>
> Task:
>      Strip lines where a given condition is met.
>
> Results:
>      Method 1
>      Operating on a single variable, 'repeat with' + delete line
>      25.586 secs
>
> Method 2
>      Operating on a single variable, 'repeat for each' + filter with empty
>      7.755 secs
>
> Method 3
>      Using a second variable for output, 'repeat for each' + second variable
>      0.136 secs
>
> Conclusions:
>      If memory is an issue, then Method 2 is best
>      If memory is not an issue, then Method 3 is best
>
> Scripts applied:
> #1:
> on mouseUp
>     set the cursor to watch
>     put the long seconds into tStart
>     put fld "Data" into tVar
>     repeat with x=the number of lines in tVar down to 1
>        if line x of tVar="" then
>           delete line x of tVar
>        end if
>     end repeat
>     put tVar into fld "output"
>     put the long seconds - tStart into fld "timer1"
> end mouseUp
>
> #2:
> on mouseUp
>     set the cursor to watch
>     put the long seconds into tStart
>     put fld "data" into tVar
>     repeat for each line L in tVar
>        add 1 to x
>        if L="" then
>           put "" into line x of tVar
>        end if
>     end repeat
>     put tVar into fld "output"
>     put the long seconds - tStart into fld "timer2"
> end mouseUp
>
> #3:
> on mouseUp
>     set the cursor to watch
>     put fld "Data" into tVar
>     put the long seconds into tStart
>     repeat for each line L in tVar
>        if L<>"" then
>           put L &cr after stdout
>        end if
>     end repeat
>     if last char of stdout=cr then delete last char of stdout
>     put stdout into fld "output"
>     put the long seconds - tStart into fld "timer3"
> end mouseUp
>
>
>> On 30/08/2014 08:45, FlexibleLearning.com wrote:
>>> Peter Haworth <pete at lcsql.com> wrote
>>>
>>>> There's another situation where I use repeat with even though it's a
> little
>>>> slower than repeat for and I also alter the contents of the data I'm
>>>> repeating through without any problems.
>>>>
>>>> repeat with x=the number of lines in tVar down to to 1
>>>>      if <data condition on  line x of tVar> then
>>>>         delete line x of tVar
>>>>      end if
>>>> end repeat
>
>>> This is an insightful observation. Nice one, Pete!
>>>
>>> My stock method (and presumably the method you allude to above) is...
>>>
>>> repeat for each line L in tVar
>>>       add 1 to x
>>>       if <data condition on  L> then put "" into line x of tVar
>>> end repeat
>>> filter tVar without empty
>>>
>>> Both methods operate on a single data set and avoid putting the output
>>> into a second variable which, for large datasets, involve an unnecessary
>>> memory overhead..
>>>
>>> Hugh Senior
>>> FLCo
>
>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode





More information about the use-livecode mailing list