which is faster for searching?

Sun Aug 31 04:37:01 EDT 2014

Hi Alex

Am struggling to follow your comments and conclusions. Are you saying it
would be more memory efficient to store the output in a second variable...

repeat for each line L in tVar
    if <data condition on  line x of tVar> then
      put L &cr after stdout
    end if
end repeat
if last char of stdout=cr then delete last char of stdout
return stdout

If there are no matching lines (worst case), then this would require a 100%
memory overhead.

I am obviously mis-reading you.

Hugh Senior
FLCo

From: Alex Tweedly <alex at tweedly.net>
To: use-livecode at lists.runrev.com
Subject: Re: which is faster for searching?

Two comments ....

1. The two methods below produce different results - the latter one 
removes lines that were empty in the original input data, the other 
doesn't (unless that happens to match "data condition on line x").

2. In the case of large (or huge) datasets, Hugh's concern about 
avoiding memory overhead may be valid - but those cases are exactly the 
ones where you most need to be concerned about efficiency.

If, for example, the input data consists of 10,000,000 lines and perhaps 
each line was 100 chars, then we have 1Gb of data. Let's say we retain 
50% of the lines (say every alternate one) - then the "repeat for each" 
method adds 1/2 Gb of virtual memory, and requires copying only 1/2Gb of 
data during this operation.

However, although both the methods below don't add the virtual memory 
requirement, they copy a LOT of data - each time we delete (or empty) a 
line, that causes all the subsequent data to be copied 'in place', so it 
will require (approx) 2,500,000 Gb of data copying (5M * 1/2Gb average 
remaining data size). So we copy 2.5Pb of data - that's going to cost us 
a whole lot more time than any paging needed for 1/2Gb of extra virtual 
memory.

-- Alex.

On 30/08/2014 08:45, FlexibleLearning.com wrote:
> Peter Haworth <pete at lcsql.com> wrote
>
>> There's another situation where I use repeat with even though it's a
little
>> slower than repeat for and I also alter the contents of the data I'm
>> repeating through without any problems.
>>
>> repeat with x=the number of lines in tVar down to to 1
>>     if <data condition on  line x of tVar> then
>>        delete line x of tVar
>>     end if
>> end repeat

> This is an insightful observation. Nice one, Pete!
>
> My stock method (and presumably the method you allude to above) is...
>
> repeat for each line L in tVar
>      add 1 to x
>      if <data condition on  L> then put "" into line x of tVar
> end repeat
> filter tVar without empty
>
> Both methods operate on a single data set and avoid putting the output
into
> a second variable which, for large datasets, involve an unnecessary memory
> overhead..
>
> Hugh Senior
> FLCo