surprising filter benchmarks

Richard Gaskin ambassador at fourthworld.com
Tue Jul 12 17:10:23 EDT 2005


Eric Chatonet wrote:
> Hi Richard,
> 
> I think the speed depends on the filter complexity.
> For instance:
> 
> on mouseUp
>   repeat 100000
>     if random (2) = 1 then put "zaz" & cr after tList
>     else put "zbz" & cr after tList
>   end repeat
>   -----
>   put the milliseconds into tStart1
>   filter tList with "*a*"
>   put the milliseconds - tStart1 into tResult1
>   -----
>   put the milliseconds into tStart2
>   repeat for each line tLine in tList
>     if "a" is in tList then put tLine & cr after tNewList
>   end repeat
>   delete char -1 of tNewList
>   put the milliseconds - tStart2 into tResult2
>   -----
>   put "Filter: " && tResult1 & cr &"Repeat:" &&  tResult2
> end mouseUp
> 
> Results -
>    Filter: 41
>    Repeat: 117

To get cleaner results I think the second test's "is in tList" should be 
"is in tLine", which also cuts execution time down dramatically.

But the central point remains: with a small number of criteria the 
filter command does a fine job compared to repeat loops, but for complex 
criteria (in my app it's rare that we'll ever have fewer than three 
distinct comparisons) "repeat for each" comes out well ahead.

Another advantage of "repeat for each" is that it allows "or" in addition 
to "and", which would require multiple passes with "filter", and it makes 
it easy to structure comparisons using parentheses to control the order 
of precedence.
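
For example, something along these lines (hypothetical data and 
variable names, assuming tab-delimited lines):

   set the itemDel to tab
   repeat for each line tLine in tData
     -- keep lines where item 1 contains "a" or "b", and item 3 is "r"
     if (item 1 of tLine contains "a" \
         or item 1 of tLine contains "b") \
         and item 3 of tLine is "r" then
       put tLine & cr after tMatches
     end if
   end repeat
   delete char -1 of tMatches

To get the same "or" from the filter command you'd need to filter 
separate copies of the data with separate patterns and then merge the 
results, which means extra passes over the data.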

For the moment I'm sticking with the repeat loop for the situation I'm 
currently using it in, but it's good to know that filter is quick for 
simple searches.

-- 
  Richard Gaskin
  Fourth World Media Corporation
  ___________________________________________________________
  Ambassador at FourthWorld.com       http://www.FourthWorld.com


> 
> So maybe we have to choose the right method according to the context.
> Two cents that do not make life easier :-)
> 
> On 12 Jul 05, at 22:26, Richard Gaskin wrote:
> 
>> I figured the filter command would carry at least some overhead for
>> its convenience, but I had no idea how much!
>>
>> I wrote the test below to compare it with walking through a list line
>> by line, and the results were surprising:
>>
>> on mouseUp
>>   put  fwdbCurTableData() into s -- gets 10,800 lines of
>>   --                                tab-delimited data
>>   --
>>   -- Method 1: filter command
>>   --
>>   put format("*a*\t*r*\tr\t*\t*\t*\t*\t*") into tFilter
>>   put s into result1
>>   put the millisecs into t
>>   filter result1 with tFilter
>>   put the millisecs - t into t1
>>   --
>>   --
>>   -- Method 2: repeat for each
>>   --
>>   set the itemdel to tab
>>   put the millisecs into t
>>   repeat for each line tLine in s
>>     if item 1 of tLine contains "a" \
>>         AND item 2 of tLine contains "r"\
>>         AND item 3 of tLine is "r" then
>>       put tLine&cr  after result2
>>     end if
>>   end repeat
>>   delete last char of result2
>>   put the millisecs - t into t2
>>   --
>>   put result1 into fld "result"
>>   put result2 into fld "result2"
>>   --
>>   put "Filter: "&t1 &cr& "Repeat: "&t2
>> end mouseUp
>>
>>
>>
>> Results -
>>    Filter: 745
>>    Repeat: 40
>>
>> Did I miss something, or am I just seeing the penalty for the filter
>> command's generalization?



