Newbie find, search command reprise ...

Alex Tweedly alex at tweedly.net
Fri Dec 17 18:50:12 EST 2004


Earlier this week, it was said:

> > I want to search for ALL instances of Xmas and display the whole
> > matching line of each found in a list field.
> > So far I can only get it to display the first line (Father Xmas Is
> > Drunk)!
> > Any help would be appreciated!!! I have checked the tutorials for help
> > but couldn't find what I was looking for.
> > Is there an example stack or notes anyone may know of?
> >
>The filter command is the fastest method, but be careful: it is a
>destructive search, i.e. it deletes the lines that don't match. So copy
>your original data into a variable, filter that, and keep the original
>intact.
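Applied to the original "find every Xmas line" question, the copy-then-filter advice might look like the minimal sketch below. It assumes the data lives in a global called gFiles and that the list field is named "Results"; both names are hypothetical.

```livecode
on mouseUp
   global gFiles                       -- hypothetical: the full text to search
   local tMatches
   put gFiles into tMatches            -- filter works on this copy, not on gFiles
   filter tMatches with "*Xmas*"       -- keep only the lines containing "Xmas"
   put tMatches into field "Results"   -- hypothetical list field name
end mouseUp
```

Because only tMatches is filtered, gFiles still holds every line afterwards.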

A few people posted minor variations on this advice; the gist was that
"filter" is the fastest method, though not always the most convenient.

Well, it ain't always so.

I ran into some performance issues with my latest app this evening - 
something that I expected to be almost instantaneous took a long time. I 
assumed it was just a bug, but eventually convinced myself it wasn't - it 
was just taking too long. So I looked more closely at the parts where I was 
using "filter" quite heavily.

Input data:   10,000 lines, 1.5 MB; the lines are all roughly the same size.
Search data:  fairly long strings (32 characters each).
Success rate: each string occurs between 2 and 5 times in the input.

In this case, "filter" appeared to be really, horribly slow - so I came up 
with a simple benchmark to check it.

on mouseUp
   global gFiles   -- the 10,000-line input data, loaded elsewhere

   -- Test 1: "filter" on a fresh copy of the data, four passes
   put the millisecs into tStartTime
   set the itemDel to TAB
   repeat 4 times
     put gFiles into tFList
     filter tFList with "*cb2f8d231f68c5d70b3e62ed0a3c4c8f*"
   end repeat
   put "took " & (the millisecs - tStartTime) && (the number of lines in tFList) & cr & tFList & cr after msg

   -- Test 2: "repeat for each" with "is in", four passes
   put the millisecs into tStartTime
   repeat 4 times
     put empty into tFList   -- start each pass with an empty result
     repeat for each line L in gFiles
       if "cb2f8d231f68c5d70b3e62ed0a3c4c8f" is in L then put L & CR after tFList
     end repeat
   end repeat
   put "took " & (the millisecs - tStartTime) && (the number of lines in tFList) & cr & tFList & cr after msg
end mouseUp


The first version took 800 msec, while the second one took less than 1 msec.
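For the original question, the faster loop from the second test could be wrapped as a reusable function along the lines of this sketch. The name linesContaining is hypothetical, and no timings are claimed for it beyond those of the loop above.

```livecode
-- Hypothetical helper: returns every line of pText that contains pTarget.
-- pText is passed by value, so the caller's data is left untouched.
function linesContaining pText, pTarget
   local tResult
   repeat for each line tLine in pText
      if pTarget is in tLine then put tLine & cr after tResult
   end repeat
   if tResult is not empty then delete the last char of tResult   -- drop the trailing return
   return tResult
end linesContaining
```

Usage, with a hypothetical field name: put linesContaining(gFiles, "Xmas") into field "Results"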

I thought the filter version's slowness must be due to the need to copy the
input string to avoid destroying it (a copy I do need in the real app), so I
modified it to:

   put gFiles into tFList1
   put gFiles into tFList2
   put gFiles into tFList3
   put gFiles into tFList4
   put the millisecs into tStartTime   -- start timing only after the copies are made
   set the itemDel to TAB
   filter tFList1 with "*" & "cb2f8d231f68c5d70b3e62ed0a3c4c8f" & "*"
   filter tFList2 with "*" & "cb2f8d231f68c5d70b3e62ed0a3c4c8f" & "*"
   filter tFList3 with "*" & "cb2f8d231f68c5d70b3e62ed0a3c4c8f" & "*"
   filter tFList4 with "*" & "cb2f8d231f68c5d70b3e62ed0a3c4c8f" & "*"
   put "took " & (the millisecs - tStartTime) && (the number of lines in tFList1) & cr after msg

Note that the time reported excludes the four copies; this still took 760 ms.

So, at least in some (reasonable) cases, filter is far from being the
fastest. I suspect that the offset method Jacqueline suggested would be
even faster - but since the loop already took less than 1 millisec, I didn't
pursue that thought.
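An offset-style search could be sketched with lineOffset() and its linesToSkip parameter, as below. This is an illustration under assumptions, not necessarily the method Jacqueline posted; the function name is hypothetical, and it assumes pTarget contains no return characters and that wholeMatches is left at its default of false. Since each lookup probably has to skip past the earlier lines again, it is likely only worthwhile when there are few matches, as in the 2 to 5 per search here.

```livecode
-- Hypothetical helper using lineOffset(): returns every line of pText containing pTarget.
function linesContainingByOffset pText, pTarget
   local tResult, tSkip, tFound
   put 0 into tSkip
   repeat
      -- the result is counted relative to the first tSkip lines, which are skipped
      put lineOffset(pTarget, pText, tSkip) into tFound
      if tFound = 0 then exit repeat   -- no more matches
      add tFound to tSkip              -- tSkip is now the absolute number of the matching line
      put (line tSkip of pText) & cr after tResult
   end repeat
   if tResult is not empty then delete the last char of tResult   -- drop the trailing return
   return tResult
end linesContainingByOffset
```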

-- Alex.


More information about the use-livecode mailing list