Newbie find, search command reprise ...
Alex Tweedly
alex at tweedly.net
Fri Dec 17 18:50:12 EST 2004
Earlier this week, it was said:
> > I want to search for ALL instances of Xmas and display the whole
> > matching line of each found in a list field.
> > So far I can only get it to display the first line (Father Xmas Is Drunk)!
> > Any help would be appreciated!!! I have checked the tutorials for help
> > but couldn't find what I was looking for.
> > Is there an example stack or notes anyone may know of?
> >
>The filter command is the fastest method, but be careful: it is a
>destructive search, i.e. it deletes the lines that don't match. So copy
>your original data into a variable, filter that, and keep the original
>intact.
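A minimal sketch of that copy-then-filter advice, assuming the data sits in a field named "Source" and the matches should go to a list field named "Results" (both field names are hypothetical). LiveCode passes parameters by value unless they are declared with @, so filtering the parameter inside a function works on a copy and leaves the caller's data alone.

```livecode
-- Hypothetical helper: return only the lines of pData that match the
-- filter wildcard pattern pPattern. pData is a by-value parameter, so
-- filtering it here does not modify the caller's variable or field.
function filteredCopy pData, pPattern
   filter pData with pPattern
   return pData
end filteredCopy

-- Hypothetical button script: field "Source" holds the data and
-- field "Results" is the list field that shows every matching line.
on mouseUp
   put filteredCopy(field "Source", "*Xmas*") into field "Results"
end mouseUp
```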
A few people posted minor variations on this; the gist was that
"filter" is the fastest, though not always the most convenient.
Well, it ain't always so.
I ran into some performance issues with my latest app this evening -
something that I expected to be almost instantaneous took a long time. I
assumed it was just a bug, but eventually convinced myself it wasn't - it
was just taking too long. So I looked more closely at the parts where I was
using "filter" quite heavily.
Input data: 10,000 lines, 1.5 MB; the lines are all much the same size.
Search data: fairly long strings (32 characters each).
Success rate: each string occurs between 2 and 5 times in the input.
In this case, "filter" appeared to be really, horribly slow - so I came up
with a simple benchmark to check it.
global gFiles   -- holds the 10,000-line input data

on mouseUp
   -- Version 1: copy the data, then filter the copy (4 passes)
   put the millisecs into tStartTime
   set the itemDel to TAB
   repeat 4 times
      put gFiles into tFList
      filter tFList with "*cb2f8d231f68c5d70b3e62ed0a3c4c8f*"
   end repeat
   put "took " & the millisecs - tStartTime && the number of lines in tFList & cr & tFList & cr after msg

   -- Version 2: walk the lines and collect the matching ones (4 passes)
   put the millisecs into tStartTime
   repeat 4 times
      put empty into tFList   -- reset each pass so the result matches version 1
      repeat for each line L in gFiles
         if "cb2f8d231f68c5d70b3e62ed0a3c4c8f" is in L then put L & CR after tFList
      end repeat
   end repeat
   put "took " & the millisecs - tStartTime && the number of lines in tFList & cr & tFList & cr after msg
end mouseUp
The first version took 800 ms, while the second one took less than 1 ms.
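The loop from the second version can also be wrapped as a reusable, non-destructive function. This is a minimal sketch, not code from the app described here; the name linesContaining is hypothetical. It only reads its input and appends matching lines to a new variable, so nothing needs to be protected from being overwritten.

```livecode
-- Hypothetical helper: return every line of pData that contains pString.
-- pData is only read, never changed; matches are collected in tResult.
function linesContaining pData, pString
   local tResult
   repeat for each line tLine in pData
      if pString is in tLine then put tLine & cr after tResult
   end repeat
   -- drop the trailing return, if any lines matched
   if tResult is not empty then delete the last char of tResult
   return tResult
end linesContaining
```

For example, `put linesContaining(gFiles, "cb2f8d231f68c5d70b3e62ed0a3c4c8f") into tFList` gives the same lines as the filter version without touching gFiles.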
I thought the slowness might be due to copying the input string to avoid
destroying it (a copy I do need in the real app), so I modified the test
to:
put gFiles into tFList1
put gFiles into tFList2
put gFiles into tFList3
put gFiles into tFList4
put the millisecs into tStartTime
set the itemDel to TAB
filter tFList1 with "*" & "cb2f8d231f68c5d70b3e62ed0a3c4c8f" & "*"
filter tFList2 with "*" & "cb2f8d231f68c5d70b3e62ed0a3c4c8f" & "*"
filter tFList3 with "*" & "cb2f8d231f68c5d70b3e62ed0a3c4c8f" & "*"
filter tFList4 with "*" & "cb2f8d231f68c5d70b3e62ed0a3c4c8f" & "*"
put "took " & the millisecs - tStartTime && the number of lines in tFList4 & cr after msg
Note that the time reported excludes the four copies; this still took 760 ms.
So, at least in some (reasonable) cases, filter is far from being the
fastest option. I suspect that the offset method Jacqueline suggested would
be faster still, but since the loop already took less than 1 ms I didn't
pursue that thought.
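Jacqueline's code isn't quoted in this message, so the following is only a guess at what an offset-based version might look like: a sketch that uses lineOffset() with its linesToSkip argument to jump from one match to the next instead of visiting every line. The function name linesContainingByOffset is hypothetical, and it was not timed; whether it beats the repeat for each loop would need measuring with the same benchmark.

```livecode
-- Hypothetical offset-based search: jump from match to match with lineOffset().
function linesContainingByOffset pData, pString
   local tResult, tSkip, tFound
   put 0 into tSkip
   repeat
      -- lineOffset returns the match's line number counted from just after
      -- the first tSkip lines, or 0 if there are no more matches
      put lineOffset(pString, pData, tSkip) into tFound
      if tFound is 0 then exit repeat
      add tFound to tSkip   -- tSkip is now the absolute number of the matching line
      put line tSkip of pData & cr after tResult
   end repeat
   if tResult is not empty then delete the last char of tResult
   return tResult
end linesContainingByOffset
```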
-- Alex.
More information about the use-livecode mailing list