custom property searching speed question
ambassador at fourthworld.com
Wed Mar 2 14:29:27 CST 2005
Lynch, Jonathan wrote:
> Say you had a custom property set - "theProps"
> And the keys of this set were 1 through 500,000
> So, basically, you had an array with half a million elements stored as a
> custom property set. Then, you wanted to search that array, and do it in
> such a way that the search returned the name and content of the first
> custom property that contains the item for which you search.
> Would the following method be fastest?
> Set the custompropertyset to "theProps"
> Put 0 into Z
> Repeat for each element E in the customproperties of <myobject>
> Add 1 to Z
> If E contains <searchterm> then exit repeat
> End repeat
> Put Z && the customproperties[Z] of <myobject> into field "feedback"
> My questions are this:
> 1) Is there a better or faster-access way to store the array than as a
> custom property set?
Using "repeat for each line" on a string benchmarked about 20% faster
here than using "repeat for each element" on an array.
I only tested on 40,000 lines, though. In general "repeat for each..."
scales well, so I would feel fairly confident extrapolating my results
to larger data sets.
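As a rough sketch of what I benchmarked (untested here as written; tData and tSearchTerm are placeholder names, with one record per line of tData):

```
-- walk the return-delimited data once, never asking the
-- engine to count lines
put 0 into tLineNum
repeat for each line tLine in tData
   add 1 to tLineNum
   if tLine contains tSearchTerm then
      -- we already have the line in tLine, so no "line x of" lookup needed
      put tLineNum && tLine into field "feedback"
      exit repeat
   end if
end repeat
```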
> 2) Is there a search method that is faster than doing all those
> comparisons in transcript? For example, lineoffset and itemoffset are
> supposed to be very fast. Is it possible to use itemoffset on an array,
> or is there anything that works like an elementoffset command would
> work, if it existed?
> 3) Would it be faster to combine the array, and use itemoffset?
Offset can rip through large blocks of text very fast, but it's not very
flexible when you need more than the position of the match.
For example, in my case I had to do comparisons on specific items within
a line. LineOffset will get you to that line, but won't tell you where
within that line the match is.
With a low number of hits lineOffset can be faster to find the line, and
then you could evaluate specific elements to find the item if needed.
But for larger numbers of hits it should be slower: once you find
the line you still need to "get line x", and requiring the engine to
count lines to reach line x is the bottleneck.
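For a low-hit-count search, the lineOffset approach might look something like this (untested sketch; tData and tSearchTerm are placeholder names):

```
-- find the first line containing the term, then drill into its items
put lineOffset(tSearchTerm, tData) into tLineNum
if tLineNum > 0 then
   -- this "line x of" access is where the engine must count lines
   put line tLineNum of tData into tLine
   put itemOffset(tSearchTerm, tLine) into tItemNum
   put tLineNum && tLine into field "feedback"
end if
```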
So for my purposes, using "repeat for each line" gave me a consistently
scalable solution which allowed me to query any items within a line
without ever having to count lines. So lazy person that I am, I stopped
there and moved on to other things. :)
> 4) Could filter be made to work in this situation, or would it only give
> the value of the element, but not the name of the element?
Filter may benchmark the fastest if you're looking for an item anywhere
in a line (never tried it myself, since I need to find matches for
specific items within lines, but worth testing).
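If you do test it, the filter approach would look roughly like this (untested sketch; note that filter modifies the container in place and discards line positions, so you only get the matching values back):

```
-- keep only the lines containing the search term
put tData into tHits
filter tHits with "*" & tSearchTerm & "*"
-- tHits now holds the matching lines, but not their
-- original line numbers within tData
```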
> 5) Anything faster that I am not thinking of?
Probably. Search algorithms are a deep subject, and there's always one
more clever way to solve a given problem.
I stopped benchmarking these things once I found that "repeat for each
line" was coming out okay. Because I have so many comparisons to
perform on specific items within a line, it gave me a robust (though
admittedly brute force) solution with acceptable performance on data
sets larger than my app will need in real-world use.
But with half a million records it raises the question: Why not consider
a database, where searching is done by compiled code optimized by people
who specialize in such things?
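For example, with the revdb library that might look something like this (untested sketch; assumes a revdb build with SQLite support and a hypothetical "props.db" file containing a table props(id, val)):

```
-- open the database, query it, and let the engine's compiled
-- search code do the heavy lifting
put revOpenDatabase("sqlite", "props.db", , , ) into tConnID
put "SELECT id, val FROM props WHERE val LIKE " & quote & \
      "%" & tSearchTerm & "%" & quote into tSQL
put revDataFromQuery(, , tConnID, tSQL) into tResult
revCloseDatabase tConnID
put tResult into field "feedback"
```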
Fourth World Media Corporation
Rev tools and more: http://www.fourthworld.com/rev