custom property searching speed question

Richard Gaskin ambassador at fourthworld.com
Wed Mar 2 15:29:27 EST 2005


Lynch, Jonathan wrote:
> Say you had a custom property set - "theProps"
> And the keys of this set were 1 through 500,000
> 
> So, basically, you had an array with half a million elements stored as a
> custom property set. Then, you wanted to search that array, and do it in
> such a way that the search returned the name and content of the first
> custom property that contains the item for which you search.
> 
> Would the following method be fastest?
> 
>   Set the custompropertyset to "theProps"
>   Put 0 into Z
>   Repeat for each element E in the customproperties of <myobject>
>     Add 1 to Z
>     If E contains <searchterm> then exit repeat
>   End repeat
>   Put Z && the customproperties[Z] of <myobject> into field "feedback"
> 
> 
> My questions are this:
> 1) Is there a better or faster-access way to store the array than as a
> custom property set?

Using "repeat for each line" on a string benchmarked about 20% faster 
here than using "repeat for each element" on an array.

I only tested on 40,000 lines, though.  In general "repeat for each..." 
scales well, so I would feel fairly confident extrapolating my results 
to larger data sets.
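
A benchmark along those lines would compare the two loop styles
something like this (an untested sketch with made-up variable names --
tData is a return-delimited string and tArray holds the same data as
an array):

  put the milliseconds into tStart
  repeat for each line tLine in tData
    if tLine contains tSearchTerm then exit repeat
  end repeat
  put the milliseconds - tStart into tLineTime

  put the milliseconds into tStart
  repeat for each element tElement in tArray
    if tElement contains tSearchTerm then exit repeat
  end repeat
  put the milliseconds - tStart into tElementTime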

> 2) Is there a search method that is faster than doing all those
> comparisons in transcript? For example, lineoffset and itemoffset are
> supposed to be very fast. Is it possible to use itemoffset on an array,
> or is there anything that works like an elementoffset command would
> work, if it existed?
> 3) Would it be faster to combine the array, and use itemoffset?

Offset can rip through large blocks of text very fast, but it's not very 
precise.

For example, in my case I had to do comparisons on specific items within 
a line.  LineOffset will get you to that line, but won't tell you which 
item within that line matched.

With a low number of hits, lineOffset can be faster to find the line, and 
then you could evaluate specific items within it to find the match if needed.

But for larger numbers of hits it should be slower, since once you find 
each line you still need to "get line x", which forces the engine to 
count lines from the top of the text -- and that counting is the bottleneck.
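
To make that concrete, here's an untested sketch of the two approaches
(hypothetical variable names).  Indexing "line i" makes the engine
count from line 1 on every pass, so the loop's cost grows quadratically;
"repeat for each" walks the text once, keeping its place as it goes:

  -- slow: each "line i of tData" re-counts lines from the top
  repeat with i = 1 to the number of lines of tData
    if line i of tData contains tSearchTerm then exit repeat
  end repeat

  -- fast: the engine tracks its position in a single pass
  repeat for each line tLine in tData
    if tLine contains tSearchTerm then exit repeat
  end repeat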

So for my purposes, using "repeat for each line" gave me a consistently 
scalable solution which allowed me to query any items within a line 
without ever having to count lines.  So lazy person that I am, I stopped 
there and moved on to other things. :)

> 4) Could filter be made to work in this situation, or would it only give
> the value of the element, but not the name of the element?

Filter may benchmark the fastest if you're looking for an item anywhere 
in a line (I've never tried it myself, since I need to find matches at 
specific item positions within lines, but it's worth testing).
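
An untested sketch of that approach (filter modifies a container in
place, matching whole lines against a wildcard pattern, so you'd work
on a copy; variable names are hypothetical):

  -- keeps every matching line, but discards the original
  -- line numbers (the "names" of the elements)
  put tData into tMatches
  filter tMatches with "*" & tSearchTerm & "*"

Note that filter gives you only the matching lines themselves; to
recover which element each match came from, you'd still need a
lineOffset call or a counted pass over the original data.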

> 5) Anything faster that I am not thinking of?

Probably.  Search algorithms are a deep subject, and there's always one 
more clever way to solve a given problem.

I stopped benchmarking these things once I found that "repeat for each 
line" was coming out okay.  Because I have so many comparisons to 
perform on specific items within a line, it gave me a robust (though 
admittedly brute-force) solution with acceptable performance on data 
sets larger than my app's audience will ever need in real-world use.

But with half a million records it raises the question:  Why not consider 
a database, where searching is done by compiled code optimized by people 
who specialize in such things?

--
  Richard Gaskin
  Fourth World Media Corporation
  __________________________________________________
  Rev tools and more: http://www.fourthworld.com/rev

