SC, Rev, and RB speed test

Dar Scott dsc at swcp.com
Sat Apr 17 18:08:02 EDT 2004


On Saturday, April 17, 2004, at 05:47 AM, Brian Yennie wrote:

> Good catch. Yeah, it probably could use a sweep of everything
> non-alphanumeric (replaced by spaces) before it begins. Rev includes
> commas and other punctuation as part of "words"...

I have looked at this a little.

The idea of working with words is important in a lot of cases.
Fortunately, that can be addressed in a way that is not part of the
words X words product.  It is possible to normalize the text and to
normalize each phrase, and this can be done with a few regular
expressions.  The important part is that it does not involve changing
the basic search.  Replacing all non-word characters with a single space
will also give a small improvement in search speed.
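A minimal Python sketch of that normalization step follows. The original work was done in Revolution, so this only illustrates the idea and is not the code behind these timings. It assumes the task is checking whether a phrase occurs in a text as whole words; the names normalize and phrase_in_text are hypothetical. Note that Python's \W treats the underscore as a word character.

```python
import re

# One or more characters that are not letters, digits, or underscore.
NON_WORD = re.compile(r"\W+")


def normalize(text: str) -> str:
    """Collapse every run of non-word characters into a single space."""
    return NON_WORD.sub(" ", text).strip()


def phrase_in_text(phrase: str, text: str) -> bool:
    """Whole-word phrase match on strings that are already normalized.

    Padding both sides with a space keeps "cat" from matching inside
    "catalog" while the search itself stays a plain substring test.
    """
    return f" {phrase} " in f" {text} "


text = normalize("Hello, world! This is -- a test.")
print(text)                                      # Hello world This is a test
print(phrase_in_text(normalize("is -- a"), text))  # True
```

Because the text and each phrase go through the same normalize call, the search loop itself does not change, which is the point made above.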

I tried a couple of things.

I created a set-based approach by inverting the text and using
intersect.  It is a little faster, and it will be a lot faster when/if
nested arrays come, but it is much slower with the regular data.
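For readers unfamiliar with the set-based idea, here is a rough Python sketch of an inverted index searched by intersection. It is not the Rev script described above. It assumes the text has already been normalized into a list of words, and invert and phrase_starts are hypothetical names. A Python dict of sets stands in for the nested arrays Rev did not yet have.

```python
from collections import defaultdict


def invert(words: list[str]) -> dict[str, set[int]]:
    """Map each word to the set of positions where it occurs."""
    index: dict[str, set[int]] = defaultdict(set)
    for pos, word in enumerate(words):
        index[word].add(pos)
    return index


def phrase_starts(phrase: list[str], index: dict[str, set[int]]) -> set[int]:
    """Return every position where the whole phrase begins.

    Word i of the phrase must sit at start + i, so shifting each word's
    positions back by i and intersecting leaves only valid start positions.
    """
    if not phrase:
        return set()
    starts = set(index.get(phrase[0], set()))
    for i, word in enumerate(phrase[1:], start=1):
        starts &= {p - i for p in index.get(word, set())}
        if not starts:  # no candidate survives, so stop early
            break
    return starts


words = "the cat sat on the mat the cat ran".split()
idx = invert(words)
print(sorted(phrase_starts(["the", "cat"], idx)))  # [0, 6]
print(phrase_starts(["cat", "on"], idx))           # set()
```

Building the index costs one pass over the text, after which each phrase costs roughly the size of its words' position sets rather than a full scan.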

I created a filter that looked at the min and max location of each word
and did some math on that.  That, too, gave only a 20% improvement.
With less common words, or words used only a few times, the improvement
would be greater.  This is similar to the set method but uses a sort of
weak set.
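The post does not spell out the arithmetic, so the following Python sketch is only one plausible reading of a min/max location filter. Each word keeps just its first and last position (a "weak set"), and a phrase is ruled out when no start position fits every word's range; anything that passes still needs a real search. The names word_ranges, may_contain, and contains are hypothetical.

```python
def word_ranges(words: list[str]) -> dict[str, tuple[int, int]]:
    """Map each word to (first position, last position) in the text."""
    ranges: dict[str, tuple[int, int]] = {}
    for pos, word in enumerate(words):
        if word in ranges:
            # Positions only increase, so the first stays and the last moves.
            ranges[word] = (ranges[word][0], pos)
        else:
            ranges[word] = (pos, pos)
    return ranges


def may_contain(phrase: list[str], ranges: dict[str, tuple[int, int]]) -> bool:
    """Cheap necessary test: can any start position satisfy every word?

    If the phrase starts at s, word i must be at s + i, so s must lie in
    [first_i - i, last_i - i] for every i. An empty overlap rules the
    phrase out; a non-empty one only means a full search is still needed.
    """
    lo, hi = float("-inf"), float("inf")
    for i, word in enumerate(phrase):
        if word not in ranges:
            return False  # a missing word rules the phrase out
        first, last = ranges[word]
        lo = max(lo, first - i)
        hi = min(hi, last - i)
        if lo > hi:
            return False  # the allowed start intervals no longer overlap
    return bool(phrase)


def contains(phrase: list[str], words: list[str],
             ranges: dict[str, tuple[int, int]]) -> bool:
    """Run the cheap filter first, then a brute-force scan if needed."""
    if not may_contain(phrase, ranges):
        return False
    n = len(phrase)
    return any(words[s:s + n] == phrase for s in range(len(words) - n + 1))


words = "a b c d a e".split()
r = word_ranges(words)
print(may_contain(["e", "b"], r))      # False: rejected by the filter alone
print(may_contain(["a", "c"], r))      # True: passes the filter...
print(contains(["a", "c"], words, r))  # False: ...but the scan finds no match
print(contains(["a", "b"], words, r))  # True
```

A word that occurs only once has a zero-width range, which pins down the start position exactly; that is consistent with the expectation above that rare words should benefit more from this kind of filter.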

Dar Scott



More information about the use-livecode mailing list