SC, Rev, and RB speed test
Dar Scott
dsc at swcp.com
Sat Apr 17 18:08:02 EDT 2004
On Saturday, April 17, 2004, at 05:47 AM, Brian Yennie wrote:
> Good catch - yeah, it probably could use a sweep of everything
> non-alphanumeric (replaced by spaces) before it begins. Rev will include
> commas and other punctuation as part of "words"...
I have looked at this a little.
The idea of working with words is important in a lot of cases.
Fortunately, that can be addressed in a way that is not part of the
words X words product. It is possible to normalize the text and to
normalize each phrase. This can be done with some regex. The
important part is that it does not involve changing the basic search.
Replacing all non-word characters with a single space will also give a
tiny improvement in search speed.
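As a rough illustration of that normalization step, here is a minimal
sketch in Python (not Transcript; the regex idea carries over). The
function name normalize is made up for the example, and lowercasing is
an extra assumption of mine, not something discussed above.

```python
import re

# Assumption: a "non-word character" is anything other than a letter,
# digit, or underscore, which is what \W matches in Python.
_NON_WORD = re.compile(r"\W+")

def normalize(s):
    """Replace each run of non-word characters with one space.

    Lowercasing is an added assumption so that matching ignores case.
    """
    return _NON_WORD.sub(" ", s).strip().lower()

# The same function is applied to the text and to every phrase,
# so the basic search itself does not need to change.
text = normalize('Rev includes commas, and other punctuation, in "words"...')
phrase = normalize("other   punctuation")
found = phrase in text
```

Because both sides go through the same function, the search only ever
sees single-space-separated words.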
I tried a couple things.
I created a set-based approach by inverting the text and using
intersect. It is a little faster, but it will be a lot faster when/if
nested arrays come. It is much slower with the regular data.
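For readers unfamiliar with the idea, an inverted-text search might
look roughly like the sketch below. It is in Python rather than
Transcript, it is not the code I timed, and the names build_index and
candidate_lines as well as the choice of indexing by line number are
assumptions made for the example.

```python
from collections import defaultdict

def build_index(lines):
    """Invert the text: map each word to the set of line numbers it occurs on."""
    index = defaultdict(set)
    for n, line in enumerate(lines):
        for word in line.split():
            index[word].add(n)
    return index

def candidate_lines(index, phrase):
    """Intersect the per-word sets to find lines containing every word.

    This only narrows the search; word order still has to be checked
    on the surviving lines.
    """
    words = phrase.split()
    if not words:
        return set()
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets)

lines = ["the quick brown fox", "the lazy dog", "quick dog"]
index = build_index(lines)
hits = candidate_lines(index, "quick dog")
```

The intersection throws away every line that is missing any word of
the phrase before the ordinary search runs.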
I created a filter that looked at the min and max location of words and
did some math on that. That too gave only a 20% improvement. With
less common words, or words used only a few times, the improvement
would be greater. This is similar to the set method but uses a sort
of weak set.
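One way such a min/max filter could work is sketched below in Python.
This is only my guess at a reasonable form of "some math", not the
exact filter I timed; word_ranges and start_window are hypothetical
names, and positions are assumed to be word offsets into the text.

```python
def word_ranges(words):
    """Record the first and last position at which each word occurs."""
    ranges = {}
    for pos, w in enumerate(words):
        first, _ = ranges.get(w, (pos, pos))
        ranges[w] = (first, pos)
    return ranges

def start_window(ranges, phrase_words):
    """Return (lo, hi) bounds on where the phrase could start, or None.

    If the phrase starts at p, word i of the phrase sits at p + i, so
    first[i] <= p + i <= last[i] must hold for every word.
    """
    lo, hi = 0, None
    for i, w in enumerate(phrase_words):
        if w not in ranges:
            return None          # word never occurs, phrase cannot match
        first, last = ranges[w]
        lo = max(lo, first - i)
        hi = last - i if hi is None else min(hi, last - i)
    if hi is None or lo > hi:
        return None              # empty phrase or contradictory bounds
    return lo, hi

ranges = word_ranges("a b c a d b".split())
window = start_window(ranges, ["a", "d"])
```

A None result rules the phrase out without searching; otherwise only
the window needs to be searched. Each word's (first, last) pair acts
as a loose stand-in for its full set of positions.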
Dar Scott