Fastest memory based search technique (was: Adding 1 to an array)

David Bovill david at openpartnership.net
Sun Jun 24 13:51:41 EDT 2007


On 24/06/07, Jim Ault <JimAultWins at yahoo.com> wrote:
>
> I would recommend that you try to institute a controlled word dictionary,
> rather than let the user create key words since this will defeat the
> description of 'instant'.  Even with very fast databases, the "SQL join"
> operation (which sounds like your method) can get slower and slower.


So what's wrong with the idea of creating arrays and then making unions of
these arrays to get a fast in-memory result equivalent to an "SQL join"? My
thought is to do this and then make an async call to a slower SQL-based
search for more esoteric results.
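To make the idea concrete, here is a rough Python sketch (not LiveCode, and
not anyone's actual implementation) of the in-memory "join": each keyword
maps to a set of record ids, and combining keywords is just set algebra. The
`index` data and function names are invented for illustration.

```python
# Hypothetical keyword index: keyword -> set of record ids.
index = {
    "xml":  {1, 2, 5},
    "get":  {2, 3, 5},
    "text": {2, 5, 7},
}

def search_all(keywords):
    """Records matching every keyword (the join/AND case)."""
    sets = [index.get(k, set()) for k in keywords]
    return set.intersection(*sets) if sets else set()

def search_any(keywords):
    """Records matching any keyword (the union/OR case)."""
    return set().union(*(index.get(k, set()) for k in keywords))

print(sorted(search_all(["xml", "text"])))  # [2, 5]
print(sorted(search_any(["xml", "text"])))  # [1, 2, 5, 7]
```

Because the sets live in memory, each lookup is a hash probe rather than a
database round trip, which is where the "instant" feel comes from.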

> You might want to send a note directly to Eric Chatonet about this because I
> believe he has done massive work and study on this topic (and
> multi-lingual
> to boot!).  The key word for searching this archive I believe would be
> "stemmer",  which is a way of defining which words are similar.  This
> would
> be overkill for your project, but the fast techniques he uses could be
> applied in some form.


Ok - good idea - multilingual would be nice actually.
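For anyone searching the archive for "stemmer": the idea is just mapping
similar word forms to one key. A deliberately naive suffix-stripping sketch
in Python follows; real stemmers (e.g. Porter's algorithm) and multilingual
handling are far more involved, and this is not a description of Eric's
actual technique.

```python
# Naive stemmer: strip a known suffix if enough of the word remains.
# The suffix list below is an invented, minimal example.
SUFFIXES = ("ing", "ers", "er", "es", "s", "ed")

def stem(word):
    word = word.lower()
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

print(stem("searching"))  # search
print(stem("searches"))   # search
```

With both index keys and query words passed through the same stemmer,
"searching" and "searches" hit the same entry in the keyword index.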

> Certainly you are developing a specialized application, but it is a road
> well-traveled by many.


Definitely! Actually I was pretty surprised by the lack of stuff out there
in database land on this. It seemed like such a standard type of search.

> Don't forget the idea of adding a comment line at the beginning of the handler
> that contains the keywords used in a single location.
>
> on doThisForMe
> --tagg doThisForMe calcPixels repeat for each union using
>
> end doThisForMe
>
> then harvest them by...
> filter handlersText with "--tagg*"  =>


Good idea - I'll add that (is it something you use?).
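For list readers outside LiveCode, the harvesting step Jim describes with
`filter handlersText with "--tagg*"` amounts to keeping only the tag lines
from the script text. A rough Python equivalent (the sample script here is
invented):

```python
# Keep only the "--tagg" comment lines from a script's text, mimicking
# LiveCode's wildcard filter command.
script = """\
on doThisForMe
--tagg doThisForMe calcPixels repeat for each union using
  -- body elided
end doThisForMe
"""

tags = [line for line in script.splitlines()
        if line.strip().startswith("--tagg")]
print(tags)
```

One tag line per handler, all in one known place, makes the keyword index
cheap to rebuild whenever a script changes.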

I have scripts that automatically extract keywords and link relationships
between the handlers - so a function "xml_GetText" is split into the words
"xml,get,text". I usually throw away the common stuff like "get", then I
extract all the handler calls and the built-in Transcript calls, and
generate keywords and dependency relationships from them.
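The name-splitting step can be sketched in Python like this; the regex and
the stop list are my own guesses at the kind of rules described, not the
actual scripts.

```python
import re

# Common words to discard, per the "throw away stuff like get" rule.
STOP_WORDS = {"get", "set", "the"}

def handler_keywords(name):
    # Split on underscores and on lowercase->uppercase camelCase boundaries.
    parts = re.split(r"_|(?<=[a-z])(?=[A-Z])", name)
    words = [p.lower() for p in parts if p]
    return [w for w in words if w not in STOP_WORDS]

print(handler_keywords("xml_GetText"))  # ['xml', 'text']
```

Run over every handler name in a stack, this yields the raw keyword list
that the dependency links are then layered on top of.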

In the old dictionary these keywords and links were stored on the card, and
the user could edit them manually - but in the new one they are stored
online (in subversion) and cached locally as text files. Your idea of adding
keywords inline makes sense for that.

The documentation for the handlers is stored as text in a wiki - but you can
read and write to the wiki directly from a simple stack linked to the script
editor (or use the web interface).



More information about the use-livecode mailing list