Words Indexing strategies

Bernard Devlin bdrunrev at gmail.com
Thu Feb 11 04:13:46 EST 2010


On Wed, Feb 10, 2010 at 10:30 PM, Alejandro Tejada
<capellan2000 at gmail.com> wrote:
> Yes, each one of these 28 text files will be compressed
> in gz format. When users look for a word, or many words,
> only these file(s) are decompressed and searched.

Like Brian, I was going to suggest existing search technologies like
Lucene.  Why re-invent the wheel?  I understand you not wanting to
ship Java or make the user install it.  However, there may be other
pre-existing solutions to your problem.

Is it imperative that these 28 compressed text files are part of the
solution?  I mean, are you trying to keep the structure such that
someone who comes along and looks at your solution can see where Rev
fits in?  Or can your indexes be stored in a database?

The reason I ask is that Valentina already has two forms of text
search: one is very fast but only looks up single words; the other can
search an entire database using regex but is slower, and probably not
fast enough for your requirements.  Unless you already have Valentina
for each platform, this solution will involve you in the cost of
buying licenses.

The other thing to consider is that sqlite already has a full-text
search facility, although I think you may have to compile it as a
sqlite plug-in and distribute it with your application.  It does
things like word-stemming, stop lists, frequencies, etc.

http://ft3.sourceforge.net/v0.3/userguide.html
http://michaeltrier.com/2008/7/13/full-text-search-on-sqlite
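
To give a feel for what the sqlite route looks like, here is a rough
sketch in Python (used only because it bundles sqlite; the same SQL
would go through whatever database interface your application uses).
It assumes the sqlite library in use was built with the FTS5 or FTS4
module and its "porter" stemming tokenizer, which is not guaranteed.
The table name "pages" and the functions build_index and search are
made up for illustration.

```python
import sqlite3


def build_index(docs):
    """Create an in-memory full-text index from (title, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # Try FTS5 first, then FTS4; which modules exist depends on how
    # the local SQLite library was compiled (an assumption here).
    for module in ("fts5", "fts4"):
        try:
            conn.execute(
                f"CREATE VIRTUAL TABLE pages USING {module}"
                "(title, body, tokenize=porter)"
            )
            break
        except sqlite3.OperationalError:
            continue
    else:
        raise RuntimeError("this SQLite build has no FTS module")
    conn.executemany("INSERT INTO pages (title, body) VALUES (?, ?)", docs)
    conn.commit()
    return conn


def search(conn, query):
    """Return the titles of rows matching the full-text query."""
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY title",
        (query,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real word lists.
    conn = build_index([
        ("a.txt", "Indexing strategies for large word lists"),
        ("b.txt", "Compressed files are searched on demand"),
    ])
    print(search(conn, "index"))   # stemming lets "index" find "Indexing"
    print(search(conn, "search"))
```

Because of the porter tokenizer, a query for one form of a word should
also find its other forms, and several words in one query are treated
as "all of these must appear".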

I suspect that the sqlite option may be the best.  I can't vouch for
the speed of its searching, but I seem to remember that the indexing
is asynchronous (not that that would be a consideration in your
application).

Using a pre-existing solution (even one that requires some
modification on your part) has got to be easier than building your own
full-text search system.  The above suggestions could still provide
you with a cross-platform solution.

If you still want a Rev-only solution let us know.  Maybe someone else
will chip in with suggestions :-)

Hope that helps.

Bernard


