Words Indexing strategies

Bernard Devlin bdrunrev at gmail.com
Fri Feb 12 05:32:12 EST 2010


On Fri, Feb 12, 2010 at 3:24 AM, Alejandro Tejada
<capellan2000 at gmail.com> wrote:
> I have a dll named: dbsqlite.dll (452 K) in my Rev Studio installation.
> If an experienced database developer could lend a hand, I would be
> really grateful.

Hi Alejandro

OK, since you don't know anything about databases, I'm assuming that
it is also going to be quite a lot of work to compile the SQLite FT
plugin for the different platforms. (I've no idea how hard that would
be or what problems you might run into, but in general these things
are always harder than they are supposed to be.)
So, let's consider a Rev-only solution, and if that looks like a
hopeless case then it will make the work required to deal with
databases and compilation more worthwhile.

As I'm slow on the uptake, I am still not entirely sure what it is you
are trying to do.

Am I right that given these search terms: baboon OR monkey AND fruit

and

index file b.tgz contains a line like this: baboon: 1,5,9
index file m.tgz contains a line like this: monkey: 2,7,17
index file f.tgz contains a line like this: fruit: 3,7,23

you would want the result of your search to be: 7, i.e. the number of
the article that matches the boolean search, read as
(baboon OR monkey) AND fruit? Unless I've misunderstood, what you want
to do is combine indexes in order to satisfy boolean combinations of
search terms.
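
To make the example concrete, here is a minimal sketch of combining the
three index lines with set operations. It is written in Python only to
show the logic, not as Rev code; the in-memory `index` dictionary and the
`articles` helper are my own assumptions for illustration, not part of
the existing index files. Note that the answer 7 depends on the grouping:
if AND is evaluated before OR, the result is larger.

```python
# Hypothetical in-memory copy of the three index lines quoted above.
index = {
    "baboon": {1, 5, 9},
    "monkey": {2, 7, 17},
    "fruit": {3, 7, 23},
}

def articles(term):
    """Return the set of article numbers for a term (empty if not indexed)."""
    return index.get(term, set())

# (baboon OR monkey) AND fruit: union first, then intersection.
or_first = (articles("baboon") | articles("monkey")) & articles("fruit")

# baboon OR (monkey AND fruit): the reading if AND binds tighter than OR.
and_first = articles("baboon") | (articles("monkey") & articles("fruit"))

print(sorted(or_first))   # [7]
print(sorted(and_first))  # [1, 5, 7, 9]
```

So OR maps to a union of article lists and AND to an intersection; the
search syntax just has to settle which one is applied first.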

However, it looks to me like the existing indexes don't contain enough
information for you to calculate frequency of occurrence (a measure of
relevance).  And depending on how these pre-existing indexes have been
constructed, they may not have any stemming information in them. You
might be able to build some kind of rough stemming algorithm in Rev
(for example, rough pluralization matching like 'baboon*'), but as
Richard pointed out, irregular plurals like 'children' are where the
work will be.
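
As a rough illustration of that limitation, the sketch below guesses
plurals by appending 's' and 'es'. Again this is Python used only to show
the idea; the index contents and the function names `rough_variants` and
`search` are hypothetical. It uses a suffix rule rather than a 'baboon*'
wildcard, since a bare prefix match would also pick up unrelated words
that happen to share the prefix.

```python
# Hypothetical index keys; the real index files may look different.
index = {
    "baboon": {1, 5, 9},
    "baboons": {4, 9},
    "child": {6},
    "children": {8, 12},
}

def rough_variants(term):
    """Guess plural forms by appending 's' and 'es' (a crude, assumed rule)."""
    return [term, term + "s", term + "es"]

def search(term):
    """Union the article numbers of every guessed variant found in the index."""
    hits = set()
    for variant in rough_variants(term):
        hits |= index.get(variant, set())
    return hits

print(sorted(search("baboon")))  # [1, 4, 5, 9]
print(sorted(search("child")))   # [6], because 'children' is never tried
```

Regular plurals come almost for free this way; irregular ones would need
an explicit exception list or a proper stemmer.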

Are you looking for an approximate solution? Or do you need greater
flexibility of scope, relevance scores, etc.?
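
For comparison, this is roughly what an index would need to hold before
relevance scores become possible: a count per article rather than just
the article number. The layout, the counts, and the `rank_any` function
are all invented for illustration (Python again, not Rev); nothing here
reflects what the existing index files actually contain.

```python
# Hypothetical richer index: term -> {article number: occurrence count}.
# The counts are made up purely for illustration.
index = {
    "monkey": {2: 1, 7: 4, 17: 2},
    "fruit": {3: 5, 7: 1, 23: 2},
}

def rank_any(terms):
    """Rank articles containing any of the terms by summed occurrence count."""
    scores = {}
    for term in terms:
        for article, count in index.get(term, {}).items():
            scores[article] = scores.get(article, 0) + count
    # Highest score first; ties are broken by article number.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))

print(rank_any(["monkey", "fruit"]))
# [(3, 5), (7, 5), (17, 2), (23, 2), (2, 1)]
```

With only article numbers in the index, every match scores the same, so
the best you can do is a yes/no answer per article.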

Bernard


