index library
Brian Yennie
briany at qldlearning.com
Mon Apr 19 01:23:53 EDT 2004
If you're looking for inspiring documentation, I would recommend
checking out Apple's old AIAT (Apple Information Access Toolkit) SDK.
It's the basis for Mac OS Find-By-Content, but the really cool thing is
that the documentation spells out very nicely how vector and inverted
indices work.
Or... try googling "inverted vector index".
In the past I've hooked several engines up with Rev (including AIAT),
but they all required externals and/or separate apps running. There's
also a Java-based spinoff of AIAT called "Lucene" from the Apache
project, which is interesting, but you'd most likely have to write a
Java app and talk back and forth with it.
If you could implement the basic inverted vector index algorithms and
figure out an efficient way to store the indices on disk, it could
become a pretty decent engine in Transcript, even if it might not be
suitable for indexing your hard drive or spidering the web...
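
To give a feel for how small the core of an inverted index is, here is a minimal sketch. It's in Python rather than Transcript, purely to keep it short and testable. The names `tokenize`, `build_index` and `search` are made up for illustration, and everything lives in memory, so it sidesteps the on-disk storage question entirely. A Transcript version could probably do much the same thing with an array keyed by word.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids that contain it.

    `docs` is a dict of {doc_id: text}; the shape is an assumption
    made for this example.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start with the postings for the first word, then intersect the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {
    1: "The quick brown fox",
    2: "A quick search engine",
    3: "Brown bears and foxes",
}
index = build_index(docs)
hits = search(index, "quick brown")  # only document 1 has both words
```

Note that "fox" and "foxes" are treated as different words here, which is exactly the gap that stemming is meant to close.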
For more fun reading, there's stemming (which is pretty crude and
easy), thesauri (which you have to be very careful with or you just
increase noise), stopword removal (i.e. cutting out the "and" and "the"
words), and relevancy ranking. All of this is covered in the
aforementioned AIAT SDK.
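
To make those pieces a little more concrete, here is a rough Python sketch (again, not Transcript) of stopword removal, a deliberately crude suffix-stripping stemmer, and a simple TF-IDF style relevancy score. The stopword list, the suffix rules and the scoring formula are simplifications chosen for illustration, not taken from the AIAT SDK, and the names `stem`, `terms` and `rank` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to"}

def stem(word):
    """Crude stemmer: strip one common English suffix if enough is left."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def terms(text):
    """Tokenize, drop stopwords, and stem what remains."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]

def rank(docs, query):
    """Score each document against the query with a basic TF-IDF sum.

    Returns (doc_id, score) pairs, best match first, omitting
    documents that score zero.
    """
    doc_terms = {doc_id: Counter(terms(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in doc_terms.items():
        score = 0.0
        for term in terms(query):
            # Document frequency: how many documents contain the term.
            df = sum(1 for c in doc_terms.values() if term in c)
            if df == 0:
                continue
            idf = math.log(1 + n_docs / df)
            score += counts[term] * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = {
    1: "Indexing the indexes of an index",
    2: "The fox and the index",
    3: "Searching the web",
}
results = rank(docs, "indexes")  # document 1 ranks above document 2
```

Because of the stemmer, "indexing", "indexes" and "index" all collapse to the same term, so document 1 matches three times and outranks document 2, while document 3 drops out.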
Pretty interesting stuff. Keep me posted if you take a crack at it. I
can't really co-conspire at the moment, but I'd be happy to chime in
where I can be helpful.
HTH,
Brian
> hypertexting of words in a large text corpus. I can find several such
> libraries on web, but in languages that don't port well to Transcript
> (ie, needing pointers and multidim arrays. sigh). I would gladly work
> with anybody wanting to do one.