Porter Stemmer
Richard Gaskin
ambassador at fourthworld.com
Sun Sep 9 13:17:10 EDT 2012
Peter Haworth wrote:
> Hi Richard,
> It's not a LiveCode solution but sqlite uses a Porter tokenizer for its
> Full Text Search tables. I imagine the source code (most likely C) will be
> on the sqlite web site somewhere. Alternatively, I found it at
> http://tartarus.org/~martin/PorterStemmer/
>
> Then all you have to do is turn it into an external!
Until externals are as easy to use in LC as they are in SC, I try to
avoid them, and LC is fast enough that text parsing like this is
generally easy to do in script. Besides, native LiveCode script is
inherently cross-platform, whereas I'd need to build a separate version
of an external for each platform.
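As a rough illustration of why this kind of parsing is easy to script
natively, step 1a of the original Porter algorithm is just a handful of
suffix rules. The sketch below is in Python rather than LiveCode, and
the function name porter_step_1a is made up for illustration; it covers
only this one step, not the full stemmer, and not the Porter2 variant.

```python
def porter_step_1a(word):
    """Step 1a of the original Porter algorithm: strip plural suffixes.

    Rules are tried in order and only the first match is applied.
    The function name is hypothetical, chosen for this sketch.
    """
    if word.endswith("sses"):
        return word[:-2]   # SSES -> SS   (caresses -> caress)
    if word.endswith("ies"):
        return word[:-2]   # IES  -> I    (ponies -> poni)
    if word.endswith("ss"):
        return word        # SS   -> SS   (caress -> caress)
    if word.endswith("s"):
        return word[:-1]   # S    -> ""   (cats -> cat)
    return word
```

The remaining steps follow the same pattern of ordered suffix tests,
with extra conditions on the stem that is left behind.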
Porter is so popular there are a great many examples in other languages:
<http://snowball.tartarus.org/otherlangs/index.html>
Hopefully we can add one there for LiveCode as well.
I've written to the caretaker of Eric's site, and have been in contact
with another list member, Andrew Meit, who posted another one in the old
RevOnline as well (why weren't those older resources migrated to the new
RevOnline version?).
Both Eric's and Andrew's use the original Porter algo, while ideally it
should be updated to use the newer Porter2 from 2006.
Andrew has sent me his, but Eric's would be nice to have because his
library also included Porter stemmers for the Romance languages.
I don't need those myself right now, but if his estate sees fit
to make them open source they would be a nice addition to the community.
If anyone's interested I'll drop a note if I have a chance to get
Andrew's implementation updated to Porter2.
--
Richard Gaskin
Fourth World
LiveCode training and consulting: http://www.fourthworld.com
Webzine for LiveCode developers: http://www.LiveCodeJournal.com
Follow me on Twitter: http://twitter.com/FourthWorldSys