Porter Stemmer

Richard Gaskin ambassador at fourthworld.com
Sun Sep 9 13:17:10 EDT 2012


Peter Haworth wrote:

> Hi Richard,
> It's not a LiveCode solution, but SQLite uses a Porter tokenizer for its
> Full Text Search tables.  I imagine the source code (most likely C) will be
> on the SQLite web site somewhere.  Alternatively, I found it at
> http://tartarus.org/~martin/PorterStemmer/
>
> Then all you have to do is turn it into an external!

Until externals are as easy to use in LC as they are in SC, I try to 
avoid them, and LC is fast enough that text parsing tasks like this are 
generally easy to do in script.  Besides, native LiveCode script is 
inherently cross-platform, whereas I'd need to build a separate version 
of an external for each platform.

Porter is so popular there are a great many examples in other languages:
<http://snowball.tartarus.org/otherlangs/index.html>

Hopefully we can add one there for LiveCode as well.
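
For anyone who hasn't looked inside a Porter stemmer, here is a minimal
sketch of just step 1a of the original algorithm (the plural-suffix
rules), written in Python rather than LiveCode.  The function name is
made up for illustration, and a real port would also need the remaining
steps and their measure-based conditions.

```python
def porter_step_1a(word: str) -> str:
    """Step 1a of the original Porter algorithm: plural suffixes only.

    Hypothetical helper for illustration; not a complete stemmer.
    Rules are tried longest suffix first, and only the first match applies.
    """
    if word.endswith("sses"):
        return word[:-2]   # SSES -> SS   (caresses -> caress)
    if word.endswith("ies"):
        return word[:-2]   # IES  -> I    (ponies -> poni)
    if word.endswith("ss"):
        return word        # SS   -> SS   (caress -> caress)
    if word.endswith("s"):
        return word[:-1]   # S    -> ""   (cats -> cat)
    return word


if __name__ == "__main__":
    for w in ("caresses", "ponies", "caress", "cats"):
        print(w, "->", porter_step_1a(w))
```

Each rule is a plain suffix test and a chunk deletion, which is the kind
of text handling that translates directly into native script.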

I've written to the caretaker of Eric's site, and have been in contact 
with another list member, Andrew Meit, who also posted a stemmer in the 
old RevOnline (why weren't those older resources migrated to the new 
RevOnline version?).

Both Eric's and Andrew's use the original Porter algorithm, though 
ideally they would be updated to use the newer Porter2 from 2006.

Andrew has sent me his, but Eric's would be nice to have because his 
library also included the Porter stemmers for the Romance languages.  
I don't need those myself right now, but if his estate sees fit to make 
them open source they would be a nice addition to the community.

If anyone's interested, I'll post a note here should I get the chance 
to update Andrew's implementation to Porter2.

--
  Richard Gaskin
  Fourth World
  LiveCode training and consulting: http://www.fourthworld.com
  Webzine for LiveCode developers: http://www.LiveCodeJournal.com
  Follow me on Twitter:  http://twitter.com/FourthWorldSys



