Semi-automatic Index generation?

Eric Chatonet eric.chatonet at sosmartsoftware.com
Wed Jul 30 10:18:48 EDT 2008


Hello David,

On 30 Jul 2008, at 16:08, David Bovill wrote:

> Is there a resource/index that anyone knows of for plain, uninteresting,
> dull words? I want to take arbitrary chunks of text and search for
> "interesting" words - that is, domain-specific words that might be useful
> to link to when creating dictionary entries. This would mean creating a
> list of words and stripping "the", "it", etc. I am imagining it working
> like a spelling dictionary with the ability to manually edit entries - but
> I'd like a good starting list. Not sure what to search for :)

1. You might search for what are called 'stopwords' (uninteresting
words) using any Internet search engine.
2. Have a look also at what is called 'stemming', which reduces
different forms of a word to the same form:
http://www.comp.lancs.ac.uk/computing/research/stemming/general/
3. I have put an English, French, Italian, Spanish, German and
Portuguese stemmer library on RevOnline (username: sosmartsoftware)
that could help you too.
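
To illustrate how the first two points fit together, here is a minimal sketch in Python (not Revolution). Everything in it is an assumption made for illustration: the STOPWORDS set is a tiny hand-picked starter list, and naive_stem is a crude suffix stripper standing in for a real stemmer. It is not the RevOnline stemmer library mentioned above, nor any published stemming algorithm.

```python
import re
from collections import Counter

# Illustrative starter list only (an assumption for this sketch); a real
# stopword list is much longer and is meant to be edited by hand over time.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to",
    "was", "with",
}


def naive_stem(word):
    """Crude suffix stripping; a placeholder, not a real stemming algorithm."""
    # Strip "ing" when at least three letters remain ("scripting" -> "script").
    if word.endswith("ing") and len(word) - 3 >= 3:
        return word[:-3]
    # Strip a plural "s", but leave words ending in "ss" alone ("class").
    if word.endswith("s") and not word.endswith("ss") and len(word) - 1 >= 3:
        return word[:-1]
    return word


def candidate_terms(text, stopwords=STOPWORDS):
    """Count the stemmed, non-stopword words found in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(w) for w in words if w not in stopwords)


if __name__ == "__main__":
    sample = "The stack and the stacks are on the card"
    for term, count in candidate_terms(sample).most_common():
        print(term, count)
```

The most frequent surviving terms are only candidates for dictionary entries; the idea is to review them by hand and move any that turn out to be uninteresting into the stopword list.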

Best regards from Paris,
Eric Chatonet.
----------------------------------------------------------------
Plugins and tutorials for Revolution: http://www.sosmartsoftware.com/
Email: eric.chatonet at sosmartsoftware.com
----------------------------------------------------------------




