Semi-automatic Index generation?

David Bovill david at openpartnership.net
Thu Jul 31 07:16:46 EDT 2008


Thanks for the tips!

2008/7/31 viktoras didziulis <viktoras at ekoinf.net>

> Hi David,
>
> you might wish to discard the 1000 most frequently used words from your
> list:
> English: http://web1.d25.k12.id.us/home/curriculum/fuw.pdf
> German: http://german.about.com/library/blwfreq01.htm
>
> Another approach is statistical: take the whole text and sort the words by
> their frequency (count) of appearance in the text. If you put them on a
> graph you would notice a characteristic 'power law' distribution. Set the
> absolute or relative frequency (or count) value at which to cut off the
> tail. This tail is what holds all the rare or interesting words of the
> text. For example, if the text is large you may discard the first 500-1000
> words in the list sorted by descending word count. All the words that
> remain should be the more or less interesting ones.
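
A rough sketch of that cut-off, in case it helps anyone following along. The handler name interestingWords and its parameters are made up for illustration, and the right value for pSkip depends on the text:

```livecode
-- Sketch only: interestingWords, pText and pSkip are hypothetical names.
-- Returns lines of the form word<tab>count, minus the pSkip most frequent words.
function interestingWords pText, pSkip
   local tCounts
   -- Count each word, lower-cased so "The" and "the" share one key.
   repeat for each word tWord in pText
      add 1 to tCounts[toLower(tWord)]
   end repeat
   -- Flatten the array into one word<tab>count pair per line.
   combine tCounts using return and tab
   -- Put the most frequent words first.
   set the itemDelimiter to tab
   sort lines of tCounts descending numeric by item 2 of each
   -- Cut off the head of the list; what remains is the rare tail.
   if pSkip > 0 then delete line 1 to pSkip of tCounts
   return tCounts
end interestingWords
```

Something like `put interestingWords(field "Source", 500)` would then list the tail, assuming a field named "Source" holds the text. Punctuation is not stripped here, so "text," and "text" are counted separately.
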
>
> An easy way to produce such a frequency list is by using arrays. The
> principle is like this:
>
> local arrayWords
> repeat for each word myWord in theText
>    add 1 to arrayWords[myWord]
> end repeat
>
> Now the keys of arrayWords are the words and the values are the word counts.
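
The same loop wrapped in a function, as a minimal sketch. The punctuation list is my own guess at what needs stripping, and countWords is a hypothetical name. Replacing the double quote also stops the word chunk from treating a quoted phrase as a single word:

```livecode
-- Sketch only: countWords is a hypothetical name and the punctuation
-- list is an assumption; extend it to suit the text.
function countWords pText
   local tCounts
   -- Replace common punctuation with spaces so "text," and "text" match.
   repeat for each char tChar in ".,;:!?()" & quote
      replace tChar with space in pText
   end repeat
   -- Keys are lower-cased words, values are their counts.
   repeat for each word tWord in pText
      add 1 to tCounts[toLower(tWord)]
   end repeat
   return tCounts
end countWords
```

After `put countWords("The cat saw the dog.") into tCounts`, tCounts["the"] holds 2, and `the keys of tCounts` gives the distinct words, one per line, in no particular order.
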
>
> Best wishes
> Viktoras
>
>
> David Bovill wrote:
>
>> Is there a resource or index that anyone knows of for plain, uninteresting,
>> dull words? I want to take arbitrary chunks of text and search for
>> "interesting" words, that is, domain-specific words that might be useful
>> to link to or to create dictionary entries for. This would mean creating a
>> list of words and stripping "the", "it", etc. I am imagining it working
>> like a spelling dictionary with the ability to manually edit entries, but
>> I'd like a good starting list. Not sure what to search for :)
>> _______________________________________________
>> use-revolution mailing list
>> use-revolution at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-revolution


