Semi-automatic Index generation?
Devin Asay
devin_asay at byu.edu
Thu Jul 31 13:44:29 EDT 2008
On Jul 31, 2008, at 2:12 AM, viktoras didziulis wrote:
> Hi David,
>
> you might wish to discard the 1000 most frequently used words from
> your list:
> English: http://web1.d25.k12.id.us/home/curriculum/fuw.pdf
> German: http://german.about.com/library/blwfreq01.htm
>
> Another approach is statistical: take the whole text and sort the
> words by their frequency (count) of appearance in the text. If you
> plot them on a graph you will notice a characteristic 'power law'
> distribution. Set the absolute or relative frequency (or count) at
> which to separate the tail. This tail is what holds all the rare or
> interesting words of the text. For example, if the text is large you
> may discard the first 500-1000 words in the list sorted by word
> count. All words that remain should be the ones that are more or
> less interesting.
>
> The easy way to produce such a frequency list is by using arrays.
> The principle is like this:
>
> local arrayWords
> repeat for each word myWord in theText
> add 1 to arrayWords[myWord]
> end repeat
>
> Now the keys of arrayWords are the words and the values are the
> word counts.
Slick, and so simple. This is going into my script library. Thanks,
Viktoras!
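For the archive, here is a rough sketch of how the counting loop and the "discard the most frequent words" step might fit together. The handler name rareWords and the parameters pText and pSkip are made up for illustration, and this is only one way to do it, not a tested library routine. Note that the "word" chunk splits on whitespace only, so punctuation stays attached to words and a double-quoted phrase counts as a single word; you would probably want to strip punctuation from the text first.

```livecode
-- Hypothetical helper: returns "word<TAB>count" lines, most frequent
-- first, with the pSkip most frequent words removed.
function rareWords pText, pSkip
   local tCounts, tList
   -- count every word; each array key is a word, each value its count
   repeat for each word tWord in pText
      add 1 to tCounts[tWord]
   end repeat
   -- flatten the array into one "word<TAB>count" line per key
   repeat for each key tWord in tCounts
      put tWord & tab & tCounts[tWord] & return after tList
   end repeat
   delete last char of tList
   -- sort by count, highest first
   set the itemDelimiter to tab
   sort lines of tList descending numeric by item 2 of each
   -- cut off the head of the list so only the tail remains
   if pSkip > 0 then delete line 1 to pSkip of tList
   return tList
end rareWords
```

Calling something like rareWords(field "Source", 500) should then leave the candidate index terms, assuming a field named "Source" holds the text. The cutoff of 500 is just the low end of the range Viktoras suggests and would need tuning for the size of the text.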
Regards,
Devin
Devin Asay
Humanities Technology and Research Support Center
Brigham Young University