Finding common words and phrases in a large block of text?

Tom Glod tom at makeshyft.com
Thu Oct 25 21:42:03 EDT 2018


Hi Terry, I see. Thanks for sharing your handler. I'm going to run it on
some text and see the output. LC is sooo good with chunks... I find it
really fast as well.

All the best, Tom



On Thu, Oct 25, 2018 at 5:07 PM, Terry Judd via use-livecode <
use-livecode at lists.runrev.com> wrote:

> On 26/10/2018 4:27 am, "use-livecode on behalf of Tom Glod via
> use-livecode" <use-livecode-bounces at lists.runrev.com on behalf of
> use-livecode at lists.runrev.com> wrote:
>
>     Hi Terry, glad you found a solution.....
>
>     I have a similar challenge.
>
>     I did a word count, but would love to recognize repeated phrases. Did
>     you just compare chunks? ... hash them? (probably redundant?)
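On the hashing question, here is a rough illustration in Python, used only as a stand-in for LiveCode. The phrases are made-up sample data. A dictionary keyed by the phrase text already hashes each key internally, so computing an explicit digest first gives the same grouping and throws away the readable phrase. This sketch shows why hashing is probably redundant. It makes no claim about how LiveCode arrays are implemented internally.

```python
import hashlib
from collections import Counter

# Hypothetical sample phrases, standing in for word runs extracted from text.
phrases = ["online learning", "learning analytics", "online learning"]

# Plain string keys: the dict hashes each key itself.
by_text = Counter(phrases)

# Explicit SHA-1 digests as keys: identical counts, but the phrase text
# would have to be stored separately before it could be reported.
by_digest = Counter(hashlib.sha1(p.encode("utf-8")).hexdigest() for p in phrases)

print(by_text.most_common(1))  # [('online learning', 2)]
print(sorted(by_text.values()) == sorted(by_digest.values()))  # True
```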
>
>     Are there any more hints you can drop about this?
>
>     Thanks,
>
>     Tom
>
> Hi Tom - I've just done something like the code below, which accepts a
> block of text and the maximum 'phrase' length as input and returns an
> array keyed by run length. Each element lists the word runs of that
> length (so not necessarily sensible phrases) with their counts, sorted
> from most to least frequent. I think it will be good enough for my
> purposes.
>
> function getWordAndPhraseCounts pText, pMaxPhraseLength
>    -- tA1[phrase length][phrase] counts how often each word run occurs
>    put empty into tA1
>    set the itemDel to tab
>    -- count every run of 1..pMaxPhraseLength consecutive words;
>    -- runs never cross a sentence boundary
>    repeat for each sentence tSentence in pText
>       put the number of words in tSentence into tMax
>       repeat with i = 1 to pMaxPhraseLength
>          repeat with j = 1 to (tMax-i+1)
>             put word j to j+i-1 of tSentence into tPhrase
>             add 1 to tA1[i][tPhrase]
>          end repeat
>       end repeat
>    end repeat
>    -- for each length, build a "phrase<tab>count" list and sort it
>    -- from most to least frequent
>    put empty into tA2
>    repeat for each line tLength in the keys of tA1
>       put empty into tList
>       repeat for each line tPhrase in the keys of tA1[tLength]
>          put tPhrase & tab & tA1[tLength][tPhrase] & cr after tList
>       end repeat
>       delete last char of tList
>       sort lines of tList descending numeric by item 2 of each
>       put tList into tA2[tLength]
>    end repeat
>    return tA2
> end getWordAndPhraseCounts
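For readers without LiveCode, here is a rough Python port of the handler above. It is a sketch, not a line-for-line translation, and it rests on several assumptions. A simple regex stands in for LiveCode's sentence chunk: a sentence ends at ".", "!" or "?" followed by whitespace. Words are whitespace-separated tokens. The port also adds an explicit normalization step, lowercasing words and stripping surrounding punctuation. The name get_word_and_phrase_counts is hypothetical. The function returns a dict keyed by run length. Each value is a list of (phrase, count) pairs sorted by descending count, mirroring the tab-delimited lists in tA2.

```python
import re
from collections import defaultdict


def get_word_and_phrase_counts(text, max_phrase_length):
    """Count runs of 1..max_phrase_length consecutive words within sentences.

    Returns {length: [(phrase, count), ...]}, each list sorted by descending
    count; ties keep first-seen order because sorted() is stable.
    """
    counts = defaultdict(lambda: defaultdict(int))  # length -> phrase -> count
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Assumption: words are whitespace-separated; lowercase them and
        # strip surrounding punctuation so "Learning." matches "learning".
        words = [w.strip(".,;:!?\"'()").lower() for w in sentence.split()]
        words = [w for w in words if w]
        for length in range(1, max_phrase_length + 1):
            # Every start index that leaves room for `length` words.
            for start in range(len(words) - length + 1):
                phrase = " ".join(words[start:start + length])
                counts[length][phrase] += 1
    return {
        length: sorted(phrases.items(), key=lambda kv: kv[1], reverse=True)
        for length, phrases in counts.items()
    }


result = get_word_and_phrase_counts("The cat sat. The cat ran.", 2)
print(result[2])  # [('the cat', 2), ('cat sat', 1), ('cat ran', 1)]
```

As in the original, a length with no runs (for example, one longer than every sentence) simply has no entry in the result.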
>
>
>     On Thu, Oct 25, 2018 at 4:27 AM Terry Judd via use-livecode <
>     use-livecode at lists.runrev.com> wrote:
>
>     > OK - it was easier than I thought. I have something that works fast
>     > enough by iterating through runs of words in each sentence in a block
>     > of text, incrementing counts into an array and then sorting the
>     > contents of that array by phrase length and frequency.
>     >
>     > Terry...
>     >
>     > On 25/10/2018 4:56 pm, "use-livecode on behalf of Terry Judd via
>     > use-livecode" <use-livecode-bounces at lists.runrev.com on behalf of
>     > use-livecode at lists.runrev.com> wrote:
>     >
>     >     Hi – I’m looking to analyse some large blocks of text (journal
>     >     abstracts from key educational technology journals over a
>     >     several-year period) to find common words and phrases. Finding
>     >     common words should be easy enough, but I’m not clear on what
>     >     approach to take for finding common phrases (iterating through the
>     >     text capturing overlapping word runs of various lengths?). Any
>     >     ideas on how best to proceed?
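The "overlapping word runs" idea can be sketched compactly in Python, used here as a stand-in language. The helper name word_runs is hypothetical. Zipping n staggered copies of the word list yields every window of n consecutive words. A Counter over several window sizes then tallies the repeated runs.

```python
from collections import Counter


def word_runs(words, n):
    """Return every overlapping run of n consecutive words, joined by spaces."""
    # words[0:], words[1:], ..., words[n-1:] zipped together give sliding
    # windows; zip stops at the shortest slice, so only full windows appear.
    return [" ".join(run) for run in zip(*(words[i:] for i in range(n)))]


words = "to be or not to be".split()
print(word_runs(words, 2))  # ['to be', 'be or', 'or not', 'not to', 'to be']

# Tally runs of 1 to 3 words across all window sizes.
counts = Counter(run for n in range(1, 4) for run in word_runs(words, n))
print(counts["to be"])  # 2
```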
>     >
>     >     TIA,
>     >
>     >     Terry...
>     >     _______________________________________________
>     >     use-livecode mailing list
>     >     use-livecode at lists.runrev.com
>     >     Please visit this url to subscribe, unsubscribe and manage your
>     > subscription preferences:
>     >     http://lists.runrev.com/mailman/listinfo/use-livecode
>     >
>     >
>
>
>


