How to get word offset all instances of a string in a chunk of text?

Alex Tweedly alex at tweedly.net
Thu Aug 30 06:10:18 EDT 2018


OK, this time I'm just typing into email - I haven't tested these 
suggestions :-)


On 30/08/2018 10:24, Keith Clarke via use-livecode wrote:
> Folks,
> Is there a single-pass mechanism or more efficient way of returning the wordOffset of each instance of ‘the’ in ‘the quick brown fox jumped over the lazy dog’ than to use two passes through the text?
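Yes. For a single word myWord:

put 0 into tOffset
repeat forever
   -- the third parameter is the number of words to skip; the result is
   -- counted from that point, and 0 means there are no more matches
   put trueWordOffset(myWord, tSource, tOffset) into tmp
   if tmp = 0 then exit repeat
   add tmp to tOffset  -- turn the relative result into an absolute word number
   put tOffset & comma after tOffsetList
end repeat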

BUT there's a chance that this performs poorly, because each call has to 
skip over the words already searched, so I would also benchmark the simpler:

put 0 into tOffset
repeat for each trueWord W in tSource
   add 1 to tOffset
   if W = myWord then
      put tOffset & comma after tOffsetList
   end if
end repeat
> Pass-1. Count the instances of ‘the’ into an array and then
> Pass-2. Repeat for the count of instances using wordOffset, with a wordsToSkip variable derived from the previous loop’s offset
>
> I’m wondering if there’s something I’ve not yet learned about (nested?) arrays that might extend the unique word counter code that Alex, Paul & others helped me to fix a few days ago, to add a sub-array of wordOffset alongside word count?
I'm not entirely sure what you want here, or what the 'N' below are.
Do you want a count and an offsetList for each word? If so, there's no 
need for nested arrays; two flat arrays keyed by word will do.

Then I'd change your second loop below to:

put 0 into tOffset
repeat for each trueWord W in tSource
    add 1 to tOffset  -- count every word, noise words included
    if tANoise[W] then next repeat
    add 1 to tAWordCount[W]
    put tOffset & comma after tAWordOffsets[W]
end repeat

and of course the third loop to

put empty into tmp
repeat for each key K in tAWordCount
    put K && tAWordCount[K] & CR after tmp
end repeat
sort lines of tmp descending numeric by word 2 of each
put tmp into fld "Words"
  

If I've misunderstood what you want, please say so and I'll try again :-)

Alex.

>
> # Prepare noisewords array
>
> repeat for each trueWord W in tNoiseWords
>
> put true into tANoise[W]
>
> end repeat
>
>
> # Build unique words array
>
> repeat for each trueWord W in tSource
>
> if tANoise[W] then next repeat
>
> add 1 to tAWords[W][N]
>
> end repeat
>
>
> # Convert unique words array to list
>
>
> repeat for each key K in tAWords
>
> put K && tAWords[K][N] & CR after fld "Words"
>
> end repeat
>
>
> sort lines of field "Words" descending numeric by word 2 of each
>
>
> end repeat
>
> Any ideas or steer towards a lesson / worked example greatly appreciated.
> Best,
> Keith
>      