How to get word offset all instances of a string in a chunk of text?
Alex Tweedly
alex at tweedly.net
Thu Aug 30 06:10:18 EDT 2018
OK, this time I'm just typing into email - haven't tested these
suggestions :-)
On 30/08/2018 10:24, Keith Clarke via use-livecode wrote:
> Folks,
> Is there a single-pass mechanism or more efficient way of returning the wordOffset of each instance of ‘the’ in ‘the quick brown fox jumped over the lazy dog’ than to use two passes through the text?
Yes. For a single word myWord:
   put 0 into tOffset
   repeat forever
      put trueWordOffset(myWord, tSource, tOffset) into tmp
      if tmp = 0 then exit repeat -- no more matches, so stop
      add tmp to tOffset -- the result is relative to the words skipped
      put tOffset & comma after tOffsetList
   end repeat
BUT there's a chance that this performs poorly, because of the repeated
skipping, so I would also benchmark the simpler:
   put 0 into tOffset
   repeat for each trueWord W in tSource
      add 1 to tOffset
      if W = myWord then
         put tOffset & comma after tOffsetList
      end if
   end repeat
> Pass-1. Count the instances of ‘the’ into an array and then
> Pass-2. Repeat for the count of instances using wordOffset, with a wordsToSkip variable derived from the previous loop’s offset
>
> I’m wondering if there’s something I’ve not yet learned about (nested?) arrays that might extend the unique word counter code that Alex, Paul & others helped me to fix a few days ago, to add a sub-array of wordOffset alongside word count?
I'm not entirely sure what you want here, or what the 'N' below are.
Do you want a count and an offsetList for each word? If so, no need for
nested arrays.
Then I'd change your second loop below to:
   put 0 into tOffset
   repeat for each trueWord W in tSource
      add 1 to tOffset
      if tANoise[W] then next repeat
      add 1 to tAWordCount[W]
      put tOffset & comma after tAWordOffsets[W]
   end repeat
and of course the third loop to
   repeat for each key K in tAWordCount
      put K && tAWordCount[K] & CR after tmp
   end repeat
   sort lines of tmp descending numeric by word 2 of each
   put tmp into fld "Words"
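The third loop above only reports the counts. If the offsets are wanted in the same list, something along these lines might do it. This is an untested sketch: the mouseUp handler, the sample sentence and the tReport and tOffsets variables are just for illustration, and the noise-word check is left out to keep it short.

```livecode
on mouseUp
   local tSource, tOffset, tAWordCount, tAWordOffsets, tOffsets, tReport, W, K
   -- sample text from the original question
   put "the quick brown fox jumped over the lazy dog" into tSource
   -- one pass: count each word and remember where it occurs
   put 0 into tOffset
   repeat for each trueWord W in tSource
      add 1 to tOffset
      add 1 to tAWordCount[W]
      put tOffset & comma after tAWordOffsets[W]
   end repeat
   -- one line per word: word, count, comma-separated offsets
   repeat for each key K in tAWordCount
      put tAWordOffsets[K] into tOffsets
      delete last char of tOffsets -- drop the trailing comma
      put K && tAWordCount[K] && tOffsets & CR after tReport
   end repeat
   sort lines of tReport descending numeric by word 2 of each
   put tReport into fld "Words"
end mouseUp
```

For the sample sentence the line for 'the' should come out as "the 2 1,7". The offset list contains no spaces, so the count is still word 2 of each line and the sort is unaffected.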
If I've misunderstood what you want, please say so and I'll try again :-)
Alex.
>
> # Prepare noisewords array
> repeat for each trueWord W in tNoiseWords
>    put true into tANoise[W]
> end repeat
>
> # Build unique words array
> repeat for each trueWord W in tSource
>    if tANoise[W] then next repeat
>    add 1 to tAWords[W][N]
> end repeat
>
> # Convert unique words array to list
> repeat for each key K in tAWords
>    put K && tAWords[K][N] & CR after fld "Words"
> end repeat
>
> sort lines of field "Words" descending numeric by word 2 of each
>
> end repeat
>
> Any ideas or steer towards a lesson / worked example greatly appreciated.
> Best,
> Keith
>