Searching for a word when it's more than one word

Richmond Mathewson richmondmathewson at gmail.com
Sat Sep 1 07:53:12 EDT 2018



On 1/9/2018 2:50 pm, Mark Waddingham via use-livecode wrote:
> On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
>> I've already shovelled Ruyton of the Eleven Towns quite effectively:
>>
>> https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0 
>>
>>
>> No tokenising, in fact very basic stuff indeed.
>>
>> Not wishing to bang on about over-complicating things . . . . .
>
> There is actually a 'correct' more shovelistic approach (at least I 
> *think* this is correct):
>
> -- Ensure all punctuation is surrounded by space
> repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€" & quote
>   replace tPuncChar with space & tPuncChar & space in tText
> end repeat

That's a "point" (pun intended), as I just fell foul of a full stop.
>
> -- Ensure all whitespace is space
> replace return with space in tText
> replace tab with space in tText
>
> -- Ensure there are never two spaces next to each other in tText
> repeat while tText contains "  "
>   replace "  " with " " in tText
> end repeat
>
> -- Ensure there is only ever one space between words in phrases
> repeat while tPhrases contains "  "
>   replace "  " with " " in tPhrases
> end repeat
>
> -- We can now use an itemDelimiter of space
> set the itemDelimiter to space
>
> -- Sort the phrases by descending word length.
> sort lines of tPhrases descending numeric by the number of items in each
>
> -- Now check for, and remove each phrase from the source text in turn
> set the wholeMatches to true
> repeat for each line tPhrase in tPhrases
>   -- If the phrase is not present then skip to the next
>   if itemOffset(tPhrase, tText) is 0 then
>     next repeat
>   end if
>
>   -- Accumulate the phrase on the output list
>   put tPhrase & return after tFoundPhrases
>
>   -- Remove the phrase from the input text (we assume here that * does
>   -- not appear in any phrase)
>   replace tPhrase with "*" in tText
> end repeat
>
> Warmest Regards,
>
> Mark.
>
> P.S. The above will be reasonably quick for small sets of phrases / 
> small source texts - but I think as the size of either increases it 
> will get very slow, very quickly!
>
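
For reference, the quoted steps can be collected into one handler. This is only a minimal, untested sketch: the handler name findPhrases and the parameter names pText and pPhrases are made up for illustration, and it relies, as the quoted script does, on itemOffset accepting a multi-word phrase when the itemDelimiter is space and wholeMatches is true.

```livecode
-- Hypothetical wrapper around the quoted steps; findPhrases, pText and
-- pPhrases are illustrative names, not from the original post.
function findPhrases pText, pPhrases
   local tPuncChar, tPhrase, tFoundPhrases

   -- Ensure all punctuation is surrounded by space
   repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€" & quote
      replace tPuncChar with space & tPuncChar & space in pText
   end repeat

   -- Treat returns and tabs as spaces
   replace return with space in pText
   replace tab with space in pText

   -- Collapse runs of spaces in both the text and the phrase list
   repeat while pText contains "  "
      replace "  " with " " in pText
   end repeat
   repeat while pPhrases contains "  "
      replace "  " with " " in pPhrases
   end repeat

   -- Longest phrases first, so they are matched before shorter ones
   set the itemDelimiter to space
   sort lines of pPhrases descending numeric by the number of items in each

   set the wholeMatches to true
   repeat for each line tPhrase in pPhrases
      -- Skip blank lines and phrases that are not present
      if tPhrase is empty then next repeat
      if itemOffset(tPhrase, pText) is 0 then next repeat

      put tPhrase & return after tFoundPhrases

      -- Assumes * does not appear in any phrase
      replace tPhrase with "*" in pText
   end repeat

   -- Drop the trailing return before handing back the list
   delete the last char of tFoundPhrases
   return tFoundPhrases
end findPhrases
```

It could be called along the lines of: put findPhrases(field "Source", field "Phrases") into field "Found", where the three field names are again placeholders. Because the parameters are passed by value, the caller's text and phrase list are left untouched.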




