Searching for a word when it's more than one word

Mark Waddingham mark at livecode.com
Sat Sep 1 07:50:42 EDT 2018


On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
> I've already shovelled Ruyton of the Eleven Towns quite effectively:
> 
> https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0
> 
> No tokenising, in fact very basic stuff indeed.
> 
> Not wishing to bang on about over-complicating things . . . . .

There is actually a 'correct' more shovelistic approach (at least I 
*think* this is correct):

-- Ensure all punctuation is surrounded by space
repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€" & quote
   replace tPuncChar with space & tPuncChar & space in tText
end repeat

-- Ensure all whitespace is space
replace return with space in tText
replace tab with space in tText

-- Ensure there are never two spaces next to each other in tText
repeat while tText contains "  "
   replace "  " with " " in tText
end repeat

-- Ensure there is only ever one space between words in phrases
repeat while tPhrases contains "  "
   replace "  " with " " in tPhrases
end repeat

-- We can now use an itemDelimiter of space
set the itemDelimiter to space

-- Sort the phrases by descending word count, so longer phrases are matched first.
sort lines of tPhrases descending numeric by the number of items in each

-- Now check for, and remove each phrase from the source text in turn
set the wholeMatches to true
repeat for each line tPhrase in tPhrases
   -- If the phrase is not present then skip to the next
   if itemOffset(tPhrase, tText) is 0 then
     next repeat
   end if

   -- Accumulate the phrase on the output list
   put tPhrase & return after tFoundPhrases

   -- Remove the phrase from the input text
   -- (we assume here that * does not appear in any phrase)
   replace tPhrase with "*" in tText
end repeat
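
As a rough, untested sketch, the normalisation steps at the top of the script could be wrapped in a handler so they can be tried on their own. The handler name normalizeText and the shortened punctuation list are assumptions made for illustration only; they are not part of the script above.

```livecode
-- Hypothetical helper: pads punctuation with spaces, converts
-- returns and tabs to spaces, then collapses runs of spaces.
function normalizeText pText
   -- Shortened punctuation list, for illustration only
   repeat for each char tPuncChar in ",.;:!?"
      replace tPuncChar with space & tPuncChar & space in pText
   end repeat

   -- Treat all whitespace as a plain space
   replace return with space in pText
   replace tab with space in pText

   -- Collapse repeated spaces down to a single space
   repeat while pText contains "  "
      replace "  " with " " in pText
   end repeat

   return pText
end normalizeText
```

Calling normalizeText("Hello, world!" & return & "Bye") should give "Hello , world ! Bye", so each punctuation mark becomes its own space-delimited item.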

Warmest Regards,

Mark.

P.S. The above will be reasonably quick for small sets of phrases / 
small source texts - but I think as the size of either increases it will 
get very slow, very quickly!

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



