Searching for a word when it's more than one word
Mark Waddingham
mark at livecode.com
Sat Sep 1 07:50:42 EDT 2018
On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
> I've already shovelled Ruyton of the Eleven Towns quite effectively:
>
> https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0
>
> No tokenising, in fact very basic stuff indeed.
>
> Not wishing to bang on about over-complcating things . . . . .
There is actually a 'correct' more shovelistic approach (at least I
*think* this is correct):
-- Ensure all punctuation is surrounded by space
repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€" & quote
  replace tPuncChar with space & tPuncChar & space in tText
end repeat
-- Ensure all whitespace is space
replace return with space in tText
replace tab with space in tText
-- Ensure there are never two spaces next to each other in tText
repeat while tText contains (space & space)
  replace (space & space) with space in tText
end repeat
-- Ensure there is only ever one space between words in phrases
repeat while tPhrases contains (space & space)
  replace (space & space) with space in tPhrases
end repeat
-- We can now use an itemDelimiter of space
set the itemDelimiter to space
-- Sort the phrases by descending word length.
sort lines of tPhrases descending numeric by the number of items in each
-- Now check for, and remove each phrase from the source text in turn
set the wholeMatches to true
repeat for each line tPhrase in tPhrases
  -- If the phrase is not present then skip to the next
  if itemOffset(tPhrase, tText) is 0 then
    next repeat
  end if
  -- Accumulate the phrase on the output list
  put tPhrase & return after tFoundPhrases
  -- Remove the phrase from the input text (we assume here that * does
  -- not appear in any phrase)
  replace tPhrase with "*" in tText
end repeat
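A small illustration of why the phrases are sorted longest-first: once a long phrase has been replaced with "*", the shorter phrases contained in it should no longer match that stretch of text, so they are only reported if they also occur elsewhere. The sketch below demonstrates the ordering step only. The sample phrases and the mouseUp handler are made up for the example and are not part of the script above.

```livecode
on mouseUp
   local tPhrases
   -- Hypothetical sample data, one phrase per line
   put "Eleven Towns" & return & \
         "Ruyton of the Eleven Towns" & return & \
         "Towns" into tPhrases
   -- Count words as space-delimited items, as in the script above
   set the itemDelimiter to space
   sort lines of tPhrases descending numeric by the number of items in each
   -- tPhrases is now ordered 5 items, 2 items, 1 item:
   --   Ruyton of the Eleven Towns
   --   Eleven Towns
   --   Towns
   put tPhrases
end mouseUp
```

Dropped into a button script, this puts the sorted list into the message box, which is the order in which the main loop would then test and remove the phrases.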
Warmest Regards,
Mark.
P.S. The above will be reasonably quick for small sets of phrases /
small source texts - but I think as the size of either increases it will
get very slow, very quickly!
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps