Matchtext to find a series of words
Jim Ault
JimAultWins at yahoo.com
Wed Nov 29 17:43:59 EST 2006
On 11/29/06 1:26 PM, "J. Landman Gay" <jacque at hyperactivesw.com> wrote:
> I need a matchtext/regex that will find a series of words in a block of
> text, no matter whether they are together or not, and ignoring carriage
> returns. For example:
>
> See if all of these words: dog cat dinosaur
>
> are in this text:
>
> "The purple dinosaur inadvertently stepped on the cat.<cr>
> The white dog howled."
>
> Should return true. Is there such a thing?
I would tackle this using the filter command:
    replace cr with tab in textStr
    set the wholematches to true
    filter textStr with "*" & token1 & "*"
    filter textStr with "*" & token2 & "*"
    filter textStr with "*" & token3 & "*"
    if textStr is empty then return false
    else return true
A better form would be:
    function allWordsPresent textStr, wordList
      -- collapse the text to a single line so filter tests it as a whole
      replace cr with tab in textStr
      set the wholematches to true
      repeat for each word WRD in wordList
        -- the line survives only if it contains WRD
        filter textStr with ("*" & WRD & "*")
      end repeat
      return not (textStr is empty)
    end allWordsPresent
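As a quick check, the function above can be called with the sample text from the question. This is only a sketch: the mouseUp handler and the variable tText are made up for illustration, and it assumes allWordsPresent is in the same script. Note that the wildcard pattern matches substrings, so "cat" would also match "category".

```livecode
on mouseUp
   local tText
   -- tText is a hypothetical variable holding the sample text
   put "The purple dinosaur inadvertently stepped on the cat." & cr & \
         "The white dog howled." into tText
   -- all three words occur, so this should report true
   answer allWordsPresent(tText, "dog cat dinosaur")
   -- "unicorn" does not occur, so this should report false
   answer allWordsPresent(tText, "dog cat unicorn")
end mouseUp
```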
In regEx, the OR condition is
    \b(dog|cat|dinosaur)\b
where \b means 'word boundary' to regEx.
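The OR pattern can be passed straight to matchText. A minimal sketch, where the function name anyWordPresent is hypothetical and the three words are hard-coded from the example:

```livecode
-- Hypothetical helper: true if at least one of the words occurs
-- as a whole word anywhere in pText
function anyWordPresent pText
   return matchText(pText, "\b(dog|cat|dinosaur)\b")
end anyWordPresent
```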
The AND condition would use a conditional:
    (?(?=condition)(then1|then2|then3)|(else1|else2|else3))
The major drawback is that you would have to structure the pattern for the
exact number of words to check (you used 3 in your example), and the text
would also be scanned multiple times (starting with the hit of 'dog'), since
you would be trying 4 combinations. RegEx would stop looking as soon as one
of these tested TRUE.
dog
  + positive lookbehind (?<=cat)
  + positive lookbehind (?<=dinosaur)
dog
  + positive lookahead (?=cat)
  + positive lookbehind (?<=dinosaur)
dog
  + positive lookahead (?=cat)
  + positive lookahead (?=dinosaur)
dog
  + positive lookbehind (?<=cat)
  + positive lookahead (?=dinosaur)
------ where if any of these = true, then return TRUE, else FALSE
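For comparison, one way to avoid listing the orderings by hand is to anchor at the start of the text and add one lookahead per word. This is a sketch under assumptions, not a tested solution: the function name allWordsMatch and its parameters are made up, the words are assumed to contain no regEx metacharacters, and case is ignored by choice.

```livecode
-- Hypothetical helper: builds ^(?=.*\bword1\b)(?=.*\bword2\b)...
-- Each lookahead scans from the start of the text, so the order of
-- the words in the text does not matter.
function allWordsMatch pText, pWordList
   local tPattern
   -- (?s) lets "." cross line breaks; (?i) ignores case
   put "(?si)^" into tPattern
   repeat for each word tWord in pWordList
      put "(?=.*\b" & tWord & "\b)" after tPattern
   end repeat
   return matchText(pText, tPattern)
end allWordsMatch
```

Called as allWordsMatch(textStr, "dog cat dinosaur"), this builds the pattern for any number of words, though every lookahead still rescans the text from the beginning.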
The filter command is far easier to build and debug, and is likely faster
than the complex regex positive lookahead/lookbehind algorithm.
Someone more conversant in regEx may show a better solution, which would be
the better answer to your question.
Jim Ault
Las Vegas