Matchtext to find a series of words

Jim Ault JimAultWins at yahoo.com
Wed Nov 29 17:43:59 EST 2006


On 11/29/06 1:26 PM, "J. Landman Gay" <jacque at hyperactivesw.com> wrote:

> I need a matchtext/regex that will find a series of words in a block of
> text, no matter whether they are together or not, and ignoring carriage
> returns. For example:
> 
> See if all of these words: dog cat dinosaur
> 
> are in this text:
> 
> "The purple dinosaur inadvertently stepped on the cat.<cr>
> The white dog howled."
> 
> Should return true. Is there such a thing?

I would tackle this using the filter command

replace cr with tab in textStr
set the wholematches to true
filter textStr with "*"& token1&"*"
filter textStr with "*"& token2&"*"
filter textStr with "*"& token3&"*"
if textStr  is empty then return false
else return true

A better form would be

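function allWordsPresent textStr, wordList
  -- join all lines so the filter sees the text as one single line
  replace cr with tab in textStr
  set the wholeMatches to true
  repeat for each word WRD in wordList
    -- the line survives only if it contains WRD somewhere
    filter textStr with ("*" & WRD & "*")
  end repeat
  return not (textStr is empty)
end allWordsPresent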
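
A quick way to try the function is a button handler like the sketch below. It assumes allWordsPresent is in the same script (or somewhere in the message path); the handler and the variable names sampleText, r1 and r2 are made up for illustration.

```livecode
on mouseUp
  -- Jacque's sample text, with a real carriage return between the sentences
  put "The purple dinosaur inadvertently stepped on the cat." & cr & \
      "The white dog howled." into sampleText
  put allWordsPresent(sampleText, "dog cat dinosaur") into r1 -- true
  put allWordsPresent(sampleText, "dog cat unicorn") into r2 -- false
  answer r1 && r2
end mouseUp
```

One caveat: the "*" wildcards match substrings, so "cat" would probably also be found inside "concatenate". As far as I know, wholeMatches applies to the offset functions (wordOffset, lineOffset, itemOffset) rather than to filter, so it may not prevent that.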


regEx would be as follows

the OR condition is \b(dog|cat|dinosaur)\b
--where the \b says 'word boundary' to regEx
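
As a minimal sketch, the OR pattern could be wrapped in matchText like this. The function name anyWordPresent is hypothetical, and the (?i) prefix is an added assumption that the search should ignore case.

```livecode
function anyWordPresent textStr
  -- (?i) ignores case; \b keeps "cat" from matching inside "concatenate"
  return matchText(textStr, "(?i)\b(dog|cat|dinosaur)\b")
end anyWordPresent
```

This returns true as soon as any one of the three words is found, so it does not answer the "all words" question on its own.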

the AND condition
 (?(?=condition)(then1|then2|then3)|(else1|else2|else3))
--major drawback is that you would have to structure the pattern for the exact
number of words to check [you used 3 in your example], and the text would be
scanned multiple times (starting with the hit for 'dog') since you would be
trying 4 combinations.  RegEx would stop looking as soon as one of these
tested TRUE.
dog
   + positive lookbehind (?<=cat)
    + positive lookbehind (?<=dinosaur)
dog
   + positive lookahead (?=cat)
    + positive lookbehind (?<=dinosaur)
dog
   + positive lookahead (?=cat)
    + positive lookahead (?=dinosaur)
dog
   + positive lookbehind (?<=cat)
    + positive lookahead (?=dinosaur)

------ where if any of these = true, then return TRUE, else FALSE
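
A different way to express the AND condition, not the scheme outlined above, is to stack one positive lookahead per word at the start of the text. Each lookahead scans forward from the same position, so the order of the words does not matter and no lookbehind is needed (PCRE lookbehinds must be fixed length, so they cannot reach back over arbitrary text). The sketch below builds such a pattern for any number of words. The function name allWordsMatch and the variable tRegex are hypothetical, and it assumes the words contain no regex special characters.

```livecode
function allWordsMatch textStr, wordList
  -- (?s) lets "." match carriage returns; (?i) ignores case
  put "(?si)^" into tRegex
  repeat for each word WRD in wordList
    -- each lookahead requires WRD, as a whole word, somewhere in the text
    put "(?=.*\b" & WRD & "\b)" after tRegex
  end repeat
  return matchText(textStr, tRegex)
end allWordsMatch
```

For the word list "dog cat dinosaur" this builds (?si)^(?=.*\bdog\b)(?=.*\bcat\b)(?=.*\bdinosaur\b). Unlike the wildcard filter, the \b boundaries should keep "cat" from matching inside a longer word.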


The filter command is far easier to build and debug, and is likely faster
than the complex regex positive lookahead/lookbehind algorithm.

Someone more conversant in regEx may show a better solution and be the better
answer to your question.

Jim Ault
Las Vegas




