Pattern Matching in Livecode

Peter M. Brigham pmbrig at gmail.com
Wed Nov 9 10:38:39 EST 2016


On Nov 9, 2016, at 7:46 AM, Alejandro Tejada <capellan2000 at gmail.com> wrote:
> 
> Hi all,
> 
> Recently I made a very long script for searching 3 words among a list
> of 1080 lines of words.
> 
> Download the zipped stack MatchingPatternsv02
> from this forum thread:
> http://forums.livecode.com/viewtopic.php?f=7&t=28288
> 
> I suspect that LiveCode provides better tools for this task, but I
> don't know which they are or how to use them. Maybe a simpler solution is
> to employ a regex, array operations or a really clever handler.
> 
> How many different methods (functions and commands) does LiveCode provide
> for this task of comparing and finding 3 words (taken from a list
> of 12 words) among 1080 lines of 4 words?

Here’s one way, using some text-munching utility functions. The list of handlers below is longish, but as you can see, the basic function find3words() is pretty compact. Once you have the utilities in a library, you can reuse them in all kinds of contexts as shortcuts.

You might be able to do this with a regex, I don’t know; I’m allergic to regex. I like to work in pure LC.

-- Peter

Peter M. Brigham
pmbrig at gmail.com

--------

-- the following function is not tested:

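function find3words pList, pWords
   -- returns a comma-delimited list of the numbers of the lines in pList
   --    that contain all 3 words of pWords, or 0 if there are none
   -- assumes pWords and the lines of pList are space-delimited;
   --    for comma-delimited data use "item"/"items" in place of "word"/"words"
   repeat with w = 1 to 3
      put lineOffsets(word w of pWords,pList) into A[w]
   end repeat
   -- pass comma explicitly so single-entry lists are handled too
   put intersectLists(A[1],A[2],comma) into out1
   put intersectLists(out1,A[3],comma) into outlist
   sort items of outlist ascending numeric
   -- now have to check that we have found whole words, not just part of a word
   repeat for each item i in outlist
      put true into allFound
      repeat with w = 1 to 3
         if word w of pWords is not among the words of line i of pList then
            put false into allFound
            exit repeat
         end if
      end repeat
      if allFound then put i & comma after finalList
   end repeat
   if finalList = empty then put 0 into finalList
   return item 1 to -1 of finalList
end find3words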
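
A minimal usage sketch for find3words(), assuming the handlers in this message sit somewhere in the message path (a stack script or a library stack). The field names "WordList" and "Result" and the sample words are hypothetical, and the sketch assumes the three search words are passed as one space-delimited string:

```livecode
on mouseUp
   -- hypothetical field names; adjust to your own stack
   put field "WordList" into tList       -- one entry per line, words separated by spaces
   put "alpha beta gamma" into tWords    -- the three words to look for (sample values)
   put find3words(tList, tWords) into tHits
   if tHits is 0 then
      put "No line contains all three words." into field "Result"
   else
      -- tHits is a comma-delimited list of line numbers
      put empty into tReport
      repeat for each item tLineNbr in tHits
         put tLineNbr & ":" && line tLineNbr of tList & cr after tReport
      end repeat
      delete the last char of tReport    -- drop the trailing return
      put tReport into field "Result"
   end if
end mouseUp
```

Returning line numbers rather than the lines themselves leaves the caller free to decide what to do with each hit, for example hiliting the lines in a list field.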

function lineOffsets str, pContainer, matchWhole
   -- returns a comma-delimited list of all the lineOffsets of str
   --    in pContainer
   -- if matchWhole = true then only whole lines are located
   --    else finds line matches everywhere str is part of a line in pContainer
   -- duplicates are stripped out
   -- note: to get the last lineOffset of a string in a container (often useful)
   --    use "item -1 of lineOffsets(...)"
   -- requires offsets()
   
   if matchWhole = empty then put false into matchWhole
   put offsets(str,pContainer) into offList
   if offList = "0" then return "0"
   repeat for each item i in offList
      put the number of lines of (char 1 to i of pContainer) into lineNbr
      if matchWhole then
         if line lineNbr of pContainer <> str then next repeat
      end if
      put 1 into A[lineNbr]
      -- using an array avoids duplicates
   end repeat
   put the keys of A into lineList
   if lineList = empty then return "0" -- matchWhole filtered out every hit
   sort lines of lineList ascending numeric
   replace cr with comma in lineList
   return lineList
end lineOffsets

function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if str is not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --     ie, overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   --     use "item -1 of offsets(...)"
   
   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets

function intersectLists listA, listB, pDelim
   -- returns the intersection of two lists, ie., a list of items/lines common to both
   -- if pDelim = empty then looks first for the presence of cr in the lists,
   --       if found, defaults to cr as the delimiter
   --    if no cr found, looks for the presence of comma in the lists,
   --       if found, defaults to comma as the delimiter
   --    if neither found, returns empty (user should have specified another delim)
   -- order of items may be changed, result may require sorting
   -- by Peter M. Brigham, pmbrig at gmail.com -- freeware
   --    the idea of using "split tArray with pDelim and pDelim"
   --    comes from Peter Hayworth on the use-LC list -- it's very clever!
   -- requires getDelimiters(), noDupes
   
   if listA = empty or listB = empty then return empty
   if pDelim = empty then
      if listA & listB contains cr then
         put cr into pDelim
      else if listA & listB contains comma then
         put comma into pDelim
      else
         return empty
      end if
   end if
   noDupes listA,pDelim
   noDupes listB,pDelim
   put getDelimiters(listA & listB) into tempDelim
   if tempDelim begins with "Error" then return "Error in getDelimiters()"
   split listA with pDelim and pDelim
   split listB with pDelim and pDelim
   intersect listA with listB
   combine listA with pDelim and tempDelim
   replace tempDelim with empty in listA
   return listA
end intersectLists
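
A quick way to try intersectLists() on its own, sketched with made-up sample data (the handler name testIntersect and the variable names are hypothetical). The delimiter is passed explicitly because, without the third parameter, two single-entry lists contain neither a return nor a comma, and the function then returns empty:

```livecode
on testIntersect
   -- made-up sample data: the two lists share the entries 4 and 9
   put "1,4,7,9" into tFirst
   put "4,9,12" into tSecond
   put intersectLists(tFirst, tSecond, comma) into tCommon
   -- the order of the result is not guaranteed, so sort before using it
   sort items of tCommon ascending numeric
   put tCommon   -- shows the result in the message box
end testIntersect
```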

function getDelimiters pText, nbrNeeded
   -- returns a cr-delimited list of <nbrNeeded> characters
   --    none of which are found in the variable pText
   -- use for delimiters for, eg, parsing text files, manipulating arrays, etc.
   -- usage: put getDelimiters(pText,2) into tDelims
   --        if tDelims begins with "Error" then exit to top -- or whatever
   --        put line 1 of tDelims into lineDivider
   --        put line 2 of tDelims into itemDivider
   --             etc.
   
   if pText = empty then return "Error: no text specified."
   if nbrNeeded = empty then put 1 into nbrNeeded -- default 1 delimiter
   put "2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26" into baseList
   -- low ASCII values, excluding CR, LF, tab, etc.
   put the number of items of baseList into maxNbr
   if nbrNeeded > maxNbr then return "Error: max" && maxNbr && "delimiters."
   repeat for each item testCharNbr in baseList
      put numtochar(testCharNbr) into testChar
      if testChar is not in pText then
         -- found one, store and get next delim
         put testChar & cr after delimList
         if the number of lines of delimList = nbrNeeded
         then return line 1 to -1 of delimList
         -- done
      end if
   end repeat
   -- if we got this far, there was an error
   put the number of lines of delimList into totalFound
   if totalFound = 0 then
      return "Error: cannot get any delimiters."
   else if totalFound = 1 then
      return "Error: can only get 1 delimiter."
   else
      return "Error: can only get" && totalFound && "delimiters."
   end if
end getDelimiters

on noDupes @pList, pDelim
   -- strips duplicate (and empty) lines/items from a list
   -- note: pList is referenced, so the original list will be changed.
   -- if pDelim = empty then looks first for the presence of cr in pList,
   --       if found, defaults to cr as the delimiter
   --    if no cr found, looks for the presence of comma in pList,
   --       if found, defaults to comma as the delimiter
   --    if neither found, exits without changing pList
   --       (user should have specified another delim)
   -- note: the order of the list will likely be changed, may require sorting
   -- note: the split command is inherently case-sensitive
   --    (irrespective of the value of the caseSensitive property),
   --    so "Chuck" and "chuck" will not be considered duplicates
   --    if you need case insensitive, use the noDupes() function instead
   --    this command scales better with very large lists than noDupes()
   -- note: pDelim could be a string of characters, so you could do:
   --    put "apple and orange and pear and orange and banana and apple" into pList
   --    noDupes pList," and "
   --         after which pList will be: "pear and banana and apple and orange"
   -- thanks to Peter Hayworth of the use-LC mailing list --
   --    the idea of using "split tArray with pDelim and pDelim" is very clever!
   -- adjusted by Peter M. Brigham, pmbrig at gmail.com
   -- requires getDelimiters()
   
   if pDelim = empty then
      if cr is in pList then
         put cr into pDelim
      else if comma is in pList then
         put comma into pDelim
      else
         answer "noDupes: no delimiter specified" as sheet
         exit noDupes
      end if
   end if
   put getDelimiters(pList) into tempDelim
   replace pDelim with tempDelim in pList
   split pList by tempDelim and tempDelim
   put the keys of pList into pList
   filter pList without empty
   replace cr with pDelim in pList
end noDupes
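
For comparison, here is a shorter sketch of the same search that does not use the utility handlers above. It is not from the original message and is untested; the handler name linesWithAllWords is hypothetical, and it assumes that both the lines of pList and pWords are space-delimited. It uses the built-in "is among the words of" operator, which only matches whole words:

```livecode
function linesWithAllWords pList, pWords
   -- returns a comma-delimited list of the numbers of the lines in pList
   --    whose words include every word of pWords; returns 0 if there are none
   put empty into hitList
   put 0 into lineNbr
   repeat for each line thisLine in pList
      add 1 to lineNbr
      put true into allFound
      repeat for each word w in pWords
         if w is not among the words of thisLine then
            put false into allFound
            exit repeat
         end if
      end repeat
      if allFound then put lineNbr & comma after hitList
   end repeat
   if hitList is empty then return 0
   return item 1 to -1 of hitList -- drops the trailing comma
end linesWithAllWords
```

This version makes a single pass over the list, so for 1080 short lines it should be quick; the utility handlers are still worth having if you also need character or line offsets elsewhere.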





