Finding matched parentheses

Peter M. Brigham pmbrig at gmail.com
Tue Jul 30 09:52:27 EDT 2013


Here is a function generalized to work for any pair of characters, building on dfepstein's function. I *think* it works, but I did only limited testing, so please try to find errors. If it does work, I'm curious to see it benchmarked on long strings.

-- Peter

Peter M. Brigham
pmbrig at gmail.com
http://home.comcast.net/~pmbrig

------------------

function offsetPair str,tIndex,a,b -- str cannot be more than one line 
   -- returns tIndex,<offset of the b matching the a at char tIndex in str>
   -- if (char tIndex of str) <> a then returns empty -- error
   -- if the a at tIndex has no matching b then returns empty
   -- if a or b are not in str then returns 0
   -- if tIndex = empty then assumes the first occurrence of a
   -- if a or b = empty then assumes parentheses search
   if a = empty then put "(" into a
   if b = empty then put ")" into b
   put offsets(a,str) into openParens
   put offsets(b,str) into closeParens
   if openParens = 0 then return 0 
   if closeParens = 0 then return 0
   if tIndex = empty then put item 1 of openParens into tIndex
   if tIndex is not among the items of openParens then return empty
   -- char tIndex of str <> a
   -- the b matching this a depends only on the chars from tIndex onward,
   -- so search that tail and shift the result back by tIndex-1
   put firstOffsetPair(a,b,char tIndex to -1 of str) into theCloseIndex
   if theCloseIndex = empty then return empty -- no matching b
   return tIndex & comma & (theCloseIndex + tIndex - 1)
end offsetPair

function firstOffsetPair a,b,str
   -- from dfepstein
   -- str cannot be more than one line 
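   -- returns the offset of the char b that "matches" the first instance of char a in str,
   -- or 0 if there is no a, or empty if there is no match
   put offset(a,str) into ca 
   if ca = 0 then return 0 
   -- hide everything up to and including the first a
   put numToChar(7) into char 1 to ca of str 
   set lineDelimiter to a 
   set itemDelimiter to b 
   repeat with i = 1 to the number of items in str 
      -- the i-th b is the match when item 1 to i holds exactly i-1 a's (i lines);
      -- the appended BEL stops an a at the end of the chunk from being
      -- ignored as a trailing line delimiter, e.g. in "(())"
      put (item 1 to i of str) & numToChar(7) into tHead
      if i = the number of lines in tHead then return ca + length(tHead) - 1
   end repeat 
   return empty 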
   return empty 
end firstOffsetPair

function offsets str,container,includeOverlaps
   -- returns a comma-delimited list of all the offsets of str in container
   -- returns 0 if not found
   -- third param is optional:
   --     offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --     ie, by default, overlapping offsets are not counted
   --     if you want overlapping offsets then pass "true" in 3rd param
   if str is not in container then return 0
   if includeOverlaps = empty then put false into includeOverlaps
   put empty into offsetList
   put 0 into startPoint
   repeat
      put offset(str,container,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      if not includeOverlaps then
         add length(str)-1 to startPoint
      end if
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets

function howmany tg,container
   -- returns how many times tg (the target string) occurs in container
   -- note that howmany("00","000000") returns 3, not 5
   -- if you want to allow overlapping matches, use:
   --     number of items of offsets(tg,container,"true")
   --     (see offsets() function)
   
   -- requires getDelimiters()
   
   if tg is not in container then return 0 -- also covers an empty container
   put getDelimiters(container) into divChar
   replace tg with divChar in container
   set the itemdelimiter to divChar
   put the number of items of container into h
   -- a trailing delimiter is not counted as an item
   if char -1 of container = divChar then return h
   return h-1
end howmany

function getDelimiters tText,nbr
   -- returns a cr-delimited list of <nbr> characters
   -- not found in the variable tText
   --      useful as delimiters when, eg, parsing CSV files
   -- usage: put getDelimiters(CSVtext,2) into tDelims
   --        put line 1 of tDelims into lineDivider
   --        put line 2 of tDelims into itemDivider
   --           etc.
   
   if nbr = empty then put 1 into nbr -- default 1 delimiter
   put "2,3,4,5,6,7,8" into baseList
   -- could use other non-printing ASCII values
   put the number of items of baseList into maxNbr
   if nbr > maxNbr then return "Error: max" && maxNbr && "delimiters."
   repeat with tCount = 1 to nbr
      put true into failed
      repeat with i = 1 to the number of items of baseList
         put item i of baseList into testNbr
         put numtochar(testNbr) into testChar
         if testChar is not in tText then
            -- found one, store and get next delim
            put false into failed
            put testChar into line tCount of delimList
            exit repeat
         end if
      end repeat
      if failed then
         put the number of lines of delimList into nbrFound
         if nbr = 1 then
            return "Cannot get delimiter!"
         else if nbrFound = 0 then
            return "Cannot get any delimiters!"
         else
            return "Can only get" && nbrFound && "delimiters!"
         end if
      end if
      delete item i of baseList
   end repeat
   return delimList
end getDelimiters

----------------------
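For anyone testing or benchmarking the handlers above, an independent reference may help. Below is a minimal Python sketch (not LiveCode, and not dfepstein's line/item trick) that finds the match with a plain depth counter: scan forward from the a at tIndex, add 1 for each a, subtract 1 for each b, and stop when the count returns to zero. The name offset_pair, returning None where the LiveCode returns empty, and always returning a (tIndex, close) pair are assumptions of this sketch; a and b are assumed to be single characters, and offsets are 1-based as in LiveCode.

```python
# Sketch: reference version of offsetPair in Python, for cross-checking.
# Assumptions: single-character a and b, 1-based offsets as in LiveCode,
# None standing in for LiveCode's "empty".

def offset_pair(s, t_index=None, a="(", b=")"):
    """Return (t_index, close) where close is the offset of the b that
    matches the a at t_index; 0 if a or b is absent from s; None if
    char t_index is not a or the a is never closed."""
    a = a or "("                     # empty a/b default to parentheses
    b = b or ")"
    if a not in s or b not in s:
        return 0
    if t_index is None:
        t_index = s.index(a) + 1     # first occurrence of a
    if not 1 <= t_index <= len(s) or s[t_index - 1] != a:
        return None
    depth = 0
    # Only the characters from t_index onward decide the match.
    for i in range(t_index - 1, len(s)):
        if s[i] == a:
            depth += 1
        elif s[i] == b:
            depth -= 1
            if depth == 0:
                return (t_index, i + 1)
    return None                      # ran off the end: no matching b


if __name__ == "__main__":
    print(offset_pair("(a)(b)", 1))                # (1, 3)
    print(offset_pair("(())"))                     # (1, 4)
    print(offset_pair("x[a[b]c]", 4, "[", "]"))    # (4, 6)
```

Comparing its output with offsetPair on strings such as "(())", "((a)(b))" and a tIndex of 1 exercises the nested and boundary cases. The loop is a single linear pass, so it also gives a rough baseline for timing long strings.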

On Jul 29, 2013, at 4:15 PM, DunbarX at aol.com wrote:

> 
> Hmmm.
> 
> 
> I read the original post as finding the "closing parenthesis ... of a pair"
> 
> 
> "Any pair?" 
> 
> 
> This seems to indicate that all nested parens have to be parsed as a whole. If you want to find a particular
> related couple, you have to use the correct ordered pair derived from the function.
> 
> 
> It is much simpler to find the first ")" and work backward. But that would preclude nested parens, since the 
> first and innermost pair would be the only one found. You cannot find that kind of first "pair" and also
> include nested pairs. That first pair is always minimally small.
> 
> 
> The function could be easily modified to list all pairs within any designated pair, of course.
> 
> 
> 
> Anyway, it was fun to play with.
> 
> 
> Craig
> 
> 
> -----Original Message-----
> From: Peter Haworth <pete at lcsql.com>
> To: How to use LiveCode <use-livecode at lists.runrev.com>
> Sent: Mon, Jul 29, 2013 1:15 pm
> Subject: Re: Finding matched parentheses
> 
> 
> On Mon, Jul 29, 2013 at 9:07 AM, <dunbarx at aol.com> wrote:
> 
>> I tried your script on the string:
>> 
>> aa(ss)(xx)(yy)
>> 
>> it only returned the parens bracketing "ss"
>> 
> 
> I think that's what he wants to do - just find the position of the first
> set of parentheses, taking nested parens into account.  But not sure.....
> 
> Personally, I'd use the regex that Thierry posted a couple of days back.
> No recursion involved and one line of code does the job.
> 
> Pete
> lcSQL Software <http://www.lcsql.com>
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
> 

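Craig's two points can be illustrated with a short Python sketch, again assuming single-character delimiters and 1-based offsets. all_pairs parses the nesting as a whole with a stack, so it lists every pair (all three in "aa(ss)(xx)(yy)"), while first_closed_pair finds the first ")" and works backward to the nearest "(", which always yields the innermost pair that closes first. Both function names are hypothetical, and all_pairs silently ignores unmatched delimiters.

```python
# Sketch: listing every matched pair vs. "find the first b and work backward".
# Assumptions: single-character delimiters, 1-based offsets.

def all_pairs(s, a="(", b=")"):
    """Every matched (open, close) pair in s, sorted by opening offset.
    Parses the nesting as a whole with a stack; an unmatched b is
    skipped and an unmatched a is left out of the result."""
    stack = []                       # offsets of a's still waiting for a b
    pairs = []
    for i, ch in enumerate(s, start=1):
        if ch == a:
            stack.append(i)
        elif ch == b and stack:
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def first_closed_pair(s, a="(", b=")"):
    """The pair whose b comes first: find the first b, then the nearest
    a before it. This is always an innermost pair."""
    close = s.find(b)
    if close == -1:
        return None
    open_ = s.rfind(a, 0, close)
    return None if open_ == -1 else (open_ + 1, close + 1)


if __name__ == "__main__":
    print(all_pairs("aa(ss)(xx)(yy)"))      # [(3, 6), (7, 10), (11, 14)]
    print(all_pairs("(a(b)c)"))             # [(1, 7), (3, 5)]
    print(first_closed_pair("(a(b)c)"))     # (3, 5)
```

Filtering the output of all_pairs to the pairs that lie inside a chosen pair is one way to "list all pairs within any designated pair".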



