Finding matched parentheses

Peter Haworth pete at lcsql.com
Sat Jul 27 15:24:16 EDT 2013


On Thu, Jul 25, 2013 at 9:53 PM, Geoff Canyon <gcanyon at gmail.com> wrote:

> regex is notoriously unable to handle recursion. To see endless heated
> debate, search the web for how to parse HTML using regex.
>

Hi Geoff,
You piqued my interest, and indeed there are endless debates about parsing
HTML!

I also came across the (?R) construct, which lets a regex recurse into
itself. There's an example of using it at
http://php.net/manual/en/regexp.reference.recursive.php that is
specifically aimed at capturing the text between nested parens. It's not
quite what the original problem asked for, but you just need to subtract 1
from each start char and add 1 to each end char to get the char positions
of the parens themselves.
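For comparison, the same paren positions can be found without any regex recursion by using a stack. The sketch below is in Python purely for illustration, since Python's standard re module has no (?R). The function names matched_paren_positions and inner_text are hypothetical, and the 1-based positions are an assumption chosen to mirror LiveCode char numbering. The sketch also reports completely separate sets of parens.

```python
def matched_paren_positions(text):
    """Return (open_pos, close_pos) for every matched pair of parens.

    Positions are 1-based (an assumption, to mirror LiveCode chars).
    Pairs are sorted by the position of the opening paren.
    Raises ValueError if the parens are unbalanced.
    """
    stack = []  # 1-based positions of "(" not yet closed
    pairs = []
    for i, ch in enumerate(text, start=1):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at char {i}")
            # The most recent unclosed "(" is the partner of this ")"
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at char {stack[-1]}")
    return sorted(pairs)


def inner_text(text, pair):
    """Text strictly between a matched pair: chars open+1 .. close-1."""
    open_pos, close_pos = pair
    # 1-based chars open+1..close-1 are 0-based indexes open..close-2
    return text[open_pos:close_pos - 1]


sample = "a(b(c)d)e(f)"
for pair in matched_paren_positions(sample):
    print(pair, inner_text(sample, pair))
```

Going from a pair's inner text back to the parens is the same off-by-one adjustment described above: the opening paren sits one char before the inner text and the closing paren one char after it.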

It still doesn't handle Mark's earlier example of two completely separate
sets of parens, but I'll bet someone with more regex skills than I could
modify it to do so.

However, it may still not be usable within LC, because matchChunk requires
you to know in advance how many capture groups there will be and to supply
the same number of start/end variable pairs, which has always seemed
strange to me.

I wonder if it might ever be changed to return an array with one numeric
key for each match, containing the comma-separated start and end chars, or
even a line-delimited list of start/end positions.
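To show what that kind of result could look like, here is a minimal Python sketch built on re.finditer, which walks every match without needing the count in advance. The names match_positions and as_lines are hypothetical, and the 1-based inclusive positions are an assumption made to mirror LiveCode chars; this is not how matchChunk behaves today. Because Python's re cannot recurse, the example pattern only finds innermost parenthesized groups.

```python
import re


def match_positions(pattern, text):
    """Return {match_number: "start,end"} for every match of pattern.

    Keys start at 1; positions are 1-based and inclusive (an assumption,
    chosen to mirror LiveCode char numbering).
    """
    return {
        n: f"{m.start() + 1},{m.end()}"
        for n, m in enumerate(re.finditer(pattern, text), start=1)
    }


def as_lines(positions):
    """Flatten the array form into a line-delimited list of start,end."""
    return "\n".join(positions[n] for n in sorted(positions))


# Innermost groups only: a plain (non-recursive) regex cannot track nesting.
positions = match_positions(r"\([^()]*\)", "a(b(c)d)e(f)")
print(positions)            # {1: '4,6', 2: '10,12'}
print(as_lines(positions))  # "4,6" and "10,12" on separate lines
```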

Pete
lcSQL Software <http://www.lcsql.com>
