perl regex modifiers
Alex Rice
alrice at ARCplanning.com
Thu Jul 24 23:57:01 EDT 2003
On Thursday, July 24, 2003, at 06:38 PM, Mark Brownell wrote:
> Is there a way to create a matchChunk regex that picks up all
> instances within a document that may contain several of the same
> items?
OK, probably! But I don't know how to do it. Some thoughts...
In the PCRE manual there are examples of regexes for repeated patterns,
subpatterns, recursive subpatterns, subroutine callouts, and so forth.
Very powerful stuff. Some of it is available to us and some is not. I
think anything described as "you get this back from the C function blah
blah" is part of the C API, which is not available to us.
What you describe above, "regex that picks up all instances within a
document", in my experience is a difficult way to approach the problem.
There are two main dimensions to consider in designing your regex.
1) The first dimension is how you break the document up into parts to
feed into your pattern match, probably in a repeat structure, with each
iteration feeding a new string to matchText or matchChunk. Where this
string comes from, how long it is and what its delimiters are will
really depend on, and be complementary to, the next dimension.
The fit between these two dimensions is really the art of designing
regular expressions.
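Here is a rough sketch of the first dimension: a repeat loop that hands the matcher one small chunk at a time instead of the whole document. I can't run Transcript here, so this uses Python's re module as a stand-in for matchText (both are PCRE-style engines and behave the same for a simple pattern like this). The sample document, the <item> element and the collect_items function are all hypothetical, and splitting on lines is just one assumed choice of delimiter.

```python
import re

# Hypothetical sample document: one <item> element per line.
DOC = """<list>
<item>apple</item>
<item>pear</item>
<note>not an item</note>
<item>plum</item>
</list>"""

# One small pattern, applied to one chunk (here: one line) at a time.
ITEM = re.compile(r"<item>(.*?)</item>")

def collect_items(doc):
    """Feed the document to the matcher line by line, the way a
    repeat loop would call matchText once per chunk."""
    found = []
    for line in doc.splitlines():
        m = ITEM.search(line)      # search anywhere in the chunk
        if m:                      # like matchText returning true
            found.append(m.group(1))
    return found

print(collect_items(DOC))  # ['apple', 'pear', 'plum']
```

The loop does the "find all instances" part. The regex only has to recognize one instance, which keeps it short and easy to get right.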
2) The second dimension is the width of a single match of the pattern.
It's possible to write one regex to match a many-line XML document, but
it would be very complex and very unreliable unless you get it exactly
perfect. That's not the way I would approach it. Narrow the match down
to a small part of the document (a single node or element). You do
have some flexibility in the width of this dimension, though, because
you can have repeated patterns in the regex, and you can capture
multiple parts of the match using parens () in your pattern.
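To illustrate a narrow match, here is a pattern that covers exactly one element and pulls two pieces out of it with two pairs of parens. Again, this is Python's re standing in for matchText. The element name, the id attribute and parse_node are hypothetical examples, not anything from the original question.

```python
import re

# A narrow pattern: exactly one element, with two capture groups.
# Element and attribute names are hypothetical examples.
NODE = re.compile(r'<item id="(\d+)">([^<]*)</item>')

def parse_node(chunk):
    """Return (id, text) for one element, or None if the chunk doesn't match."""
    m = NODE.search(chunk)
    if m is None:
        return None
    item_id, text = m.groups()   # one value per pair of parens
    return int(item_id), text

print(parse_node('<item id="42">apple</item>'))  # (42, 'apple')
print(parse_node('<note>hi</note>'))             # None
```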
Unfortunately, matchText and matchChunk do not take an array for their
match variables (foundVarsList and positionVarsList). However, you could
match many things in your pattern, like matchText(tStr, "()()()...()",
t1,t2,t3...tn), probably limited only by the maximum number of parameters
in a Transcript function call, if there is a maximum.
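Here is why you end up writing "()()()...()" with one variable per group rather than a single repeated group. In PCRE-style engines, a capture group inside a repetition only keeps its last match. The sketch below shows both cases using Python's re as a stand-in. The sample line and the names t1, t2 and t3 are only for illustration, mirroring the variables in the matchText call above.

```python
import re

line = "red,green,blue"

# Three explicit groups -> three separate results, like passing
# t1, t2, t3 to matchText.
m = re.search(r"(\w+),(\w+),(\w+)", line)
t1, t2, t3 = m.groups()
print(t1, t2, t3)            # red green blue

# A repeated group matches every field, but only the LAST
# repetition is kept in the capture.
m = re.search(r"(?:(\w+),?)+", line)
print(m.group(1))            # blue
```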
Another flexibility in this dimension is the topicality of the pattern
match. The modifiers (?s), (?m) and (?x), which can be combined as
(?smx), adjust how the pattern treats the text: (?m) makes ^ and $
match at the start and end of every line instead of only the whole
string, (?s) lets "." (any character) match newlines as well, and (?x)
ignores whitespace in the pattern itself so you can lay it out
readably. So topicality relates back to the 1st dimension: how you are
feeding the data into the pattern match functions in the first place.
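The three modifiers are easiest to understand by toggling them on the same two-line string. Python's re accepts the same inline (?m), (?s) and (?x) syntax as PCRE, so this sketch uses it in place of matchText. The sample text and the VERBOSE pattern are made up for the demonstration.

```python
import re

text = "first line\nsecond line"

# Without (?m), ^ anchors only at the start of the whole string.
print(re.findall(r"^\w+", text))        # ['first']
# With (?m), ^ also matches right after each newline.
print(re.findall(r"(?m)^\w+", text))    # ['first', 'second']

# Without (?s), "." stops at a newline, so this fails to match.
print(re.search(r"first.*second", text))            # None
# With (?s), "." matches the newline too.
print(bool(re.search(r"(?s)first.*second", text)))  # True

# (?x) ignores whitespace and allows # comments inside the pattern.
VERBOSE = re.compile(r"""(?x)
    (\w+)     # a word
    \s+       # whitespace
    line      # the literal word 'line'
""")
print(VERBOSE.findall(text))            # ['first', 'second']
```

If you feed the matcher one line at a time, (?m) and (?s) rarely matter. They become important once each chunk you pass in spans several lines.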
Regular expressions are extremely powerful, but also pretty darn
confusing when you look at the more advanced usages. Hopefully I haven't
made them any more confusing :-)
I used to program in Perl a lot. A lot of things about Perl suck, but
its regular expression capabilities are great. We are fortunate to
have this PCRE engine in RR now.
Alex Rice, Software Developer
Architectural Research Consultants, Inc.
http://ARCplanning.com