perl regex modifiers

Alex Rice alrice at ARCplanning.com
Thu Jul 24 23:57:01 EDT 2003


On Thursday, July 24, 2003, at 06:38  PM, Mark Brownell wrote:
>  Is there a way to create a matchChunk regex that picks up all 
> instances within a document that may contain several of the same items?

OK, probably! But I don't know how to do it. Some thoughts...

In the PCRE manual there are examples of regexes for repeated patterns, 
subpatterns, recursive subpatterns, subroutine callouts, and so forth. 
Very powerful stuff. Some of it is available to us, and some is not. I 
think anything described in terms of the C API ("you get this back from 
the C function blah blah") is not available to us.

What you describe above, "regex that picks up all instances within a 
document", in my experience is a difficult way to approach the problem. 
There are two main dimensions to consider in designing your regex.

1) The first dimension is how you break the document up into parts to 
feed into your pattern match, probably in a repeat structure, with each 
iteration feeding a new string to matchText or matchChunk. Where this 
string comes from, what length it is and what its delimiters are will 
really depend on, and be complementary to, the next dimension. The fit 
between these two dimensions is really the art of designing regular 
expressions.
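To make the first dimension concrete, here is a minimal sketch in Python, using its `re` module as a stand-in for PCRE (the syntax used here is the same, but it is not the engine RR uses). The sample document and the `name=`/`size=` record format are hypothetical. The document is split into lines, and a narrow pattern is applied to each line in turn, the way a repeat loop would call matchText once per line.

```python
import re

# Hypothetical sample document: one record per line.
document = """name=alpha size=10
name=beta size=20
name=gamma size=30"""

# A narrow pattern that matches a single record (one line wide).
record = re.compile(r"name=(\w+) size=(\d+)")

results = []
# Dimension 1: break the document into lines and feed each line to the
# matcher, much as a repeat loop would feed each line to matchText.
for line in document.splitlines():
    m = record.search(line)
    if m:  # comparable to matchText returning true
        results.append((m.group(1), int(m.group(2))))

print(results)  # [('alpha', 10), ('beta', 20), ('gamma', 30)]
```

Because each chunk is one line, the pattern only ever has to describe one record. Choosing a different delimiter for the chunks would change what the pattern needs to look like.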

2) The second dimension is the width of a single match of the pattern. 
It's possible to write one regex to match a many-line XML document, but 
it would be very complex and very unreliable unless you get it exactly 
perfect. That's not the way I would approach it. Instead, narrow the 
match down to a small part of the document (a single node or element).
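One common way to keep a match one element wide is a lazy quantifier (`.*?`), which PCRE supports. The post does not mention this technique; it is offered as an illustration. The sketch below uses Python's `re` as a PCRE stand-in and a hypothetical XML fragment to contrast it with a greedy `.*`.

```python
import re

# Hypothetical XML fragment containing several <item> elements.
xml = "<list><item>one</item><item>two</item><item>three</item></list>"

# Greedy: .* runs to the LAST </item>, so one match swallows every element.
greedy = re.search(r"<item>(.*)</item>", xml).group(1)

# Lazy: .*? stops at the FIRST </item>, so the match is one element wide.
lazy = re.search(r"<item>(.*?)</item>", xml).group(1)

print(greedy)  # one</item><item>two</item><item>three
print(lazy)    # one
```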

You do have some flexibility in the width of this dimension, though, 
because you can have repeated patterns in the regex and can capture 
multiple parts of the match using parens () in your pattern. 
Unfortunately, matchText and matchChunk do not take an array for their 
match variables (foundVarsList and positionVarsList). However, you could 
capture many things in one pattern, like matchText(tStr, "()()()...()", 
t1,t2,t3...tn), probably limited only by the maximum number of parameters 
in a Transcript function call, if there is a maximum.
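For comparison, here is a sketch in Python (`re` standing in for PCRE; the sample text and pattern are hypothetical). One match with several capture groups yields several values, as matchText(tStr, "()()()", t1, t2, t3) would fill t1 through t3. Python's `findall` also returns every match as a list of group tuples. That is Python's own convenience, not something PCRE or matchText provides, but it shows what "an array of match variables" would look like.

```python
import re

# Hypothetical text holding several records of the same shape.
text = "John Smith, 42, Boston; Mary Jones, 37, Denver"

# Several capture groups in one pattern: one match yields several values,
# like matchText(tStr, "()()()()", t1, t2, t3, t4) filling t1..t4.
person = re.compile(r"(\w+) (\w+), (\d+), (\w+)")

first = person.search(text)
print(first.groups())  # ('John', 'Smith', '42', 'Boston')

# findall returns ALL matches as a list of group tuples, i.e. an
# "array" of match variables covering every instance in the text.
everyone = person.findall(text)
print(everyone)
# [('John', 'Smith', '42', 'Boston'), ('Mary', 'Jones', '37', 'Denver')]
```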

Another flexibility in this dimension is the topicality, or scope, of the 
pattern match. The modifiers (?smx) adjust how the pattern treats lines 
and whitespace: (?m) makes ^ and $ match at every line break rather than 
only at the start and end of the string, (?s) lets "." (any character) 
match newlines too, and (?x) ignores whitespace in the pattern so it can 
be laid out and commented. So topicality relates back to the first 
dimension: how you are feeding the data into the pattern-match functions 
in the first place.
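The effect of each modifier can be seen in a short sketch. It uses Python's `re`, which accepts the same inline (?s), (?m) and (?x) flags at the start of a pattern. The two-line sample string is hypothetical.

```python
import re

text = "first line\nsecond line"

# Without (?m), ^ anchors only at the start of the whole string.
print(re.findall(r"^\w+", text))      # ['first']
# (?m): ^ and $ also match at each line break.
print(re.findall(r"(?m)^\w+", text))  # ['first', 'second']

# Without (?s), "." does not match a newline, so this search fails.
print(re.search(r"first.*second", text))  # None
# (?s): "." matches newlines too, so the match spans both lines.
print(repr(re.search(r"(?s)first.*second", text).group(0)))  # 'first line\nsecond'

# (?x): whitespace in the pattern is ignored and # starts a comment.
verbose = re.compile(r"""(?x)
    (\w+)   # a word
    \s      # one whitespace character
    line    # the literal word line
""")
print(verbose.findall(text))  # ['first', 'second']
```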

Regular Expressions are extremely powerful, but also pretty darn 
confusing when you look at the more advanced usages. Hopefully not any 
more confusing now :-)

I used to program in Perl a lot. A lot of things about Perl suck, but 
its regular expression capabilities are great. We are fortunate to have 
this PCRE engine in RR now.

Alex Rice, Software Developer
Architectural Research Consultants, Inc.
http://ARCplanning.com
