How trim: Bug in RegExp engine

Mark Greenberg markgreenberg at cox.net
Mon Oct 24 08:29:50 EDT 2005


Jim,
     The book I mentioned, Mastering Regular Expressions by Jeffrey  
E. F. Friedl, is an ultra-detailed explanation of Regular  
Expressions.  It describes two types of engines, one that rewrites
the expressions before execution and one that doesn't.  Friedl spends
probably forty or fifty pages on the topic of greediness and  
laziness.  He includes examples of parsing HTML and email strings.   
He lists all the special RegEx characters, most of which are not  
mentioned in our Rev docs.
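
     For anyone who hasn't met the greedy/lazy distinction, here is a
minimal sketch.  It is written in Python rather than Rev, so treat it
as an illustration only: the sample string and variable names are made
up, and Rev's engine may not behave identically.

```python
import re

# Made-up sample text with two bold runs.
html = "<b>one</b> and <b>two</b>"

# Greedy: .* takes as much as it can, so the match runs from the
# first <b> all the way to the last </b>.
greedy = re.search(r"<b>(.*)</b>", html).group(1)

# Lazy: .*? stops at the first point where the rest of the pattern
# can match, so only the first bold run is captured.
lazy = re.search(r"<b>(.*?)</b>", html).group(1)

print(greedy)  # one</b> and <b>two
print(lazy)    # one
```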
     Naturally, he did not get into how to use RegEx in Rev.  I've
used it a great deal in my stacks, and here's what I've found.
Treating the Regular Expression as a string takes some getting used
to at first.  Friedl's examples from various languages that
incorporate RegEx show the expressions as part of the larger
language, or at least not as a quoted string.  Writing the expression
as a string does give us flexibility, however: we can easily
substitute anything that resolves to a string in Rev, like constants,
variables, and concatenated strings.
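
     To show what building an expression out of ordinary strings
looks like, here is a rough sketch in Python (not Rev).  The names
TAG, cell, pattern and row are hypothetical, chosen only for the
example; the same idea should carry over to any language where the
pattern is just a string.

```python
import re

# Hypothetical pieces; anything that resolves to a string will do.
TAG = "td"        # a constant
cell = "[^<]*"    # a variable holding a sub-pattern
pattern = "<" + TAG + ">(" + cell + ")</" + TAG + ">"  # concatenation

row = "<tr><td>42</td><td>apples</td></tr>"

# With one capture group, findall returns the captured text of
# every match.
cells = re.findall(pattern, row)
print(cells)  # ['42', 'apples']
```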
     The other main issue is that Rev does not support all the fine
nuances of Perl-style RegEx, though the docs say it does.  I don't
remember the details, but I ran into problems trying to use
look-around features, for instance.  I've come to the conclusion that
I should try a simple version of what I want first in the Message Box,
then put it into my script.
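
     The "try it small first" habit can be sketched as a little
harness that runs a handful of patterns against one sample string and
records which ones the engine accepts.  This is Python, not Rev, and
the helper try_pattern, the sample text and the labels are all
assumptions made for illustration.  The results say what Python's
engine does, not what Rev's does.

```python
import re

def try_pattern(pattern, text):
    """Return the first match, None if nothing matches, or a note
    if the engine rejects the pattern outright."""
    try:
        m = re.search(pattern, text)
    except re.error as err:
        return "unsupported: " + str(err)
    return m.group(0) if m else None

sample = "cost 17, 3 items, price $42"

checks = [
    (r"\d+", "plain digits"),
    (r"(?<=\$)\d+", "look-behind"),
    (r"\d+(?= items)", "look-ahead"),
    # Python's re rejects (?U); a PCRE-based engine may accept it.
    (r"(?U)\d+", "(?U) ungreedy flag"),
]

results = {label: try_pattern(p, sample) for p, label in checks}
for label, outcome in results.items():
    print(label, "->", outcome)
```

     Running the same list of patterns in the Message Box would give
the matching column for Rev.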
     It would fill a hole in the docs if we could come up with a list  
of what works and what doesn't in Rev's version of Regular  
Expressions.  Since so few of us seem to use it, simple examples of  
how to employ MatchText and ReplaceText would be very helpful.  I'd  
be glad to work with you on this.  Keep in mind, though, that I do this
on the side, and sometimes my teaching job keeps me from my hobby of
"playing" with Rev.  : )  Contact me off-list about this if you wish.

         Mark Greenberg

On Oct 23, 2005, at 5:24 PM, Jim Ault wrote:

> My work is basically surface level application and is largely
> trial-and-error using BBEdit to see what different expressions will
> yield on a group of similar strings.
>
> I thought that 'greediness' was to produce the longest possible match,
> and the 'ungreedy (?U)' command was to have regex resolve to the
> shortest match.  For me, the use of symbols in sequence has remained
> somewhat of a mystery since the grep engine works in more than one
> direction throughout a string.
>
> I would like to understand this particular issue a bit better, so I
> might do a conditioned test where 'regex(flavor)|string>result' would
> simply be a table of results based on common tasks.
>
> Perhaps we could compile a list of examples and alternatives.  The
> reason I am interested is that I am using the MatchText, etc to parse
> some web page data, and there can always be unexpected results.  The
> nature of my project means that I really want to avoid the unexpected.
>
> Grepping html or other code is always more of a challenge than plain
> English prose or database tables.



