How trim: Bug in RegExp engine
Mark Greenberg
markgreenberg at cox.net
Mon Oct 24 08:29:50 EDT 2005
Jim,
The book I mentioned, Mastering Regular Expressions by Jeffrey
E. F. Friedl, is an ultra-detailed explanation of Regular
Expressions. It describes two types of engines, one which rewrites
the expressions before execution and one that doesn't. Friedl spends
probably forty or fifty pages on the topic of greediness and
laziness. He includes examples of parsing HTML and email strings.
He lists all the special RegEx characters, most of which are not
mentioned in our Rev docs.
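
To make the greediness/laziness distinction concrete, here is a minimal sketch. It is written in Python's re module rather than Rev, purely as an illustration; the sample text and variable names are made up for this example, and Rev's engine may not behave identically.

```python
import re

# Sample text is hypothetical, chosen to contain two tagged runs.
text = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as it can, so the match runs to the LAST </b>.
greedy = re.search(r"<b>(.*)</b>", text).group(1)

# Lazy: .*? stops at the first point where the rest of the pattern can match.
lazy = re.search(r"<b>(.*?)</b>", text).group(1)

print(greedy)  # one</b> and <b>two
print(lazy)    # one
```

The only difference between the two patterns is the ? after the star, which is why a quick side-by-side test on a short string is a handy way to see which behavior an engine gives you.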
Naturally, he did not get into how to use RegEx in Rev. I've
used it a great deal in my stacks, and here's what I've found. At
first, treating the Regular Expression as a string takes some
getting used to. Friedl's examples from various languages that
incorporate RegEx show the expressions as part of the larger
language, or at least not as a quoted string. Keeping the expression
in a string does give us the flexibility, however, of easily
substituting anything that resolves to a string in Rev, like
constants, variables, and concatenated strings.
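
The idea of building an expression out of ordinary string pieces can be sketched as follows. This is Python, not Rev, and is only meant to illustrate the principle; the helper name whole_word_pattern and the sample sentence are hypothetical.

```python
import re

def whole_word_pattern(word):
    # Hypothetical helper: the regex is assembled by plain string
    # concatenation. re.escape guards against special characters
    # that might be present in the variable's value.
    return r"\b" + re.escape(word) + r"\b"

# The pattern is just a string, so it can come from a variable.
pattern = whole_word_pattern("cat")

# Matches the standalone words only, not the "cat" inside "concat".
print(re.findall(pattern, "cat concat cat."))  # ['cat', 'cat']
```

The same approach should carry over to any environment where the expression is passed as a string, with the caveat that the escaping rules and supported syntax depend on the engine.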
The other main issue is that Rev does not support all the fine
nuances of Perl-style RegEx, though the docs say it does. I don't
remember the details, but I ran into problems trying to use look-
around features, for instance. I've come to the conclusion that I
should try a simple version of what I want first in the Message Box,
then put it into my script.
It would fill a hole in the docs if we could come up with a list
of what works and what doesn't in Rev's version of Regular
Expressions. Since so few of us seem to use it, simple examples of
how to employ MatchText and ReplaceText would be very helpful. I'd
be glad to work with you on this. Keep in mind, though, that I do this
on the side, and sometimes my teaching job keeps me from my hobby of
"playing" with Rev. : ) Contact me off-list if you wish about this.
Mark Greenberg
On Oct 23, 2005, at 5:24 PM, Jim Ault wrote:
> My work is basically surface level application and is largely
> trial-and-error using BBEdit to see what different expressions will
> yield on a group of similar strings.
>
> I thought that 'greediness' was to produce the longest possible
> match, and the 'ungreedy (?U)' command was to have regex resolve to
> the shortest match. For me, the use of symbols in sequence has
> remained somewhat of a mystery since the grep engine works in more
> than one direction throughout a string.
>
> I would like to understand this particular issue a bit better, so I
> might do a conditioned test where 'regex(flavor)|string>result'
> would simply be a table of results based on common tasks.
>
> Perhaps we could compile a list of examples and alternatives. The
> reason I am interested is that I am using the MatchText, etc to
> parse some web page data, and there can always be unexpected
> results. The nature of my project means that I really want to avoid
> the unexpected.
>
> Grepping html or other code is always more of a challenge than
> plain English prose or database tables.