How trim: Bug in RegExp engine
Marielle Lange
mlange at lexicall.org
Mon Oct 24 18:02:33 EDT 2005
In every program I know that implements regular expressions, the
equivalent of replaceText("A C","^ *","") returns "A C", not "C". I
tested it in BBEdit, and "A C" is returned. This is not due to the
greediness of the "*", because greediness cannot explain why the ^
seems to eat up the "A". Perhaps it is due to the way the routine is
coded within Revolution, as it seems that the ^ absorbs the first letter.
put replaceText("A C","^p","") -- returns "A C"
put replaceText("A C","^","") -- returns empty
This looks like a bug.
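For comparison with a conventional engine, here is a minimal sketch using Python's `re` module. It is an assumption that `re` behaves like BBEdit, Perl or egrep here; those tools are not exercised directly. `re.sub` performs the same kind of substitution as replaceText, and the variable names are only illustrative.

```python
import re

subject = "A C"

# "^ *" can match only spaces at the start of the string. "A C" begins
# with "A", so the match is empty (zero-length) and removing it changes nothing.
star_result = re.sub(r"^ *", "", subject)

# A bare "^" is also a zero-length match, so nothing is removed either.
caret_result = re.sub(r"^", "", subject)

# "^p" finds no match at all, so the subject comes back unchanged.
p_result = re.sub(r"^p", "", subject)

# With real leading spaces, the greedy " *" removes all of them and stops at "A".
spaced_result = re.sub(r"^ *", "", "   A C")

print(repr(star_result), repr(caret_result), repr(p_result), repr(spaced_result))
```

All four results are 'A C': in this engine the empty match at position 0 never consumes the "A".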
Marielle
>
> Mark Greenberg wrote:
> Though it's academic now since Bob has his solutions, this isn't a
> Rev bug; it's the way regular expressions work (or fail to in this
> case). The problem is in the greediness of the * quantifier.
> Though I can't say that I totally understand why, in cases where
> the RegEx reduces to nothing after the optional parts are removed,
> matching with either the ? or the * quantifier causes unexpected
> results, regardless of whether the RegEx is in Perl, egrep, or
> wherever.
> This is because the RegEx engine continues to try to find a match
> (to nothingness, I guess), consumes the entire string, and then
> backtracks giving up one character at a time. Why "C" instead of
> "A C"? I don't know, but my RegEx reference book (Mastering
> Regular Expressions by Jeffrey E. F. Friedl) does warn against such
> constructions as "^ *" with a lengthy explanation about greediness
> of the * and ? quantifiers.
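The following sketch, again using Python's `re` as an assumed stand-in for other engines, reports where `^ *` actually matches. The greedy ` *` takes every leading space it can and stops at the first non-space character, so on "A C" the match is empty, at span (0, 0). The `trim` helper is hypothetical: it uses `+` instead of `*`, so its patterns can never match the empty string. That might sidestep an engine that mishandles zero-length matches, but it has not been tried in Revolution.

```python
import re


def leading_space_match(subject):
    """Return the span and text matched by "^ *" at the start of subject."""
    # re.match always succeeds here, because " *" may match zero spaces.
    m = re.match(r"^ *", subject)
    return m.span(), m.group()


def trim(text):
    """Hypothetical helper: strip leading and trailing spaces.

    "^ +" and " +$" require at least one space, so neither pattern
    ever produces a zero-length match.
    """
    return re.sub(r" +$", "", re.sub(r"^ +", "", text))


for s in ["A C", "   A C"]:
    print(repr(s), leading_space_match(s))

print(repr(trim("  A C  ")))
```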
--------------------------------------------------------------------------------
Marielle Lange (PhD), Psycholinguist
Alternative emails: mlange at blueyonder.co.uk, M.Lange at ed.ac.uk
Homepage http://homepages.lexicall.org/mlange/
Easy access to lexical databases http://lexicall.org
Supporting Education Technologists http://revolution.lexicall.org
More information about the use-livecode mailing list