How trim: Bug in RegExp engine
Marielle Lange
mlange at lexicall.org
Tue Oct 25 06:06:08 EDT 2005
The concept of the "greediness of the *" has been introduced. Let's
expand. What this means is that when you parse any HTML or XML file,
you have to be very careful if you know that the same tag can occur
many times in your document.
Simple example:
The <b> cat</b> under the <b>table</b> is...
If you use:
put replacetext(tText, "<b>.*</b>", "")
This will give you:
The is...
because ".*" tries to match as many characters as possible, so the
match runs from the first <b> all the way to the last </b>.
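To make it visible what the greedy pattern actually grabs, here is a small sketch in Python rather than Revolution. The assumption is only that Python's re module treats a greedy ".*" the same way PCRE does for this pattern; the variable names are made up for illustration.

```python
import re

text = "The <b> cat</b> under the <b>table</b> is..."

# Greedy: .* first runs to the end of the line, then backtracks only as far
# as the LAST </b>, so one single match swallows both tags and everything
# between them.
greedy_match = re.search(r"<b>.*</b>", text)
matched = greedy_match.group(0)
print(matched)

# Deleting that one big match leaves only the text outside the outermost tags.
greedy_result = re.sub(r"<b>.*</b>", "", text)
print(repr(greedy_result))
```

Printing the matched text, rather than only the result of the replacement, is a quick way to check whether a pattern is matching more than intended.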
The way to handle this in PHP is to add a "?" after the "*", to
indicate explicitly that you want the "*" to be as ungreedy as possible:
http://uk.php.net/manual/en/reference.pcre.pattern.modifiers.php
> U (PCRE_UNGREEDY)
> This modifier inverts the "greediness" of the quantifiers so that
> they are not greedy by default, but become greedy if followed by
> "?". It is not compatible with Perl. It can also be set by a (?U)
> modifier setting within the pattern or by a question mark behind a
> quantifier (e.g. .*?).
>
So, let's try:
put replacetext(tText, "<b>.*?</b>","")
He he, this gives the correct result:
The under the is...
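The same check can be sketched outside Revolution. The Python snippet below is only an illustration, assuming Python's re module handles ".*?" the way PCRE does here. It lists what the ungreedy pattern matches, and also tries a negated character class as an alternative that needs no "?" at all (that variant assumes there is no other tag nested inside the <b>...</b> pair).

```python
import re

text = "The <b> cat</b> under the <b>table</b> is..."

# Ungreedy: .*? stops at the FIRST </b> it can reach, so each tag pair
# becomes its own separate match.
lazy_matches = re.findall(r"<b>.*?</b>", text)
print(lazy_matches)

lazy_result = re.sub(r"<b>.*?</b>", "", text)
print(repr(lazy_result))

# Alternative: [^<]* cannot cross a "<", so it can never run past the
# closing tag, even though the quantifier itself is still greedy.
class_result = re.sub(r"<b>[^<]*</b>", "", text)
print(repr(class_result))
```

Note that the removed tags leave their surrounding spaces behind, so the result contains double spaces where each tag used to be.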
------------------------------------------------------------------------
Marielle Lange (PhD), Psycholinguist
Alternative emails: mlange at blueyonder.co.uk, M.Lange at ed.ac.uk
Homepage
http://homepages.lexicall.org/mlange/
Easy access to lexical databases http://lexicall.org
Supporting Education Technologists http://revolution.lexicall.org