Re: How trim: Bug in RegExp engine
Marielle Lange
mlange at lexicall.org
Mon Oct 24 17:51:27 EDT 2005
Yuk!!!
If the programmers have to look into the regular expression engine anyway, can I
suggest they make matchText consider the full text rather than only
matching ***within*** a line of text? It is fine for "^" and "$" to correspond
to the start and end of a line of text; that is what happens in all
programs that support regular expressions.
But in all programs that implement regular expressions, the following
happens:
put replaceText("DA" & cr & "CD","A.?C","") -> DD
put replaceText("DA" & tab & "CD","A.?C","") -> DD
In Revolution, this happens:
put replaceText("DA" & cr & "CD","A.?C","") -> DA cr CD
put replaceText("DA" & tab & "CD","A.?C","") -> DD
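For comparison, here is a minimal Python sketch of the same two calls. It uses Python's `re` module as a stand-in for a Perl-style engine (an assumption; it is not Revolution's engine), and `replace_text` is a hypothetical helper, not a Revolution function. Note that in Perl, PCRE and Python, `.` does not match a line break unless the dot-matches-all option is switched on (`re.DOTALL`, or `(?s)` inline), so only that mode turns the cr case into "DD".

```python
import re

CR = "\n"   # Revolution's cr constant is a line feed (ASCII 10)
TAB = "\t"

def replace_text(source: str, pattern: str, replacement: str, dotall: bool = False) -> str:
    """Replace every match of pattern (hypothetical helper, loosely like replaceText)."""
    flags = re.DOTALL if dotall else 0
    return re.sub(pattern, replacement, source, flags=flags)

# A tab is an ordinary character, so "." matches it.
print(repr(replace_text("DA" + TAB + "CD", "A.?C", "")))               # 'DD'
# By default "." does not match a line break, so nothing is replaced.
print(repr(replace_text("DA" + CR + "CD", "A.?C", "")))                # 'DA\nCD'
# With DOTALL, "." also matches the line break.
print(repr(replace_text("DA" + CR + "CD", "A.?C", "", dotall=True)))   # 'DD'
# The same option written inline at the start of the pattern.
print(repr(replace_text("DA" + CR + "CD", "(?s)A.?C", "")))            # 'DD'
```

Whether Revolution's replaceText and matchText honour an inline `(?s)` is an untested assumption worth checking before relying on it.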
The only way to deal with this is to replace every cr with some other
character that is very unlikely to occur in the text before calling
replaceText or matchText, then turn that character back into cr
afterwards. This is quite impractical, and it needlessly slows things
down when big chunks of text need to be processed (I use this for XML
files).
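The placeholder workaround can be sketched the same way. This is a minimal Python version, assuming the control character `\x01` never occurs in the input; `replace_across_lines` and `PLACEHOLDER` are hypothetical names. The two extra passes over the text are the cost described above; an engine with a dot-matches-all option does the same job in a single pass for patterns like `A.?C`.

```python
import re

CR = "\n"            # Revolution's cr constant is a line feed
PLACEHOLDER = "\x01" # assumption: this character never appears in the input

def replace_across_lines(source: str, pattern: str, replacement: str) -> str:
    """Hide line breaks, run the regex, then restore them (hypothetical helper)."""
    if PLACEHOLDER in source:
        raise ValueError("placeholder character already occurs in the text")
    hidden = source.replace(CR, PLACEHOLDER)         # pass 1: cr -> placeholder
    result = re.sub(pattern, replacement, hidden)    # "." can now match the placeholder
    return result.replace(PLACEHOLDER, CR)           # pass 2: placeholder -> cr

print(repr(replace_across_lines("DA\nCD", "A.?C", "")))
```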
Marielle
> Wouter is right:
>
>> This is actually true -- and a serious bug in Revolution's RegExp
>> engine.
>
> The Regular Expression Syntax reference states:
>
> ^ matches the following character at the beginning of the string
> ^A matches "ABC" but not "CAB"
>
> * matches zero or more occurrences of the preceding character or
> pattern
I assumed that Revolution would do what it promised and didn't check
this.
Try
answer replaceText("A C","^ *","")
I get "C", which obviously is not correct.
If I remove the "*", I get "A C"
And
answer replaceText("BAC","^A*","")
gives "C", so "^A*" matches "BA".
This must not happen.
It looks like "^A*" is incorrectly interpreted as "^.*A*".
Parentheses don't help; if anything, they make things worse:
answer replaceText("BAC","(^A)*","")
gives an empty result. No idea why.
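For reference, a small Python check of what a Perl-style engine returns for the same patterns (Python's `re` is used as a stand-in; Revolution's own engine may behave differently). With `^` anchoring to the start of the string, all four calls leave the subject unchanged, which is what the syntax reference quoted above implies.

```python
import re

# (subject, pattern) pairs taken from the replaceText calls above.
CASES = [
    ("A C", "^ *"),    # "^" pins the match to the start; " *" may match zero spaces
    ("A C", "^ "),     # there is no leading space, so nothing is removed
    ("BAC", "^A*"),    # the string starts with "B", so "A*" only matches empty there
    ("BAC", "(^A)*"),  # the group may repeat zero times, so only empty matches occur
]

results = [re.sub(pattern, "", subject) for subject, pattern in CASES]

for (subject, pattern), result in zip(CASES, results):
    print(f"replaceText({subject!r}, {pattern!r}, '') -> {result!r}")
```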
All the best
--------------------------------------------------------------------------------
Marielle Lange (PhD), Psycholinguist
Alternative emails: mlange at blueyonder.co.uk, M.Lange at ed.ac.uk
Homepage http://homepages.lexicall.org/mlange/
Easy access to lexical databases http://lexicall.org
Supporting Education Technologists http://revolution.lexicall.org