AW: How trim: Bug in RegExp engine

Marielle Lange mlange at lexicall.org
Mon Oct 24 17:51:27 EDT 2005


Yuk!!!

If the programmers have to revisit the regular expression engine anyway, can I
suggest they make matchText consider the full text and not just
match ***within*** a line of text? It is fine for "^" and "$" to correspond
to the start and end of a line of text; this is what happens in all
programs that support regular expressions.

But in programs whose regular expression engine lets "." match a line
break, the following happens:
put replaceText("DA" & cr & "CD","A.?C","")    -> DD
put replaceText("DA" & tab & "CD","A.?C","") -> DD

In Revolution, this happens:
put replaceText("DA" & cr & "CD","A.?C","")    -> DA cr CD
put replaceText("DA" & tab & "CD","A.?C","") -> DD
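
Whether "." may cross a line break depends on the engine and its flags, so the comparison above is worth checking per tool. As one data point, here is a minimal sketch in Python (not Revolution) using the standard re module. It assumes cr can be modelled as "\n", and the helper names are made up for illustration.

```python
import re

PATTERN = r"A.?C"

def remove_default(text):
    # By default, "." in Python's re matches any character except "\n".
    return re.sub(PATTERN, "", text)

def remove_dotall(text):
    # With re.DOTALL, "." also matches "\n", so a match can span lines.
    return re.sub(PATTERN, "", text, flags=re.DOTALL)

if __name__ == "__main__":
    print(repr(remove_default("DA\tCD")))  # 'DD'
    print(repr(remove_default("DA\nCD")))  # 'DA\nCD' (no match across the line break)
    print(repr(remove_dotall("DA\nCD")))   # 'DD'
```

If the engine behind replaceText accepts inline options, a "(?s)" prefix on the pattern might have the same effect as re.DOTALL here, but that is an assumption that would need testing in Revolution.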

The only way to deal with this is to replace all cr characters with
another character that is very unlikely to occur in the text before calling
replaceText or matchText, then replace that character with cr again
afterwards. This is quite impractical and
unnecessarily slows down performance when big chunks of text (I
use this for XML files) need to be processed.
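
The placeholder workaround described above can be sketched as follows. This is Python rather than Revolution, purely to make the steps concrete; the placeholder character and the function name are hypothetical, and cr is assumed to be "\n".

```python
import re

# Hypothetical stand-in for the line break; assumed never to occur in real text.
PLACEHOLDER = "\x01"

def replace_across_lines(text, pattern, replacement):
    # Refuse to run if the placeholder already occurs, otherwise the
    # final restore step would invent line breaks that were never there.
    if PLACEHOLDER in text:
        raise ValueError("placeholder character already present in text")
    # 1. Hide the line breaks so "." can match where they used to be.
    swapped = text.replace("\n", PLACEHOLDER)
    # 2. Run the ordinary replacement on the single-line text.
    result = re.sub(pattern, replacement, swapped)
    # 3. Restore the line breaks that survived the replacement.
    return result.replace(PLACEHOLDER, "\n")

if __name__ == "__main__":
    print(repr(replace_across_lines("DA\nCD", r"A.?C", "")))  # 'DD'
```

The two extra passes over the text are the performance cost mentioned above. Note also that once the line breaks are hidden, "^" and "$" can no longer be used to anchor on individual lines.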

Marielle

> Wouter is right:
>
>> This is actually true -- and a serious bug in Revolution's RegExp  
>> engine.
>
> The Regular Expression Syntax reference states:
>
> ^ matches the following character at the beginning of the string
> ^A matches "ABC" but not "CAB"
>
> * matches zero or more occurrences of the preceding character or  
> pattern

I assumed that Revolution would do what it promised and didn't check  
this.

Try
answer replaceText("A C","^ *","")
I get "C", which obviously is not correct.
If I remove the "*", I get "A C"

And
answer replaceText("BAC","^A*","")
gives "C", so "^A*" matches "BA".
This should not happen.

It looks like "^A*" is incorrectly interpreted as "^.*A*".

Parentheses don't help and may even make things worse:
answer replaceText("BAC","(^A)*","")
gives an empty result. No idea why.
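
For reference, this is what an anchored pattern is generally expected to do. The sketch uses Python's re module rather than Revolution, and the helper name strip_match is hypothetical. Without a multiline flag, "^" only matches at the very start of the string, so each pattern can remove only a leading run of characters.

```python
import re

def strip_match(pattern, text):
    # Replace every match of the pattern with the empty string.
    return re.sub(pattern, "", text)

if __name__ == "__main__":
    print(repr(strip_match(r"^ *", "  A C")))  # 'A C'  (leading spaces removed)
    print(repr(strip_match(r"^ *", "A C")))    # 'A C'  (nothing to remove)
    print(repr(strip_match(r"^A*", "AABC")))   # 'BC'   (leading A's removed)
    print(repr(strip_match(r"^A*", "BAC")))    # 'BAC'  (no leading A)
    print(repr(strip_match(r"(^A)*", "BAC")))  # 'BAC'  (only empty matches)
```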

All the best

--------------------------------------------------------------------------------
Marielle Lange (PhD),  Psycholinguist

Alternative emails: mlange at blueyonder.co.uk, M.Lange at ed.ac.uk
Homepage                                        http://homepages.lexicall.org/mlange/
Easy access to lexical databases                http://lexicall.org
Supporting Education Technologists              http://revolution.lexicall.org




