matchText and accented characters
Ken Ray
kray at sonsothunder.com
Tue Oct 16 19:59:47 EDT 2007
On Tue, 16 Oct 2007 12:18:54 -0600, Chris Sheffield wrote:
> Thanks, Andres. But that didn't seem to fix the problem. That
> property, according to the docs, only seems to apply to the numToChar
> and charToNum functions. I did try it just to make sure.
The issue is that PCRE (the regular-expression library Rev uses) supports
locales only *optionally*, and I don't know whether any locale support was
compiled into the copy that Rev uses. If you know what you're looking for,
you can replace the accented characters in your pattern with their hex
escape codes and you'll get a match:
put matchChunk(fld 1,".*(fianc\x8E).*",tStart,tEnd)
In this case "\x8E" means "use hex code 8E", which is character 142. That
is outside ASCII (which stops at 127); in the Mac Roman character set it
is é (at least on my Mac). To determine this, I ran this code:
put baseConvert(charToNum("é"),10,16)
which gave me "8E". So if you know specifically the characters to
match, you can use this.
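If you'd rather not look up each code by hand, the same charToNum/baseConvert trick can be wrapped in a small function that turns every character above 127 in a pattern into a "\xHH" escape. This is only a sketch: the name hexEscapeHighChars is hypothetical, and it assumes the older single-byte engine described above, where the hex value depends on your platform's character set (8E for é on the Mac). LiveCode string literals have no backslash escapes, so "\x" is passed to PCRE literally, just as in the matchChunk line above.

```livecode
-- Hypothetical helper: escape every non-ASCII character in a pattern
-- as \xHH so PCRE matches it by byte value instead of relying on locales.
function hexEscapeHighChars pPattern
   local tResult, tChar, tHex
   put empty into tResult
   repeat for each char tChar in pPattern
      if charToNum(tChar) > 127 then
         -- same conversion as baseConvert(charToNum("é"),10,16) above
         put baseConvert(charToNum(tChar), 10, 16) into tHex
         put "\x" & tHex after tResult
      else
         put tChar after tResult
      end if
   end repeat
   return tResult
end hexEscapeHighChars
```

Usage would look like `put matchChunk(fld 1, "(" & hexEscapeHighChars("fiancé") & ")", tStart, tEnd)`. Metacharacters such as "." or "(" are left untouched, so you can still write normal regex syntax around the accented words.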
On the other hand, if you have a big chunk of text and you don't know
if there are accented chars or not, I would personally run it the
"brute force" way:
1) put a copy of the text into another variable
2) replace the accented chars with their non-accented counterparts - a
dozen or so lines like:
- replace "é" with "e" in myVar
- replace "ó" with "o" in myVar
- etc.
3) run your 'matchChunk' on the second "clean" variable using
non-accented text (look for "fiance" and not "fiancé")
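4) if you get a hit, use the startChar/endChar variables from the
'matchChunk' to extract the text from the *first* variable (the one
with the accented text). This works because each replacement swaps one
character for exactly one character, so the positions line up in both
copies.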
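The four steps above could be sketched roughly like this. The function names stripAccents and findUnaccented are hypothetical, and the replacement list is only a sample; extend it with whatever accented characters your text actually contains. The one design rule is that every replacement must be one character for one character (no "æ" to "ae"), otherwise the start/end positions from the clean copy will no longer point at the right place in the original.

```livecode
-- Hypothetical helper for step 2: map accented characters to plain ones.
-- Each replace swaps one char for one char, so character offsets are preserved.
function stripAccents pText
   replace "é" with "e" in pText
   replace "è" with "e" in pText
   replace "ê" with "e" in pText
   replace "á" with "a" in pText
   replace "à" with "a" in pText
   replace "ó" with "o" in pText
   replace "ü" with "u" in pText
   -- ...add the other accented characters you expect here
   return pText
end stripAccents

-- Steps 1-4: match a non-accented pattern against a cleaned copy,
-- then return the matching text from the original (accented) copy.
-- Returns empty if there is no match.
function findUnaccented pText, pPattern
   local tClean, tStart, tEnd
   put stripAccents(pText) into tClean          -- steps 1 and 2
   -- step 3: matchChunk reports the position of the first (...) group
   if matchChunk(tClean, "(" & pPattern & ")", tStart, tEnd) then
      return char tStart to tEnd of pText       -- step 4: use the original
   end if
   return empty
end findUnaccented
```

For example, `put findUnaccented(fld 1, "fiance")` should give you "fiancé" back if the field contains the accented word.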
Just my 2 cents,
Ken Ray
Sons of Thunder Software, Inc.
Email: kray at sonsothunder.com
Web Site: http://www.sonsothunder.com/
More information about the use-livecode mailing list