matchText and accented characters

Chris Sheffield cmsheffield at gmail.com
Wed Oct 17 10:45:12 EDT 2007


Thanks, Ken. Using the hex equivalents is an interesting suggestion.  
I may look into that further.

As for replacing the accented characters with their non-accented  
equivalents, that is also something I've done in the past, but the  
problem here is that this is Mac/PC cross platform, so it's quite a  
few extra lines of code.

So I decided to simply try the offset function, with wholeMatches set  
to true (although I can't really determine if wholeMatches affects  
offset or not), and that seems to be working fine for me. Still  
testing it out to make sure, but so far so good.
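
For anyone following along, here is a minimal sketch of the offset approach. The field name "Story" and the search string are assumptions for illustration only. As far as I can tell, the dictionary describes wholeMatches in terms of lineOffset, wordOffset and itemOffset, so the sketch leaves that property alone.

```livecode
-- Sketch only: the field name "Story" is an assumption.
on mouseUp
  local tFind, tStart, tEnd
  put "fiancé" into tFind
  -- offset returns the position of the first matching character, or 0
  put offset(tFind, field "Story") into tStart
  if tStart > 0 then
    put tStart + length(tFind) - 1 into tEnd
    select char tStart to tEnd of field "Story"
  else
    answer "No match found."
  end if
end mouseUp
```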

Thanks again for the suggestions.


On Oct 16, 2007, at 5:59 PM, Ken Ray wrote:

> On Tue, 16 Oct 2007 12:18:54 -0600, Chris Sheffield wrote:
>
>> Thanks, Andres. But that didn't seem to fix the problem. That
>> property, according to the docs, only seems to apply to the numToChar
>> and charToNum functions. I did try it just to make sure.
>
> The issue is that PCRE (which is the lib that Rev uses) *optionally*
> supports locales, so I don't know if any locales were compiled into the
> code that Rev uses. If you knew what you were looking for, you could
> replace the accented characters with their hex equivalents and you'd
> get a match:
>
>   put matchChunk(fld 1,".*(fianc\x8E).*",tStart,tEnd)
>
> in this case "\x8E" means "use hex code 8E", which is decimal 142 (an
> extended, platform-specific code rather than true ASCII), which is é
> (at least on my Mac). To determine this, I ran this code:
>
>   put baseConvert(charToNum("é"),10,16)
>
> which gave me "8E". So if you know specifically the characters to
> match, you can use this.
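>
> Since the code for é differs by platform (142 on my Mac, but normally
> 233, hex E9, on Windows), one option is to build the hex escape at
> runtime instead of hard-coding it. This is only a sketch: it assumes
> the literal é in the script is stored in the platform's own character
> set, and that the character code is 16 or higher so baseConvert
> returns two hex digits.
>
> ```livecode
> on mouseUp
>   local tHex, tStart, tEnd
>   -- look up the code the current platform uses for é
>   put baseConvert(charToNum("é"),10,16) into tHex
>   -- build the same pattern as above, with the platform's hex code
>   if matchChunk(fld 1, ".*(fianc\x" & tHex & ").*", tStart, tEnd) then
>     put char tStart to tEnd of fld 1
>   end if
> end mouseUp
> ```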
>
> On the other hand, if you have a big chunk of text and you don't know
> if there are accented chars or not, I would personally run it the
> "brute force" way:
>
> 1) put a copy of the text into another variable
> 2) replace the accented chars with their non-accented counterparts - a
> dozen or so lines like:
>        - replace "é" with "e" in myVar
>        - replace "ó" with "o" in myVar
>        - etc.
> 3) run your 'matchChunk' on the second "clean" variable using
> non-accented text (look for "fiance" and not "fiancé")
> 4) if you get a hit, use the startChar/endChar variables from the
> 'matchChunk' to extract the text from the *first* variable (the one
> with the accented text)
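>
> A rough sketch of steps 1-4 follows. The function name findUnaccented
> and its parameter names are made up for illustration. It relies on
> each replacement swapping exactly one character for one character, so
> the positions in the clean copy line up with the original.
>
> ```livecode
> function findUnaccented pText, pPattern
>   local tClean, tStart, tEnd
>   -- 1) work on a copy of the text
>   put pText into tClean
>   -- 2) one-for-one swaps keep every character position unchanged
>   replace "é" with "e" in tClean
>   replace "ó" with "o" in tClean
>   -- ...add the other accented characters you expect here
>   -- 3) match the non-accented pattern against the clean copy
>   if matchChunk(tClean, "(" & pPattern & ")", tStart, tEnd) then
>     -- 4) same positions, but taken from the original text
>     return char tStart to tEnd of pText
>   end if
>   return empty
> end findUnaccented
> ```
>
> Calling findUnaccented(fld 1, "fiance") would then return "fiancé"
> if that word appears in the field.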
>
> Just my 2 cents,
>
> Ken Ray
> Sons of Thunder Software, Inc.
> Email: kray at sonsothunder.com
> Web Site: http://www.sonsothunder.com/



