Diacritical Marks and matching text

Matt Denton matt.denton at limelight.com.au
Sun Mar 24 02:04:01 EST 2002


Hey Dar/List

Thanks for the tip Dar, yes Replace was my next logical step. However,
I'm *sure* I read somewhere that there was a way to compare/sort/match
text that ignores diacritical marks, thus using a built-in method rather
than duplicating the text (minimalist approach).
Still, I could be wrong... it is annoying to think "I've read about that
somewhere" and not be able to find that reference. Thanks too for the
tip on using htmlText, that indeed might be the best path.

Anyone else have any clues?  (I'm sure to find the reference a few days 
later, with a D'oh!)

M@
Matt Denton

On Sunday, March 24, 2002, at 04:37  PM, Dar Scott <dsc at swcp.com> wrote:

>> On Saturday, March 23, 2002, at 05:32 PM, Matt Denton wrote:
>> I'm trying to decode text that has language diacritical marks included,
>> to give the root-letter form: é to e; ç to c etc.
>>
>> I've hunted through the documentation (Text and Data Processing)
>> looking for some built-in Transcript term or way of handling and
>> matching these characters; I vaguely recall that this has been
>> addressed somewhere.
>>
>> Short of writing a small parser (not a hard task), does anyone
>> know of commands/methods of handling these characters?  My task is
>> to match text typed in a field against text that may or may not
>> have diacritical marks, a field of about 32K of text data.
>
> Just a wild idea...
>
> In a copy, "replace" all diacritical letters with the canonical
> letters.  (See the "replace" command.)  Process with that.
>
> There might be some advantage to using the htmlText where you can
> refer to a letter as &eacute; or the like.  Perhaps this would work
> better on multiple platforms.
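
For anyone following along, the replace-in-a-copy idea quoted above can be
sketched roughly as below. This is only an illustration in Python, not
Transcript, and not a built-in of any kind. The table ROOT_LETTERS and the
helper names strip_marks and find_ignoring_marks are made up for this
example, and the table is deliberately incomplete.

```python
# Hypothetical, incomplete table mapping accented letters to root letters.
# A real table would need to cover every character expected in the data.
ROOT_LETTERS = str.maketrans({
    "é": "e", "è": "e", "ê": "e", "ë": "e",
    "á": "a", "à": "a", "â": "a", "ä": "a",
    "ç": "c", "ñ": "n", "ö": "o", "ü": "u",
    "É": "E", "Ç": "C",
})


def strip_marks(text):
    """Return a copy of text with the listed accented letters replaced."""
    return text.translate(ROOT_LETTERS)


def find_ignoring_marks(needle, haystack):
    """Return the offset of needle in haystack, comparing root-letter
    copies of both strings; return -1 if there is no match."""
    return strip_marks(haystack).find(strip_marks(needle))


if __name__ == "__main__":
    field_text = "Un café français"
    print(strip_marks(field_text))
    print(find_ignoring_marks("francais", field_text))
```

Because every replacement swaps one character for exactly one character,
an offset found in the stripped copy also points at the right place in the
original text, so the original never has to be modified.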



