Replace accents

Mark Waddingham mark at livecode.com
Tue May 10 10:31:14 EDT 2016


On 2016-05-10 16:22, Ludovic THEBAULT wrote:
> If the text is typed in LiveCode, it works, but with text read
> from a file (or the list of files in a folder, as I use in this
> script), it doesn't work.
> 
> try :
> 
> on mouseup
>    put "Bonjour à tous" into URL("file:" & specialfolderpath("Documents")&"/bonjouràtous.txt")
>    set the defaultfolder to  specialfolderpath("Documents")
>    get the files
>    filter it with "bonjour*"
>    get noaccents(it)
>    answer it
> end mouseup
> 
> function noaccents myText
>    put "áàâäãåÄÅÀÃéèêëÊËÈÉíìîïÍÎÏÌóòôöõÓÔÒÕÖúùûüÚÛÙÜÑñçÇ'" into accent
>    put "aaaaaaAAAAeeeeEEEEiiiiIIIIoooooOOOOOuuuuUUUUNncC_" into noaccent
> 
>    put 0 into cpt
>    repeat for each char i in accent
>       add 1 to cpt
>       --if i = "É" then breakpoint
>       replace i with char cpt of noaccent in myText
>    end repeat
> return myText
> end noaccents

On Mac, filenames are stored in Unicode decomposed form - i.e. a-grave 
is stored as 'a' followed by a combining grave accent, rather than as a 
single precomposed 'a-grave' character. I think you've just found a 
concrete instance of this bug - 
http://quality.livecode.com/show_bug.cgi?id=17104.

Basically, the 'replace x with y' command assumes that each match of 
the pattern in the target string has the same length as the pattern 
itself. With Unicode, though, this is not necessarily true: a decomposed 
'e,acute' (two units) should match a composed 'e-acute' (one unit).

For now, you can work around the bug by using normalizeText(it, "NFC"), 
which makes sure that all accented chars which can be composed (which is 
all of the ones in your replacement list) are composed.
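
Applied to your script, the workaround might look roughly like the 
sketch below. It assumes your noaccents function is left unchanged and 
that the engine version in use provides normalizeText; the only 
difference from your handler is that the file list is composed to NFC 
before the replacements run.

```livecode
on mouseUp
   set the defaultFolder to specialFolderPath("Documents")
   get the files
   filter it with "bonjour*"
   -- Compose the decomposed file names (e.g. 'a' + combining grave)
   -- into single characters so they match the patterns in noaccents.
   get normalizeText(it, "NFC")
   -- noaccents is the function from the quoted script, unchanged.
   get noaccents(it)
   answer it
end mouseUp
```

Normalizing once on the whole list, rather than inside the repeat loop, 
keeps the replacement function itself untouched.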

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



