Converting from unicode to ASCII

J. Landman Gay jacque at hyperactivesw.com
Sat Sep 26 23:18:43 EDT 2020


On 9/24/20 12:09 PM, J. Landman Gay via use-livecode wrote:
> My original goal was to get the canonical version directly from LC somehow.

Neville Smythe contacted me privately with this brilliant solution, posted here with his consent:

function stripAccents pInput
   local tDecomposed
   local tStripped

   replace "'" with space in pInput -- apostrophes break SQL queries (my requirement)

   -- Separate the accents from the base letters
   put normalizeText(pInput, "NFD") into tDecomposed
   repeat for each codepoint c in tDecomposed
     -- Copy everything but the accent marks
     if c="Æ" then put "AE" after tStripped
     else if c="Œ" then put "OE" after tStripped
     else if codepointProperty(c, "Diacritic") is false then
       put c after tStripped
     end if
   end repeat
   return tStripped
end stripAccents
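
A quick way to try the function is from a button script. This is only a sketch: the mouseUp 
handler and the sample strings are my own illustration, not part of Neville's code, and it 
assumes stripAccents is somewhere in the message path.

```livecode
on mouseUp
   local tFirst, tSecond

   -- Sample strings are made up for illustration
   put stripAccents("Crème brûlée") into tFirst
   put stripAccents("l'été") into tSecond

   -- Expected: "Creme brulee" and "l ete" (the apostrophe becomes a space)
   answer tFirst & return & tSecond
end mouseUp
```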

This works great for my needs and is exactly what I was looking for. I had no idea we had a 
codepointProperty function, which makes this all possible.

This will work for most European Latin alphabets, with a few exceptions. Neville found that 
German, Polish and Dutch may not be completely compatible, and there may be others. There is a 
list of special characters that may need specific replacements here:

<https://maximilian.schalch.de/2018/05/complete-list-of-european-special-characters/>
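
If the specific replacements are needed, one possible approach is a second pass over the text. 
The sketch below is an assumption on my part rather than anything Neville wrote: the handler 
name replaceSpecialLetters is hypothetical, and the three letters shown (German ß, Polish ł, 
Dutch ĳ) are just examples of characters that have no canonical decomposition, so NFD leaves 
them untouched. A real table would need to be filled in from the list above.

```livecode
function replaceSpecialLetters pInput
   -- Hypothetical helper: handles letters that NFD does not split
   -- into a base letter plus an accent mark.
   -- Compare case-sensitively so upper and lower case get their own
   -- replacements; the property resets when the handler exits.
   set the caseSensitive to true

   replace "ß" with "ss" in pInput -- German sharp s
   replace "Ł" with "L" in pInput  -- Polish L with stroke
   replace "ł" with "l" in pInput
   replace "Ĳ" with "IJ" in pInput -- Dutch IJ ligature
   replace "ĳ" with "ij" in pInput

   return pInput
end replaceSpecialLetters
```

It could then be chained with the function above, for example 
put replaceSpecialLetters(stripAccents(tName)) into tName, where tName is whatever variable 
holds the text.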

For now I only need French, so I can probably omit the specific replacements. Maybe Neville 
will chime in if I've left anything out; he's done quite a bit of research into the problem.

-- 
Jacqueline Landman Gay         |     jacque at hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com
