Converting from unicode to ASCII
J. Landman Gay
jacque at hyperactivesw.com
Sat Sep 26 23:18:43 EDT 2020
On 9/24/20 12:09 PM, J. Landman Gay via use-livecode wrote:
> My original goal was to get the canonical version directly from LC somehow.
Neville Smythe contacted me privately with this brilliant solution, posted here with his consent:
function stripAccents pInput
   local tDecomposed
   local tStripped
   -- Apostrophes are illegal in SQL queries (my requirement), so swap them for spaces
   replace "'" with space in pInput
   -- Separate the accents from the base letters
   put normalizeText(pInput, "NFD") into tDecomposed
   repeat for each codepoint c in tDecomposed
      -- Copy everything but the accent marks
      if c = "Æ" then put "AE" after tStripped
      else if c = "Œ" then put "OE" after tStripped
      else if codepointProperty(c, "Diacritic") is false then
         put c after tStripped
      end if
   end repeat
   return tStripped
end stripAccents
This works great for my needs and is exactly what I was looking for. I had no idea we had a
codepointProperty function, which makes this all possible.
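For anyone who wants to try it, here is a minimal sketch of calling the function from a button script. The handler, the variable name and the sample string are placeholders of mine, not part of Neville's code:

```livecode
on mouseUp
   local tClean
   -- Sample French text; any accented string will do
   put stripAccents("Crème brûlée, déjà vu") into tClean
   -- Show the stripped version
   answer tClean
end mouseUp
```

Because the function runs the input through normalizeText first, it should not matter whether the text arrives precomposed or already decomposed.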
This will work for most European Latin alphabets, with a few exceptions. Neville found that
German, Polish and Dutch may not be completely compatible, and there may be others. There is a
list of special characters that may need specific replacements here:
<https://maximilian.schalch.de/2018/05/complete-list-of-european-special-characters/>
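Letters such as ß, ø and ł have no canonical decomposition, so NFD leaves them untouched and they pass straight through stripAccents. One way to handle them is an explicit replacement pass beforehand. The following is only a sketch: replaceSpecials is a hypothetical helper, and the ASCII substitutions are my guesses at reasonable values, so check them against the list above for the language you need.

```livecode
function replaceSpecials pInput
   -- Hypothetical helper, not part of Neville's function.
   -- Match case exactly so that upper and lower case get separate replacements
   set the caseSensitive to true
   -- Assumed substitutions; adjust for your own data
   replace "ß" with "ss" in pInput
   replace "ø" with "o" in pInput
   replace "Ø" with "O" in pInput
   replace "ł" with "l" in pInput
   replace "Ł" with "L" in pInput
   return pInput
end replaceSpecials
```

It would be chained in front of the accent stripping, for example: put stripAccents(replaceSpecials(tName)) into tClean, where tName is whatever variable holds the input.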
For now I only need French, so I can probably omit the specific replacements. Maybe Neville
will chime in if I've left anything out; he's done quite a bit of research into the problem.
--
Jacqueline Landman Gay | jacque at hyperactivesw.com
HyperActive Software | http://www.hyperactivesw.com