Unicode sorting
Dar Scott
dsc at swcp.com
Thu Jun 1 19:21:54 EDT 2006
Wow! Great news for sorting Unicode!
On May 30, 2006, at 5:08 PM, Devin Asay wrote:
> I got your code to work by making some simple changes in the
> sortCodeFromRussian function:
Devin, I've been processing some bits of UTF-8, and something dawned
on me that is probably known by the Unicode experts.
**** A lexical byte sort of well-formed UTF-8 will result in a
Unicode code point sort! ****
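Here is a small check of that claim, written in Python rather than Transcript only because it is easy to run anywhere. The word list is made up for illustration. Python compares strings code point by code point, so a plain sorted() serves as the reference order.

```python
# Sample strings (illustrative only): ASCII, Latin-1, Cyrillic, CJK,
# a fullwidth letter (U+FF21) and a character outside the BMP (U+1F600).
words = ["zebra", "Ärger", "яблоко", "日本", "apple", "\U0001F600", "\uFF21"]

# Reference order: Python compares str values by code point.
by_code_point = sorted(words)

# Byte-wise lexical order of the UTF-8 encodings.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# True if any encoded string contains a zero byte.
has_nul = any(0 in w.encode("utf-8") for w in words)

print(by_code_point == by_utf8_bytes)  # True
print(has_nul)                         # False
```

The two orders agree, and none of the encoded strings contains a zero byte.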
That avoids the NUL problem in sort, because well-formed UTF-8 has no
zero bytes unless the text itself contains U+0000. That means
russianLex() can return the UTF-8 of the string with your character
conversions applied. I think the replace command will work with
UTF-8, so you can even avoid a character loop. All you need is 34
replaces and then a return. OK, that might actually be slower than a
character loop.
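A rough sketch of the idea, again in Python. Everything here is an assumption for illustration: russian_lex is a stand-in name, and the alphabet table is my guess at the intent, not the conversions from your sortCodeFromRussian function. It also uses a single translate pass instead of 34 separate replaces, which avoids any chance of one replace feeding into the next.

```python
# Assumed ordering: the 33 letters of the modern Russian alphabet.
# This is NOT the table from the original sortCodeFromRussian function.
RUSSIAN_ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

# Map each letter to a stand-in code point whose numeric order matches
# its position in the alphabet (all stand-ins are nonzero).
_TABLE = {ord(ch): 0x0430 + i for i, ch in enumerate(RUSSIAN_ALPHABET)}

def russian_lex(text: str) -> bytes:
    """Hypothetical sort key: fold case, remap letters, return UTF-8."""
    return text.lower().translate(_TABLE).encode("utf-8")

words = ["ёж", "яма", "ель", "жук"]
sorted_words = sorted(words, key=russian_lex)

print(sorted(words))  # ['ель', 'жук', 'яма', 'ёж']  plain code point order
print(sorted_words)   # ['ель', 'ёж', 'жук', 'яма']  alphabet order
```

In plain code point order ё (U+0451) lands after я, which is why some conversion is needed before the byte sort.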
Dar
Unicode Sophomore