Unicode sorting

Dar Scott dsc at swcp.com
Thu Jun 1 19:21:54 EDT 2006


Wow!  Great news for sorting Unicode!

On May 30, 2006, at 5:08 PM, Devin Asay wrote:

> I got your code to work by making some simple changes in the  
> sortCodeFromRussian function:

Devin, I've been processing some bits of UTF-8, and something dawned
on me that is probably known by the Unicode experts.

  **** A lexical byte sort of well-formed UTF-8 will result in a
Unicode code point sort! ****

That avoids the NUL problem in sort, because well-formed UTF-8 contains
no zero bytes unless the text itself includes U+0000.  It also means that
russianLex() can return the UTF-8 of the string with your character
conversions.
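
Here is a small check of that claim. It is a sketch in Python rather than LiveCode, chosen only because it is easy to run; the sample words are made up for illustration. The UTF-16 line at the end is my guess at where the NUL problem comes from, not something confirmed in this thread.

```python
# Made-up sample strings: ASCII, Cyrillic, CJK, U+FFFF, and one beyond U+FFFF.
words = ["zebra", "яблоко", "apple", "ёж", "中文",
         "\U0001F600 smile", "арбуз", "\uffff edge"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# bytes values compare byte by byte (unsigned), i.e. a plain lexical byte sort.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

same_order = by_code_point == by_utf8_bytes

# UTF-8 produces a zero byte only for U+0000 itself.
utf8_has_nul = any(0 in w.encode("utf-8") for w in words)

# Assumption: the NUL trouble comes from a 16-bit encoding, where every
# ASCII character carries a zero high byte.
utf16_has_nul = 0 in "apple".encode("utf-16-be")

print(same_order, utf8_has_nul, utf16_has_nul)
```

The two orders agree because UTF-8 lead bytes grow with the code point and longer sequences always start with a larger lead byte than shorter ones.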

I think the replace command will work with UTF-8, so you can even  
avoid a character loop.  All you need is 34 replaces and then a  
return.  OK, that might actually be slower than a character loop.
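
To make the two approaches concrete, here is a rough sketch, again in Python rather than LiveCode. The conversion table is hypothetical: the actual conversions in sortCodeFromRussian are not shown in this message, so I simply map each lowercase letter to a private-use code point in alphabet order. The names ALPHABET, BASE, russian_lex_translate and russian_lex_replace are mine, for illustration only.

```python
# Hypothetical conversion table; the real one is not shown in this message.
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"  # 33 lowercase letters, alphabet order

# Map each letter to a code point whose numeric order matches alphabet order.
# U+E000 starts a private-use range, picked only to avoid clashing with real text.
BASE = 0xE000
TABLE = {ord(ch): BASE + i for i, ch in enumerate(ALPHABET)}

def russian_lex_translate(s):
    """Character-loop style: one pass over the string, then UTF-8."""
    return s.lower().translate(TABLE).encode("utf-8")

def russian_lex_replace(s):
    """Replace style: one replace per letter, then UTF-8."""
    s = s.lower()
    for i, ch in enumerate(ALPHABET):
        s = s.replace(ch, chr(BASE + i))
    return s.encode("utf-8")

words = ["ёлка", "жук", "еда", "яблоко", "арбуз"]

naive = sorted(words)                                # raw code point order
by_translate = sorted(words, key=russian_lex_translate)
by_replace = sorted(words, key=russian_lex_replace)

print(naive)
print(by_translate)
print(by_replace)
```

In the raw code point order ё (U+0451) lands after я (U+044F); with either key function it sorts right after е. The replace version rescans the string once per letter, which is why it may well be the slower of the two.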

Dar
Unicode Sophomore




