Unicode sorting

Dar Scott dsc at swcp.com
Fri May 26 18:50:25 EDT 2006


On May 26, 2006, at 3:57 PM, Devin Asay wrote:

> A 'sort lines' command, after converting upper case to lower, works  
> fairly well, except that, curiously, a space sorts *after* all  
> cyrillic chars.

That's weird.

Space is U+0020.

The basic Cyrillic lowercase letters seem to be U+0430 to U+044F,
where Cyrillic small 'a' is U+0430 and small YA is U+044F.

So if your system is UTF16BE (Mac), space, hex 00 20, should sort  
before 'a', hex 04 30, and small YA, hex 04 4F.

If your system is UTF16LE (Win), space, hex 20 00, should sort before
'a', hex 30 04, and small YA, hex 4F 04.
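
Here is a quick way to check that byte reasoning. It is a sketch in
Python, not LiveCode, and it assumes the sort compares the raw UTF-16
bytes of each string, which I have not confirmed for 'sort lines'. The
helper name bytewise_order is made up for the illustration.

```python
# Assumption: the sort compares the raw UTF-16 bytes of each string.
SPACE = "\u0020"
CYR_A = "\u0430"   # Cyrillic small a
CYR_YA = "\u044f"  # Cyrillic small ya


def bytewise_order(chars, encoding):
    """Sort characters by their encoded bytes (hypothetical helper)."""
    return sorted(chars, key=lambda ch: ch.encode(encoding))


for enc in ("utf-16-be", "utf-16-le"):
    order = bytewise_order([CYR_YA, SPACE, CYR_A], enc)
    # Show each character as hex bytes, in sorted order.
    print(enc, [ch.encode(enc).hex() for ch in order])
```

Under that assumption the space comes first in both byte orders, so
byte order alone would not explain a space sorting after the Cyrillic.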

Or am I really mixed up on what you are doing?

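(If you convert to upper case and are on Windows, the space will sort
in the middle. The capitals are U+0410 to U+042F, so in UTF16LE their
first bytes run from hex 10 to hex 2F, and the space's first byte,
hex 20, falls between them.)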
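
The upper case situation can be checked the same way. Again this is
only a Python sketch that assumes a plain bytewise comparison of the
UTF-16 data; space_position is a hypothetical helper, not anything
from LiveCode.

```python
# Assumption: the sort compares raw UTF-16 bytes, nothing smarter.
UPPER = [chr(cp) for cp in range(0x0410, 0x0430)]  # Cyrillic capitals A..YA
SPACE = "\u0020"


def space_position(encoding):
    """Index of the space among the capitals after a bytewise sort."""
    ordered = sorted(UPPER + [SPACE], key=lambda ch: ch.encode(encoding))
    return ordered.index(SPACE)


print(space_position("utf-16-le"))  # 16: after U+041F, before U+0420
print(space_position("utf-16-be"))  # 0: before every capital
```

With 32 capitals, index 16 puts the space exactly halfway through the
list in the little-endian case, while big-endian keeps it at the front.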

Dar





