Unicode sorting
Devin Asay
devin_asay at byu.edu
Tue May 30 19:08:59 EDT 2006
Dar,
I got your code to work by making some simple changes in the
sortCodeFromRussianChar function:
function sortCodeFromRussianChar utf16Char
  set the useUnicode to true
  put charToNum(utf16Char) into unicodePoint
  ## Devin's changes - it turns out leaving the code points in decimal works perfectly,
  ## and I only had to make a couple of adjustments.
  if unicodePoint > 1039 and unicodePoint < 1072 then -- ignore case
    add 32 to unicodePoint
  else if unicodePoint = 1105 then -- sort 'yo' with 'ye'
    put 1077 into unicodePoint
  end if
  ##
  -- switch unicodePoint
  --   case 0x0020 -- space
  --     get 1
  --     break
  --   ...
  --   default
  --     get 255
  -- end switch
  return unicodePoint -- it
end sortCodeFromRussianChar
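
For anyone who wants to check the two adjustments outside Rev, here is a rough Python sketch of the same idea. It is not the Rev script: the names sort_code, russian_lex and sort_russian are only illustrative, and it assumes the input is ordinary Unicode strings rather than UTF-16 byte pairs.

```python
# Python cross-check of the two adjustments above.
# All names are illustrative; none of them come from the Rev script.

def sort_code(ch):
    """Return a sort code for one character, using decimal code points."""
    code_point = ord(ch)
    if 1039 < code_point < 1072:   # U+0410..U+042F: fold upper case onto lower case
        code_point += 32
    elif code_point == 1105:       # U+0451 'yo' sorts with U+0435 'ye'
        code_point = 1077
    return code_point


def russian_lex(line):
    """Build a sort key for one line: one sort code per character."""
    return [sort_code(ch) for ch in line]


def sort_russian(lines):
    """Sort lines by their Russian sort keys."""
    return sorted(lines, key=russian_lex)


words = ["ёлка", "Яблоко", "берёза", "Арбуз", "еда"]
print(sort_russian(words))
# prints ['Арбуз', 'берёза', 'еда', 'ёлка', 'Яблоко']
```

Note that upper-case 'Yo' is code point 1025, which lies outside the 1040 to 1071 range, so these two rules leave it unchanged. Whether that matters depends on the data being sorted.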
On May 27, 2006, at 2:05 PM, Dar Scott wrote:
> Try something roughly like this (not tested; typed in raw):
>
> function sortRussian utf16RussianList
>   -- use utf8 to get rid of NULs and extra line ends
>   put uniDecode(utf16RussianList, "UTF8") into utf8RussianList
>   sort lines of utf8RussianList text by russianLex(each)
>   return utf8RussianList
> end sortRussian
>
> -- returns string suitable for lexical comparison (Rev sort text)
> -- of a utf8 string made up of Russian subset of Cyrillic plus some ASCII
> function russianLex utf8RussianLine
>   -- Add adjustments for special words here
>   put uniEncode(utf8RussianLine, "UTF8") into utf16RussianLine
>   put empty into lex
>   repeat with i = 1 to length(utf16RussianLine)-1 step 2 -- Unicode char loop
>     put char i to i+1 of utf16RussianLine into utf16RussianChar
>     -- Add char dropping tests here
>     put sortCodeFromRussianChar(utf16RussianChar) into sortNumber
>     put numToChar(sortNumber) after lex -- use 1-byte chars for sorting
>   end repeat
>   return lex
> end russianLex
>
> -- returns number in range 1 to 255 indicating sort position of
> -- allowed characters
> function sortCodeFromRussianChar utf16Char
>   set the useUnicode to true
>   put charToNum(utf16Char) into unicodePoint
>   switch unicodePoint
>     case 0x0020 -- space
>       get 1
>       break
>     ...
>     default
>       get 255
>   end switch
>   return it
> end sortCodeFromRussianChar
>
> This will take some debugging.
Only a little. ;-)
This is a huge help! Thanks a million.
Devin
Devin Asay
Humanities Technology and Research Support Center
Brigham Young University