Importing Cyrillic Text
Devin Asay
devin_asay at byu.edu
Wed Dec 6 17:43:11 EST 2006
On Dec 6, 2006, at 1:20 PM, Richmond Mathewson wrote:
> One of my old chestnuts crops up again:-
The useUnicode property is a common trap, because scripters tend to
think that it is more powerful than it actually is. It should really
be called the useTwoByteCharsWithCharToNumAndNumToChar property,
because it only affects the charToNum and numToChar functions, by
having them work with two-byte characters instead of single-byte
characters.
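To see how narrow that is, here is a minimal sketch. The field name
"display" is just a placeholder, and 1046 is the decimal code point of
the Cyrillic capital letter Zhe (U+0416):

    on mouseUp
      -- useUnicode only changes what charToNum and numToChar do
      set the useUnicode to true
      -- build one two-byte character from its code point
      put numToChar(1046) into tChar
      -- the field still has to be filled through its unicodeText
      set the unicodeText of fld "display" to tChar
      -- charToNum now reads two bytes, so this should be 1046
      put charToNum(tChar) into tCode
    end mouseUp

Setting useUnicode does nothing by itself for reading a file or for
putting text into a field.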
>
> on mouseDown
> set the useUnicode to true
> answer file "CHOOSE A FILE TO IMPORT"
> if the result = "cancel" then exit mouseDown
> put it into NAMESTR
> open file NAMESTR for read
> read from file NAMESTR until EOF
> put it into the field "fENTER"
> end mouseDown
>
> open a Bulgarian .txt or .rtf file . . .
Here's what you want to do. First, you have to know how the text file
is encoded.
If it's an RTF file, you can try using the rtfText property:
answer file "Choose an RTF file."
if it is empty then exit mouseUp
set the RTFText of field "Stuff" to URL ("file:" & it)
Let's assume it's in UTF-16, the most common Unicode format. You
would do something like this:
answer file "Choose a unicode file to read in."
if it is empty then exit mouseUp
put "binfile:" & it into urlName
set the unicodeText of fld "display" to url urlName
If the text is UTF-8, a common format on the Web, you would have to
convert it to UTF-16 upon reading it, like this:
answer file "Choose a UTF-8 file to read in."
if it is empty then exit mouseUp
put url ("binfile:" & it) into tRaw
set the unicodetext of fld "display" to \
uniencode(tRaw,"UTF8")
If it's in another encoding it gets dicier, but these techniques are
reliable for Unicode files.
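If you don't know the encoding in advance, one rough way to guess is to
look for a byte order mark at the start of the raw data. This is only a
sketch: guessEncoding is a made-up helper name, the return values are
just labels, and many files (UTF-8 ones especially) have no byte order
mark at all. It also assumes "char" means one byte, as it does when
reading binfile data in the current engine; if that ever changes, byte
chunks and byteToNum would be the thing to use instead.

    function guessEncoding pRaw
      -- hypothetical helper: inspect the first bytes of the raw data
      put charToNum(char 1 of pRaw) into b1
      put charToNum(char 2 of pRaw) into b2
      put charToNum(char 3 of pRaw) into b3
      -- EF BB BF is the UTF-8 byte order mark
      if b1 is 239 and b2 is 187 and b3 is 191 then return "UTF8"
      -- FF FE is little-endian UTF-16, FE FF is big-endian UTF-16
      if b1 is 255 and b2 is 254 then return "UTF16LE"
      if b1 is 254 and b2 is 255 then return "UTF16BE"
      return "unknown"
    end guessEncoding

You would call it on the data you read with url ("binfile:" & it)
before deciding which of the approaches above to use.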
HTH
Devin
Devin Asay
Humanities Technology and Research Support Center
Brigham Young University