Importing Cyrillic Text

Devin Asay devin_asay at byu.edu
Wed Dec 6 17:43:11 EST 2006


On Dec 6, 2006, at 1:20 PM, Richmond Mathewson wrote:

> One of my old chestnuts crops up again:-

The useUnicode property is a common trap, because scripters tend to
think that it is more powerful than it actually is. It should really
be called the useTwoByteCharsWithCharToNumAndNumToChar property,
because it only affects the charToNum and numToChar functions: it
makes them work with two-byte characters instead of single-byte
characters. It does nothing to text that you read from a file.
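
To make that concrete, here is a minimal sketch of the one thing
useUnicode does change. The handler and the field name "display" are
just placeholders for illustration; 1046 is the decimal code point of
the Cyrillic capital letter Zhe (U+0416).

   on mouseUp
      -- only charToNum and numToChar are affected by this property
      set the useUnicode to true
      -- numToChar now accepts values above 255 and returns a
      -- two-byte character
      put numToChar(1046) into tChar
      -- the field still has to be told that the data is Unicode
      set the unicodeText of fld "display" to tChar
   end mouseUp

Note that the field only shows the letter because of the unicodeText
property, not because of useUnicode.
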
>
> on mouseDown
>   set the useUnicode to true
>   answer file "CHOOSE A FILE TO IMPORT"
>   if the result = "cancel" then exit mouseDown
>   put it into NAMESTR
>   open file NAMESTR for read
>   read from file NAMESTR until EOF
>   put it into the field "fENTER"
> end mouseDown
>
> open a Bulgarian .txt or .rtf file . . .

Here's what you want to do. First, you have to know how the text file  
is encoded.
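
If you don't know the encoding, one rough way to guess is to look for
a byte order mark (BOM) at the start of the file. This is only a
sketch: guessEncoding is a made-up helper name, the return strings are
arbitrary labels, and plenty of files have no BOM at all, in which
case you are back to knowing where the file came from.

   function guessEncoding pPath
      -- read the raw bytes; useUnicode is left at its default (false)
      -- so charToNum returns single byte values
      put url ("binfile:" & pPath) into tRaw
      put charToNum(char 1 of tRaw) into b1
      put charToNum(char 2 of tRaw) into b2
      put charToNum(char 3 of tRaw) into b3
      -- EF BB BF marks UTF-8
      if b1 is 239 and b2 is 187 and b3 is 191 then return "UTF8"
      -- FF FE marks little-endian UTF-16, FE FF big-endian
      if b1 is 255 and b2 is 254 then return "UTF16-LE"
      if b1 is 254 and b2 is 255 then return "UTF16-BE"
      return "unknown"
   end guessEncoding

You would call it with the path that answer file puts into "it" and
then pick one of the approaches below based on the result.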

If it's an RTF file you can try using the rtfText property:

   answer file "Choose an RTF file."
   if it is empty then exit mouseUp
   set the RTFText of field "Stuff" to URL ("file:" & it)

If it's in UTF-16, the most common Unicode format, you would do
something like this:

   answer file "Choose a unicode file to read in."
   if it is empty then exit mouseUp
   put "binfile:" & it into urlName
   set the unicodeText of fld "display" to url urlName

If the text is UTF-8, a common format on the Web, you would have to
convert it to UTF-16 upon reading it, like this:

   answer file "Choose a UTF-8 file to read in."
   if it is empty then exit mouseUp
   put url ("binfile:" & it) into tRaw
   set the unicodetext of fld "display" to \
       uniencode(tRaw,"UTF8")

If it's in another encoding it gets dicier, but these techniques are
reliable for Unicode files.
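
For a Bulgarian file saved in a legacy single-byte Cyrillic encoding,
something along these lines may be worth trying. Treat it as an
untested sketch: it assumes that uniEncode accepts "Russian" as a
language name, and which code page that name maps to may depend on the
platform you are running on, so check the uniEncode entry in the
dictionary before relying on it.

   answer file "Choose a Cyrillic text file to read in."
   if it is empty then exit mouseUp
   put url ("binfile:" & it) into tRaw
   -- ASSUMPTION: "Russian" selects a Cyrillic code page here
   set the unicodeText of fld "display" to \
       uniEncode(tRaw,"Russian")

If the result comes out as garbage, the file is probably in a
different code page than the one the engine assumed.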

HTH

Devin

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University



