Cyrillic input

Slava Paperno slava at lexiconbridge.com
Wed Jun 1 09:56:07 EDT 2011


Malte,

As I said, I'm discovering these things as I go--I hadn't even heard of LC
until last month. I'm finding that work with Unicode in LC involves a lot of
jumping through hoops, but so far I have been able to do everything I
needed. So don't give up :)

I am not sure why your stack doesn't "know" whether the text in your field
is UTF-16 or plain ANSI, but here is what I do: 

When I read some text from a file into a variable, I assume that it is
UTF-8. There is no harm in that: plain English (ASCII) text is also valid
UTF-8, so it can be treated the same way.

When I assign that text to a field, I always use 

set the unicodeText of field MyField to uniEncode(myVar, "UTF8")
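
For what it's worth, here is a rough sketch of the read-and-assign step put
together. The path in tPath is just a made-up example, and I am assuming the
file is read as raw bytes through a binfile URL so that nothing gets
converted on the way in:

put "/path/to/russian.txt" into tPath -- hypothetical path
put URL ("binfile:" & tPath) into myVar -- raw bytes, assumed to be UTF-8
set the unicodeText of field "MyField" to uniEncode(myVar, "UTF8")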

Now the text in the field is UTF-16. I check to see if the first two bytes
are decimal 255 followed by decimal 254 (or the reverse, 254 followed by
255), and if they are, I delete them, because that's the BOM (byte order
mark).
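
A minimal sketch of that BOM check, working on a copy of the field's text in
a variable. The names tText, tFirst and tSecond are only for illustration:

put the unicodeText of field "MyField" into tText
if length(tText) >= 2 then
  put byteToNum(byte 1 of tText) into tFirst
  put byteToNum(byte 2 of tText) into tSecond
  if (tFirst = 255 and tSecond = 254) or (tFirst = 254 and tSecond = 255) then
    delete byte 1 to 2 of tText -- drop the BOM
    set the unicodeText of field "MyField" to tText
  end if
end if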

I can read and edit the field using the system's multilanguage input, like
the Russian keyboard in Windows. Russian and English can be typed in any
combination, but it is still all UTF-16. Each letter and each punctuation
mark is a two-byte sequence. If you call length(the unicodeText of field
MyField) it will report twice the number of characters that you see in the
field.
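
So to get the number of characters you actually see, halve that figure. A
one-line sketch, assuming every character in the field takes exactly two
bytes (tCharCount is a made-up name):

put length(the unicodeText of field "MyField") div 2 into tCharCount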

So if I have to access character N in the field, I do this:

set useUnicode to true
put char N to char N+1 of field MyField into myChar
answer charToNum(myChar)

That will show you a decimal number: 1072 if myChar is a lowercase
Cyrillic a, or the ASCII code if it is an English letter.

Even plain English letters must be accessed like that, as two bytes. For
English, the first byte is a null, and the second is the ASCII of the
letter, but you don't need to think of that. Just treat every letter as a
two-char sequence.
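
If you would rather count visible characters, the two bytes of visible
character number K sit at positions 2K-1 and 2K. A sketch of that
arithmetic on a copy of the text, under the same two-bytes-per-character
assumption (K, tText and tCode are illustrative names):

put 3 into K -- the third visible character
put the unicodeText of field "MyField" into tText
set the useUnicode to true
put charToNum(char (2 * K - 1) to (2 * K) of tText) into tCode
answer tCode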

If the user types in that field, what he types is in UTF-16.

If I need to do anything with the text in the field, like store it to a
file, I read it into a variable:

put the unicodeText of field MyField into myVar2

and immediately convert it to UTF-8: 

put uniDecode(myVar2, "UTF8") into myVar2

Now myVar2 is UTF-8 and can be stored in a file or processed by scripts.
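
Putting the save step together, as a sketch: tPath is a hypothetical path,
and I am assuming a binfile URL so that the UTF-8 bytes are written to disk
unchanged.

put "/path/to/russian.txt" into tPath -- hypothetical path
put the unicodeText of field "MyField" into myVar2
put uniDecode(myVar2, "UTF8") into myVar2 -- UTF-16 to UTF-8
put myVar2 into URL ("binfile:" & tPath)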

There are apparently limitations to what you can do with Cyrillic in LC, but
the things that I have listed all work for me.

Slava 

> -----Original Message-----
> From: use-livecode-bounces at lists.runrev.com [mailto:use-livecode-
> bounces at lists.runrev.com] On Behalf Of Malte Brill
> Sent: Wednesday, June 01, 2011 9:23 AM
> To: use-livecode at lists.runrev.com
> Subject: Re: Re: Cyrillic input
> 
> Thanks mark and Slava!
> 
> well, this is getting me a bit further. Now if only I knew if I could
> reliably check if the text in my field regular ASCII or UTF encoded, that
> would really make my day.
> 
> Cheers,
> 
> malte
> 





