Unicode: baby steps

Fraser Gordon fraser.gordon at livecode.com
Wed Aug 27 15:32:05 EDT 2014


On 27/08/2014 18:17, Graham Samuel wrote:
> Having forgotten all I ever knew about Unicode (it wasn't much), I am trying to understand Unicode in LC, and although I have heard about "just works" I am not sure how to proceed. For example, the code for pi (Greek letter, lower case) is apparently (via internet sources)
>
> U+03C0
>
> it also seems to be encoded as 960, but that's in HTML.
0x3C0 (hexadecimal) is 960 (decimal), so both refer to the same number.
Unicode codepoints (the number assigned to each character) are
conventionally written in hex, while HTML numeric entities such as
&#960; use decimal.
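
If you want to check that conversion yourself, here is a minimal sketch
using baseConvert, which should run from the message box (the value
"3C0" is just the pi codepoint from your example):

-- convert the hex digits of U+03C0 to decimal; this should give 960
put baseConvert("3C0", 16, 10)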

> Suppose I want to display pi in a field, "glyphPi". What does the script look like? I've tried:
The approach depends on whether you are using 6.x or 7.0. In 7.0, you
can enter the pi symbol directly in the script editor or you can insert
it using numToCodepoint:

-- Note that you set text, not unicodeText in 7
set the text of field "fld" to numToCodepoint(0x3C0)
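
To confirm what ended up in the field, you can go the other way with
codepointToNum. A small sketch, assuming 7.0 and the same field "fld"
as above:

-- 7.0 only: read the first character back as a codepoint number
-- for pi this should give 960 (0x3C0)
put codepointToNum(char 1 of the text of field "fld")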

In 6.x, you'd have to do something like the following:

-- Will not work on PowerPC!
set the unicodeText of field "fld" to numToChar(0xC0) & numToChar(0x03)

The bytes are in "little-endian" order, so the least-significant byte
(0xC0) comes first and the most-significant byte (0x03) second. On a
PowerPC machine the bytes come in the opposite order, so the two
numToChar calls would need to be swapped.
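
If you would rather not split the codepoint into bytes by hand, the same
thing can be sketched with a little arithmetic. This is only an
illustration for 6.x on a little-endian machine: the variable names
tCode, tLow and tHigh are my own, and it only covers codepoints below
0x10000.

-- 6.x sketch, little-endian machines only (not PowerPC)
put 960 into tCode           -- 0x3C0, the codepoint for pi
put tCode mod 256 into tLow  -- least-significant byte: 192 (0xC0)
put tCode div 256 into tHigh -- most-significant byte: 3 (0x03)
-- low byte first, matching the little-endian order described above
set the unicodeText of field "fld" to numToChar(tLow) & numToChar(tHigh)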

In short, if you want to use Unicode, 7.0 makes it far, far easier. At
least, I think so, but having worked on it for the past year, I might be
a little biased ;)

> oddly enough, all these appear to be legal, and all produce glyphs (some look like Kanji), but none of them are the symbol pi. Is this just a syntactical problem, or have I misunderstood the whole process?
The unicodeText of a field expects a 16-bit quantity (rather than a
byte/character) for each character and isn't smart enough to know
that's not what you're giving it. It interprets each pair of bytes in
the string as one 16-bit quantity and ends up displaying unrelated
characters (and, because the vast majority of characters in Unicode are
East Asian ideographs, you'll usually get something resembling Chinese).
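
As a worked example, take the two-letter string "pi" (a hypothetical
input, not necessarily one of the things you tried). The letter "p" is
byte 0x70 and "i" is byte 0x69, so read as one little-endian 16-bit
quantity the pair becomes 0x6970, which sits in the CJK ideograph range
(U+4E00 to U+9FFF):

-- low byte + 256 * high byte: 112 + 256 * 105 = 26992, i.e. 0x6970
put charToNum("p") + 256 * charToNum("i")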

> And when I do get it right, can I copy this field to the clipboard and paste it into another field which will then be visible to a user in the same form? Early experiments suggest I can't, but it could just be the usual finger trouble.
In 7.0, Unicode should copy and paste just fine. I can't say for sure in
6.x - I haven't actually tried it!

Regards,
Fraser
