Unicode: baby steps

Graham Samuel livfoss at mac.com
Wed Aug 27 16:03:45 EDT 2014


Fraser, that's very very helpful, although still quite mysterious to me. For example I thought Unicode was cleverer than just having a two-byte representation for everything, but allowed single byte representation for the 'lower end' of the catalogue of characters. However I quite see that anything that won't fit into a single byte must occupy two bytes.
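
A small sketch of the byte counts involved, written in Python rather than LiveCode because the point is about the encodings and not about the engine. The sample characters and the helper name `encoded_sizes` are illustrative assumptions, not anything from LC. It suggests that the "single byte for the lower end" scheme is UTF-8, that UTF-16 uses two bytes for most characters, and that characters outside the first 65,536 codepoints need four bytes in either encoding:

```python
# Compare how many bytes one character occupies in UTF-8 and in UTF-16.
samples = {
    "A": "Latin capital A",
    "\u03c0": "Greek small pi",
    "\u6f22": "a Kanji",
    "\U0001F600": "an emoji outside the first 65,536 codepoints",
}


def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    # "utf-16-le" is used so that no byte-order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch, label in samples.items():
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X} ({label}): UTF-8 = {utf8_len}, UTF-16 = {utf16_len}")
```

So pi takes two bytes either way, while the Kanji takes three bytes in UTF-8 but only two in UTF-16.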

I am interested in your statement that in LC 7 you can enter the pi symbol directly in the script editor - doesn't that depend on having a way of representing that symbol, either via a palette or via a combination of keys? No keyboard will cover the whole spectrum of Unicode, so this part I find very difficult to understand. If for example I really wanted to type some Kanji in the middle of a text in a European language, I understand that LC would allow it in the sense of allowing strings rich enough to store it and presumably display technology rich enough to display it, but in practice how would I get such input? My research is really directed towards this sort of question, and the palette seems to be the only viable answer. I wonder if you agree.

I have avoided 7 as I have had enough problems with my current project without wondering what a new engine would do to it, but now I see that if I am to use even one Unicode character, I would be better off testing in the available LC 7 versions.

I will study everything else you have written most carefully.

Thanks again

Graham

On 27 Aug 2014, at 21:32, Fraser Gordon <fraser.gordon at livecode.com> wrote:

> On 27/08/2014 18:17, Graham Samuel wrote:
>> Having forgotten all I ever knew about Unicode (it wasn't much), I am trying to understand Unicode in LC, and although I have heard about "just works" I am not sure how to proceed. For example, the code for pi (Greek letter, lower case) is apparently (via internet sources)
>> 
>> U+03C0
>> 
>> it also seems to be encoded as 960, but that's in HTML.
> 0x3C0 (hexadecimal) is 960 (decimal). For some reason, Unicode
> codepoints (the number assigned to each character) are normally given
> in hex.
> 
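
The hex/decimal relationship can be checked in a few lines. This is a Python sketch for illustration only (the variable names are assumptions); the arithmetic is 3 x 256 + 12 x 16 + 0 = 960:

```python
# U+03C0 and 960 are the same number written in two bases.
codepoint = int("03C0", 16)      # parse the hex digits after "U+"
character = chr(codepoint)       # the character with that codepoint

print(codepoint)                 # 960
print(character)                 # π
print(f"U+{codepoint:04X}")      # U+03C0
print(f"&#{codepoint};")         # &#960; (the decimal form HTML uses)
```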
>> Suppose I want to display pi in a field, "glyphPi". What does the script look like? I've tried:
> The approach depends on whether you are using 6.x or 7.0. In 7.0, you
> can enter the pi symbol directly in the script editor or you can insert
> it using numToCodepoint:
> 
> -- Note that you set text, not unicodeText in 7
> set the text of field "fld" to numToCodepoint(0x3C0)
> 
> In 6.x, you'd have to do something like the following:
> 
> -- Will not work on PowerPC!
> set the unicodeText of field "fld" to numToChar(0xC0) & numToChar(0x03)
> 
> The bytes are in "little-endian" order, so the least-significant byte
> comes first, unless you are using a PowerPC machine (in which case the
> bytes come in the opposite order).
> 
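
To make the byte order concrete, here is a Python sketch (not LiveCode; the helper `utf16_unit_bytes` is a hypothetical name) that splits U+03C0 into its two bytes in both orders. It only covers codepoints that fit in a single 16-bit unit:

```python
import struct

PI_CODEPOINT = 0x3C0  # U+03C0, Greek small letter pi


def utf16_unit_bytes(codepoint, little_endian=True):
    """Pack one 16-bit code unit into two bytes in the requested byte order."""
    # "<H" = little-endian unsigned 16-bit, ">H" = big-endian unsigned 16-bit.
    fmt = "<H" if little_endian else ">H"
    return struct.pack(fmt, codepoint)


little = utf16_unit_bytes(PI_CODEPOINT, little_endian=True)
big = utf16_unit_bytes(PI_CODEPOINT, little_endian=False)

print(little.hex())  # c003 -> 0xC0 first, then 0x03
print(big.hex())     # 03c0 -> 0x03 first, then 0xC0

# Read with the matching byte order, both give pi back.
print(little.decode("utf-16-le"), big.decode("utf-16-be"))

# Read with the wrong byte order, the same two bytes mean U+C003 instead.
print(f"U+{ord(little.decode('utf-16-be')):04X}")  # U+C003
```

The little-endian result matches the 0xC0-then-0x03 order used in the 6.x line above.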
> In short, if you want to use Unicode, 7.0 makes it far, far easier. At
> least, I think so, but having worked on it for the past year, I might be
> a little biased ;)
> 
>> oddly enough, all these appear to be legal, and all produce glyphs (some look like Kanji), but none of them are the symbol pi. Is this just a syntactical problem, or have I misunderstood the whole process?
> The unicodeText of a field expects 16-bit quantities (rather than
> bytes/characters) for each character and isn't smart enough to know
> that's not what you're giving it. It interprets each pair of characters
> in the string as these 16-bit quantities and ends up displaying random
> characters (and, because the vast majority of characters in Unicode by
> quantity are East Asian ideographic characters, you'll usually get
> something resembling Chinese).
> 
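
The effect described here can be reproduced outside LiveCode. The following Python sketch (the function name `misread_as_utf16` is an assumption, and it handles plain ASCII input only) takes one-byte-per-character text and reads each pair of bytes as one little-endian 16-bit unit:

```python
def misread_as_utf16(text):
    """Treat each pair of one-byte characters as a single 16-bit code unit."""
    raw = text.encode("ascii")       # one byte per character
    if len(raw) % 2:
        raw += b"\x00"               # pad so the bytes pair up evenly
    return raw.decode("utf-16-le")   # first byte of each pair is the low byte


garbled = misread_as_utf16("Hi")     # bytes 0x48 0x69 become the unit 0x6948
print(garbled, f"U+{ord(garbled):04X}")

# U+6948 falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
print(0x4E00 <= ord(garbled) <= 0x9FFF)  # True
```

Two ordinary Latin letters come out as one ideograph, which is the "looks like Kanji" result.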
>> And when I do get it right, can I copy this field to the clipboard and paste it into another field which will then be visible to a user in the same form? Early experiments suggest I can't, but it could just be the usual finger trouble.
> In 7.0, Unicode should copy and paste just fine. I can't say for sure in
> 6.x - I haven't actually tried it!
> 
> Regards,
> Fraser
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode




