the mouseText and Unicode: a 3-char puzzle
dave.cragg at lacscentre.co.uk
Wed Jun 22 01:38:53 CDT 2011
On 21 Jun 2011, at 07:40, Slava Paperno wrote:
> VAR UTF-8
> The FIELD and the VAR UTF-16 reports are entirely predictable, but the VAR
> UTF-8 list is puzzling to me. I expected six bytes, not seven.
I didn't follow the earlier thread, so apologies if I'm not helping here.
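You said you were puzzled by the UTF-8 list having seven bytes. But Unicode characters in UTF-8 may be from 1 to 4 bytes long, so three characters need not add up to six bytes. The value of the first byte of each character tells you how long its sequence is. A byte value below 128 is a 1-byte (ASCII) character. A byte value between 192 and 223 is the first byte of a 2-byte character, and a byte value between 224 and 239 is the first byte of a 3-byte character. Values between 128 and 191 are continuation bytes that follow a first byte. So in this case, the 226 value is the beginning of the 3-byte sequence for em-dash (226, 128, 148), which accounts for the extra byte if the other two characters take two bytes each.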
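To make the byte ranges concrete, here is a minimal sketch in Python (not LiveCode) that reads the first byte of each sequence and groups the bytes by character. The function names and the sample string are my own assumptions for illustration. I don't know the exact three characters from the earlier thread, so the sample uses two Cyrillic letters around an em-dash as a stand-in.

```python
def utf8_sequence_length(lead_byte):
    """Return the length of a UTF-8 sequence, judged by its first byte."""
    if lead_byte < 128:
        return 1  # ASCII
    if 194 <= lead_byte <= 223:
        return 2  # 192 and 193 would only start overlong (invalid) sequences
    if 224 <= lead_byte <= 239:
        return 3
    if 240 <= lead_byte <= 244:
        return 4
    raise ValueError("not a valid first byte: %d" % lead_byte)


def split_utf8(data):
    """Group a UTF-8 byte string into one list of byte values per character."""
    groups = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        groups.append(list(data[i:i + n]))
        i += n
    return groups


# Hypothetical sample: Cyrillic a, em-dash, Cyrillic be (not the original data).
sample = "\u0430\u2014\u0431".encode("utf-8")
print(len(sample))         # 7
print(split_utf8(sample))  # [[208, 176], [226, 128, 148], [208, 177]]
```

With that sample, the three characters come out as 2 + 3 + 2 = 7 bytes, and the 226 marks the start of the em-dash.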