the mouseText and Unicode: a 3-char puzzle

Dave Cragg dave.cragg at lacscentre.co.uk
Wed Jun 22 02:38:53 EDT 2011


On 21 Jun 2011, at 07:40, Slava Paperno wrote:

> VAR UTF-8
> 194
> 171
> 226
> 128
> 148
> 194
> 187
> 
> The FIELD and the VAR UTF-16 reports are entirely predictable, but the VAR
> UTF-8 list is puzzling to me. I expected six bytes, not seven.

I didn't follow the earlier thread, so apologies if I'm not helping here.

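You said you were puzzled by the UTF-8 list having seven bytes. But a Unicode character encoded in UTF-8 may be anywhere from 1 to 4 bytes long, and the value of each byte gives a hint to what it represents. A byte value from 0 to 127 is a complete 1-byte (ASCII) character. A byte value between 192 and 223 is the first byte in a 2-byte character, a byte value between 224 and 239 is the first byte in a 3-byte character, and a byte value between 128 and 191 is a continuation byte inside a longer sequence. So in this case, 194 171 is the 2-byte sequence for the opening guillemet (U+00AB), 226 128 148 is the 3-byte sequence for the em-dash (U+2014), and 194 187 is the 2-byte sequence for the closing guillemet (U+00BB). That makes 2 + 3 + 2 = 7 bytes for your three characters.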
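The byte-range rule above can be checked mechanically. Here is a minimal sketch in Python (used only for illustration, not LiveCode) that groups Slava's seven byte values into characters by looking at each lead byte. The helper names `sequence_length` and `split_characters` are hypothetical, and the sketch classifies lead bytes by bit pattern only; it leaves full validation to Python's own UTF-8 decoder.

```python
import unicodedata

# Byte values from the quoted VAR UTF-8 list.
BYTES = [194, 171, 226, 128, 148, 194, 187]

def sequence_length(lead: int) -> int:
    """Return the length of a UTF-8 sequence, judged by its first byte."""
    if lead <= 127:
        return 1  # 0xxxxxxx: plain ASCII, a complete character
    if 192 <= lead <= 223:
        return 2  # 110xxxxx: starts a 2-byte sequence
    if 224 <= lead <= 239:
        return 3  # 1110xxxx: starts a 3-byte sequence
    if 240 <= lead <= 247:
        return 4  # 11110xxx: starts a 4-byte sequence
    # 128..191 (10xxxxxx) are continuation bytes; 248..255 never appear in UTF-8.
    raise ValueError(f"{lead} cannot start a UTF-8 sequence")

def split_characters(data):
    """Group a list of UTF-8 byte values into one list per character."""
    groups = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        groups.append(data[i:i + n])
        i += n
    return groups

if __name__ == "__main__":
    for group in split_characters(BYTES):
        # Decode each group on its own to confirm it is one whole character.
        char = bytes(group).decode("utf-8")
        print(group, f"U+{ord(char):04X}", unicodedata.name(char))
```

Run as a script, it should print three groups: [194, 171] as U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, [226, 128, 148] as U+2014 EM DASH, and [194, 187] as U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK.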

Cheers
Dave



More information about the use-livecode mailing list