Why does one char in UTF-8 (3 bytes) become 6 bytes when converted to UTF-16?

Kee Nethery kee at kagi.com
Wed Mar 30 10:13:25 EDT 2011


On Mar 30, 2011, at 12:05 AM, Dave Cragg wrote:

> 
> On 30 Mar 2011, at 02:30, Kee Nethery wrote:
> 
>> I have the "don't" sign symbol (COMBINING ENCLOSING CIRCLE BACKSLASH, U+20E0) in a text file that I read into LiveCode. For grins, it's the character between "Petro" and "Max" seen below.
>> 
>> Petro⃠Max
>> 
>> When I scan the bytes, in UTF-8 this is encoded as 226 131 160, also known as E2 83 A0. This is the correct UTF-8 encoding for this character.
>> 
>> When I convert this to UTF-16 using
>> 
>> uniEncode(theUtf8Text) or uniEncode(theUtf8Text, "UTF16"), the byte values are 26 32 201 0 32 32.
> 
> 
> Shouldn't that be uniEncode(theUtf8Text, "UTF8")?

Good question. I always get confused by encode/decode functions; I'd much prefer a convert function where I specify what the text is and what I want it to become. For example:

replace space with "_" in "This is a bunch of text"

I'm pretty sure the second parameter is the encoding you want the text converted into, and that the function figures out what it already has, but you could be correct. I'll test.
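Here's the test I have in mind. It's a minimal sketch, assuming the dictionary is right that uniEncode's second parameter names the encoding of the text going in (the result is always UTF-16), and assuming theUtf8Text already holds the three bytes E2 83 A0; theUtf16Text and theByteValues are just scratch variables:

put uniEncode(theUtf8Text, "UTF8") into theUtf16Text
-- walk the result byte by byte to see what came out
put empty into theByteValues
repeat for each char c in theUtf16Text
   put charToNum(c) & space after theByteValues
end repeat
put theByteValues
-- if that reading of the docs is right, this shows 224 32, the two
-- little-endian bytes of U+20E0, instead of the six bytes above

For what it's worth, reading 26 32 201 0 32 32 as little-endian UTF-16 gives U+201A, U+00C9, and U+2020, and those are exactly what the bytes E2, 83, and A0 mean in MacRoman. That would suggest uniEncode with no encoding name treats each UTF-8 byte as a native Mac character and widens it to two bytes, which is how 3 bytes became 6.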

Kee

More information about the use-livecode mailing list