Why do I still need MacToISO, when working with UTF-8?

Mark Waddingham mark at livecode.com
Mon Jan 16 12:45:43 EST 2017


>> On 16.01.2017 at 18:30, Mark Waddingham via use-livecode 
>> <use-livecode at lists.runrev.com 
>> <mailto:use-livecode at lists.runrev.com>> wrote:
>> 
>> 
>> Sure - here is how I'd slightly adjust Tiemo's code:
>> 
>> put fld "name" into myName
>> -- ...
>> open file myFile for binary write
>> write textEncode(myName, "utf8") to file myFile
>> close file myFile
>> -- ...
>> open file myFile for binary read
>> read from file myFile until EOF
>> close file myFile
>> put textDecode(it, "utf8") into myName
> 
> I always thought that binary reading a text file would result in a
> string with the same encoding and line endings.

When you read a file in binary mode, what you get is binary data, *not* 
text - i.e. it is just a sequence of bytes. The engine cannot tell what 
the bytes represent just by looking at them, so you have to explicitly 
convert them to something - in this case we convert the sequence of 
bytes to text by interpreting them as UTF-8.
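
As a rough sketch of the difference (the button handler and the sample 
string are made up purely for illustration, and this assumes LiveCode 7 
or later), the encoded data and the decoded text no longer have the same 
length once a non-ASCII character is involved:

     on mouseUp
        local tText, tBytes
        -- "Zo" followed by e-diaeresis (U+00EB); hypothetical sample text
        put "Zo" & numToCodepoint(235) into tText
        -- textEncode returns binary data, not text
        put textEncode(tText, "utf8") into tBytes
        -- UTF-8 needs two bytes for U+00EB, so this should show 3 and 4
        put (the number of chars in tText) && \
              (the number of bytes in tBytes)
     end mouseUp

The bytes only become three characters again once they are passed 
through textDecode with the matching encoding.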

One of the biggest changes from 6 to 7 is that binary strings and text 
strings are no longer the same thing.

Prior to 7, the engine didn't really 'know' anything about Unicode - the 
field did to a certain degree, but nothing else - and it assumed that 
binary strings and text strings were the same thing. Indeed, on Mac the 
engine would assume that a binary string could be treated as a MacRoman 
encoded string (as MacRoman is one byte, one char); and on Windows/Linux 
it would assume that a binary string could be treated as a Latin-1 
encoded string (also a one byte, one char encoding).
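
To illustrate what that one byte, one char assumption does to UTF-8 data, 
here is a small sketch (again assuming LiveCode 7 or later; the handler 
and variable names are only placeholders). It decodes the same two bytes 
once as UTF-8 and once using the platform's native encoding:

     on mouseUp
        local tBytes, tAsUTF8, tAsNative
        -- e-acute (U+00E9) encoded as UTF-8 occupies two bytes
        put textEncode(numToCodepoint(233), "utf8") into tBytes
        -- Correct interpretation: one character
        put textDecode(tBytes, "utf8") into tAsUTF8
        -- Platform interpretation: each byte is taken as one character
        put textDecode(tBytes, "native") into tAsNative
        -- Should show 1 and 2
        put (the number of chars in tAsUTF8) && \
              (the number of chars in tAsNative)
     end mouseUp

The two characters produced by the native decode are the familiar 
garbage you see when UTF-8 text is treated as MacRoman or Latin-1.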

This equivalence has been retained in 7 - which is why stacks written 
in 6 work exactly the same in 7. Specifically, there is an implicit 
auto conversion between binary strings and text strings using the 
platform encoding:

     put <binary data> into tVar
     put "foobar" after tVar

In the second line here, the engine will first convert tVar to a text 
string (assuming MacRoman encoding on Mac, Latin-1 on Windows/Linux) 
then append "foobar".
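
A hedged sketch of how this bites with UTF-8 data, compared with doing 
the conversion explicitly (handler and variable names are hypothetical; 
the counts in the comment are what I would expect from the behaviour 
described above, not something guaranteed for every platform):

     on mouseUp
        local tData, tImplicit, tExplicit
        -- Two bytes of UTF-8 data representing a single e-acute
        put textEncode(numToCodepoint(233), "utf8") into tData
        -- Implicit: the bytes are converted using the platform encoding
        put tData into tImplicit
        put "foobar" after tImplicit
        -- Explicit: state that the bytes are UTF-8 before appending
        put textDecode(tData, "utf8") & "foobar" into tExplicit
        -- Expected to show 8 for the implicit case and 7 for the explicit
        put (the number of chars in tImplicit) && \
              (the number of chars in tExplicit)
     end mouseUp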

> So when I binary read UTF8 files, I still have to textDecode them as 
> UTF8?

Yes - because if you read something as binary, then it is just that - 
binary - it has no structure and is just a sequence of bytes.

A perhaps more obvious example is that you have to explicitly 
decompress data which has been compress'd and explicitly arrayDecode 
data which has been arrayEncode'd. When it is just data, the engine 
doesn't know what it could be, so the code processing it has to 
explicitly specify a conversion.
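
For comparison, a minimal sketch of those two round trips (the array key 
and sample values are invented for illustration). In both cases the 
intermediate value is opaque binary data, and only the matching decode 
call turns it back into something structured:

     on mouseUp
        local tArray, tData, tCopy, tPacked
        -- arrayEncode produces binary data; arrayDecode reverses it
        put "Mark" into tArray["name"]
        put arrayEncode(tArray) into tData
        put arrayDecode(tData) into tCopy
        -- compress works on data, so encode the text first and
        -- decode it again after decompressing
        put compress(textEncode("hello", "utf8")) into tPacked
        put tCopy["name"] && textDecode(decompress(tPacked), "utf8")
     end mouseUp

textEncode and textDecode play the same role for text as arrayEncode and 
arrayDecode do for arrays.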

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



