Why do I still need MacToISO, when working with UTF-8?
Mark Waddingham
mark at livecode.com
Mon Jan 16 12:45:43 EST 2017
>> On 16.01.2017 at 18:30, Mark Waddingham via use-livecode
>> <use-livecode at lists.runrev.com> wrote:
>>
>>
>> Sure - here is how I'd slightly adjust Tiemo's code:
>>
>> put fld "name" into myName
>> -- ...
>> open file myFile for binary write
>> write textEncode(myName, "utf8") to file myFile
>> close file myFile
>> -- ...
>> open file myFile for binary read
>> read from file myFile until EOF
>> close file myFile
>> put textDecode(it, "utf8") into myName
>
> I always thought that reading a text file in binary mode would result
> in a string with the same encoding and line endings.

When you read a file in binary mode, what you get is binary data, *not*
text - i.e. it is just a sequence of bytes. The engine cannot tell what
the bytes represent just by looking at them, so you have to explicitly
convert them to something - in this case we convert the sequence of
bytes to text by interpreting the bytes as UTF-8.
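
As a rough illustration of the difference, here is a minimal sketch that
compares the size of the encoded bytes with the length of the decoded
text. The handler, the variable names and the sample string are made up
for this example; only textEncode and textDecode come from the code
above.

```livecode
on mouseUp
   local tText, tData, tDecoded
   -- hypothetical sample text: "Zo" followed by e-diaeresis (U+00EB)
   put "Zo" & numToCodepoint(235) into tText

   -- text -> binary: UTF-8 needs two bytes for U+00EB
   put textEncode(tText, "utf8") into tData

   -- binary -> text: tell the engine how to interpret the bytes
   put textDecode(tData, "utf8") into tDecoded

   -- shows the byte count of the data next to the char count of the text
   put the number of bytes in tData && "bytes," && \
         the number of chars in tDecoded && "chars"
end mouseUp
```

The two counts differ because the data holds one more byte than the text
has characters, which is exactly the information the engine cannot guess
without being told the encoding.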

One of the biggest changes from 6 to 7 is that binary strings and text
strings are no longer the same thing.

Prior to 7, the engine didn't really 'know' anything about Unicode - the
field did to a certain degree, but nothing else - and it assumed that
binary strings and text strings were the same thing. Indeed, on Mac the
engine would assume that a binary string could be treated as a MacRoman
encoded string (as MacRoman is one byte, one char); and on Windows/Linux
it would assume that a binary string could be treated as a Latin-1
encoded string (also a one byte, one char encoding).

This equivalence has been carried over from 6 to 7 - which is why stacks
written in 6 behave exactly the same in 7. Specifically, there is an
implicit automatic conversion between binary strings and text strings
using the platform encoding:

  put <binary data> into tVar
  put "foobar" after tVar

In the second line here, the engine will first convert tVar to a text
string (assuming MacRoman encoding on Mac) then append "foobar".
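
To make that implicit step visible, the sketch below writes the same
thing out explicitly. It assumes that "native" is accepted as an
encoding name by textDecode and that it matches what the implicit
conversion uses; the handler and variable names are illustrative only.

```livecode
on mouseUp
   local tData, tImplicit, tExplicit, tIntended
   -- two bytes of binary data: the UTF-8 encoding of U+00E9
   put textEncode(numToCodepoint(233), "utf8") into tData

   -- implicit: appending text makes the engine convert the data
   -- to text using the platform encoding first
   put tData into tImplicit
   put "foobar" after tImplicit

   -- explicit: the same conversion spelled out
   -- (assumption: "native" names the platform encoding)
   put textDecode(tData, "native") & "foobar" into tExplicit

   -- what was probably intended for UTF-8 data
   put textDecode(tData, "utf8") & "foobar" into tIntended

   -- compare the implicit and explicit results, then the lengths
   put (tImplicit is tExplicit) && the number of chars in tExplicit && \
         the number of chars in tIntended
end mouseUp
```

If the assumption holds, the implicit and explicit results match, and
both are one character longer than the UTF-8 decoded version, because
each of the two bytes was treated as a character of its own.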

> So when I binary read UTF-8 files I still have to textDecode the
> data as UTF-8?

Yes - because if you read something as binary, then it is just that -
binary - it has no structure and is just a sequence of bytes.

A perhaps more obvious example is that you have to explicitly
decompress data which has been compress'd and explicitly arrayDecode
data which has been arrayEncode'd. When it is just data, the engine
doesn't know what it could be so the code processing it has to
explicitly specify a conversion.
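
A minimal sketch of that round trip, with a made-up array and made-up
variable names, might look like this:

```livecode
on mouseUp
   local tArray, tPacked, tUnpacked
   -- hypothetical sample data
   put "example" into tArray["name"]
   put "utf8" into tArray["encoding"]

   -- array -> binary data -> compressed binary data
   put compress(arrayEncode(tArray)) into tPacked

   -- tPacked is just bytes at this point, so each step has to be
   -- undone explicitly, in reverse order
   put arrayDecode(decompress(tPacked)) into tUnpacked

   put tUnpacked["name"]
end mouseUp
```

Nothing in tPacked says that it is a compressed, encoded array, in the
same way that nothing in a binary-read file says that it is UTF-8 text.
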
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps