AW: Why do I still need MacToISO, when working with UTF-8?

Mark Waddingham mark at livecode.com
Mon Jan 16 11:41:39 EST 2017


Hi Tiemo,

Okay so, I'm assuming that all this code is running on the Mac client...

> *put fld "name" into myName*

At this point myName contains a (text) string - thus encoding issues 
don't exist (you should think of text strings in memory as being stored 
in an 'encoding neutral' format).

> *open file myFile for binary write*
> *write myName to file myFile*
> *close file myFile*

This piece of code will write myName to a file on disk. Because myName 
is a text string, it is converted from the internal encoding to the 
native encoding of the platform (so MacRoman) on writing. Thus your text 
file will be a MacRoman-encoded text file.

> *open file myFile for binary read*
> *read from file myFile until EOF*
> *close file myFile*
> *put it into myName*

This piece of code will read from a file on disk and assume that it is 
in the native encoding of the platform - so, in this case, MacRoman. It 
will convert the content of the file from that to the internal encoding.

Up to this point - because you saved and loaded the file on the same 
platform the content of myName should be as you expect -- unchanged.
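
As a rough illustration of that round trip (in Python rather than 
LiveCode, purely to make the bytes visible), here is a minimal sketch. 
The "mac_roman" codec stands in for the Mac native encoding, and the 
helper names save_name and load_name are made up for this example.

```python
import os
import tempfile

NATIVE = "mac_roman"  # assumption: stands in for the Mac native encoding


def save_name(path: str, name: str) -> None:
    """Write a text string to disk as native-encoded bytes."""
    with open(path, "wb") as f:
        f.write(name.encode(NATIVE))


def load_name(path: str) -> str:
    """Read native-encoded bytes from disk back into a text string."""
    with open(path, "rb") as f:
        return f.read().decode(NATIVE)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "name.txt")
    save_name(path, "M\u00fcller")  # "Müller"
    round_tripped = load_name(path)

print(round_tripped == "M\u00fcller")
```

As long as the same encoding is used in both directions, the string 
that comes back is the one that went in.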

> *if the platform is "MacOS" then put macToISO(theName) into theName*

When run on Mac this line will execute and do the following:

    1) Convert theName to a binary string - this uses the native 
       platform encoding (MacRoman)
    2) Map each byte from its MacRoman code index to the corresponding 
       ISO Latin-1 code index

This essentially converts theName from a text string to a binary string 
encoded in Latin-1.
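
The two steps above can be sketched as follows. This is Python, not 
LiveCode, and the function mac_to_iso is only a stand-in I am assuming 
for illustration: it uses Python's "mac_roman" and "latin-1" codecs, so 
it will not match macToISO for characters that have no Latin-1 
equivalent.

```python
def mac_to_iso(mac_bytes: bytes) -> bytes:
    """Re-encode MacRoman bytes as ISO 8859-1 (Latin-1) bytes.

    Illustration only: raises UnicodeEncodeError for characters that
    exist in MacRoman but not in Latin-1.
    """
    return mac_bytes.decode("mac_roman").encode("latin-1")


name = "M\u00fcller"                   # "Müller"
mac_bytes = name.encode("mac_roman")   # step 1: text -> native (MacRoman) bytes
iso_bytes = mac_to_iso(mac_bytes)      # step 2: remap each byte to Latin-1

print(mac_bytes.hex(" "))
print(iso_bytes.hex(" "))
```

The only byte that changes is the one for the u-umlaut: it is 9f in 
MacRoman and fc in Latin-1. The plain ASCII letters keep their values.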

> *put URL ("http://myUser:myPW@myURL" & "mySQL.php?" & 
> URLEncode(theName))
> into rslt*

This line constructs the URL - it is making the assumption that PHP (at 
the other end) will interpret the bytes after the '?' as representing 
Latin-1 encoded text.

> Without macToISO on a Mac client theName enters corrupted in the mySQL 
> db

This is most likely because PHP is defaulting to ISO 8859-1 (Latin-1) as 
the encoding used in URL-encoded fields in a URL. If you don't do 
macToISO, then you will be passing up MacRoman-encoded text (URL-encoded) 
to PHP, which can happily be decoded as Latin-1 (or as Windows-1252, 
which extends it), but with some chars (such as accented letters) in 
different places.
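
To see what the server would receive in each case, here is a small 
Python sketch (again only an illustration, not what LiveCode or PHP do 
internally). It assumes the server decodes the percent-escaped bytes as 
Latin-1.

```python
from urllib.parse import quote, unquote_to_bytes

name = "M\u00fcller"  # "Müller"

# What ends up after the '?' in each case.
mac_query = quote(name.encode("mac_roman"))  # without macToISO
iso_query = quote(name.encode("latin-1"))    # with macToISO

# A server that assumes Latin-1 decodes whatever bytes arrive that way.
seen_without = unquote_to_bytes(mac_query).decode("latin-1")
seen_with = unquote_to_bytes(iso_query).decode("latin-1")

print(mac_query, repr(seen_without))
print(iso_query, repr(seen_with))
```

Without the conversion the u-umlaut travels as %9F, which Latin-1 reads 
as a control character rather than a letter; with it, the byte is %FC 
and the name arrives intact.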

What you need to do here is explicitly UTF8 encode theName before 
passing it to URLEncode, then explicitly decode it as UTF8 on the PHP 
side (or set a property in PHP which changes the default assumption 
about URLs - I apologise for not being more accurate here, my knowledge 
of PHP is a little stale these days!).
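
A minimal sketch of that UTF-8 route, in Python for illustration only: 
encode the text as UTF-8 first, percent-escape those bytes, and decode 
as UTF-8 on the receiving side. In LiveCode 7 and later the first step 
would presumably be textEncode(theName, "UTF-8") before URLEncode; the 
decoding half below merely stands in for whatever the PHP side does.

```python
from urllib.parse import quote, unquote

name = "M\u00fcller"  # "Müller"

# Client side: text -> UTF-8 bytes -> percent-escaped query string.
utf8_query = quote(name.encode("utf-8"))

# Server side (stand-in): percent-escaped bytes -> UTF-8 -> text.
decoded = unquote(utf8_query, encoding="utf-8")

print(utf8_query)
print(decoded == name)
```

Because both ends agree on UTF-8, the result no longer depends on the 
platform the client runs on, and macToISO is not needed.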

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



