Unicode: LC 7.0 - PHP - MySQL?

Peter W A Wood peterwawood at gmail.com
Thu Oct 30 03:29:54 EDT 2014


Hello Tiemo

I'm not sure that I have all the answers you are looking for, but I hope this will help a little. It is a simplification, though, to try to make things understandable.

> On 30 Oct 2014, at 00:11, Tiemo Hollmann TB <toolbook at kestner.de> wrote:
> 
> Hello,
> 
> I have a LC 6 program communicating through PHP with a MySQL db. Because my
> background about Unicode, PHP and MySQL is limited I wonder what I have to
> care about, when migrating to LC 7.
> 
> I have read the release notes of LC 7. My limited thinking was, that UniCode
> really has a unique code for each sign on the planet. But why is there a
> UTF-8 / UTF-16.

Yes, Unicode does have a unique code for (almost) every sign on the planet. The unique codes are known as codepoints. There are so many of them that the numbers allocated to them exceed the maximum value of one 8-bit "character" (or even two). They do all fit into 4 bytes though. UTF-8 is a way of storing Unicode codepoints as a sequence of single (8-bit) "characters". It takes between one and four of them to store a codepoint. UTF-16 is a way of storing Unicode codepoints in double (16-bit) "characters". The vast majority of codepoints in everyday use fit into a single "double character"; the rest take two.
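To make the sizes concrete, here is a small sketch. It is in Python rather than LiveCode, purely as an illustration, and the helper name code_unit_counts is made up for this example. It counts how many 8-bit units UTF-8 needs and how many 16-bit units UTF-16 needs for a few single codepoints.

```python
# Each string below is a single Unicode codepoint.
samples = ["A", "ä", "€", "😀"]


def code_unit_counts(ch):
    """Return (codepoint, UTF-8 byte count, UTF-16 16-bit unit count)."""
    utf8_bytes = len(ch.encode("utf-8"))
    # "utf-16-le" omits the byte order mark, so every 2 bytes is one unit.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return ord(ch), utf8_bytes, utf16_units


for ch in samples:
    codepoint, n8, n16 = code_unit_counts(ch)
    print(f"U+{codepoint:04X}: {n8} byte(s) in UTF-8, {n16} unit(s) in UTF-16")
```

Plain ASCII letters take one byte in UTF-8, accented Latin letters two, the euro sign three, and characters outside the first 65,536 codepoints (such as emoji) take four bytes in UTF-8 and two units in UTF-16.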


> Which one is LC using?

Internally LiveCode uses UTF-16.

> Which one is my MySQL db using?

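That depends on how the database was set up. As far as I know, MySQL has traditionally defaulted to the latin1 character set (with the latin1_swedish_ci collation) rather than UTF-8, and a database, table or column can each be given its own character set. I'm sure one of the database experts can correct me if I am wrong. However, what is most important is which encoding system was used to create the data in the first place.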

> I don't find any information about UTF-8/16 in my db description. How is the
> collation of the db related to UTF-x and to LC?. My tables are collated in
> ascii_general_ci. In some of my PHPs a "COLLATE latin1_swedish_ci" is used.
> I have no idea why this Swedish collation is in my german PHP and how it can
> be compatible with my ascii_general_ci DB. (The PHPs are made by third
> party)

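This suggests that your PHP scripts are expecting the data they receive to be encoded in a specific system. A MySQL collation name starts with the character set it belongs to, so "ascii_general_ci" means ASCII and "latin1_swedish_ci" means Latin-1. The latter is simply MySQL's default collation for latin1, which is probably why it turns up in your German scripts. Both character sets are different from the Unicode encodings such as UTF-8, although plain ASCII text is stored identically in ASCII, Latin-1 and UTF-8.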
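A quick way to see why the character set matters is to encode the same text in each one. The following is only a sketch in Python (not LiveCode or PHP); the sample word and the helper can_encode are invented for the illustration.

```python
text = "Grüße"

latin1_bytes = text.encode("latin-1")  # one byte per character for this word
utf8_bytes = text.encode("utf-8")      # ü and ß each take two bytes


def can_encode(s, encoding):
    """Return True if every character of s exists in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(latin1_bytes)                # 5 bytes
print(utf8_bytes)                  # 7 bytes, different values for ü and ß
print(can_encode(text, "ascii"))   # False: ASCII has no ü or ß
```

The same five letters produce different bytes in Latin-1 and UTF-8, and cannot be stored in ASCII at all, so whoever reads the bytes has to know which system wrote them.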

> 
> What do I have to change in my LC program when migrating to 7. Where to
> start? 

As best as I understand, you will need to use textDecode to convert any external text (i.e. from PHP or MySQL) to LiveCode text, and textEncode to convert any LiveCode text before sending it to an external destination.
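The idea of decoding on the way in and encoding on the way out can be sketched as follows. This is Python, not LiveCode script, and from_external and to_external are hypothetical names standing in for the role that textDecode and textEncode play; the sample bytes are made up rather than taken from a real PHP response.

```python
def from_external(raw: bytes, encoding: str) -> str:
    """Turn bytes received from outside into internal text."""
    return raw.decode(encoding)


def to_external(text: str, encoding: str) -> bytes:
    """Turn internal text into bytes for an outside system."""
    return text.encode(encoding)


# Pretend these bytes arrived from a script that writes UTF-8.
incoming = "Müller".encode("utf-8")

name = from_external(incoming, "utf-8")      # correct: 'Müller'
wrong = from_external(incoming, "latin-1")   # wrong guess: garbled text

print(name)
print(wrong)
print(to_external(name, "utf-8") == incoming)  # round trip is lossless
```

Decoding with the wrong encoding does not raise an error here; it quietly produces garbled characters, which is why it is worth finding out what the PHP scripts and the database actually send.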

> Is LCs Unicode really the magic thing, where I don't have to care about any
> charset related thing and all my thinking is just waste?

Internally, it really is magic (from my point of view). Sadly the magicians in Edinburgh have yet to come up with a spell that magically converts somebody else's data to LiveCode text (UTF-16). They need a little help from us users to tell them how the external text is encoded. (That has always been the case when combining differently encoded text data, since LiveCode 1.)

Regards

Peter




