OT - MySQL, PHP, and Japanese text

Dave Cragg dcragg at lacscentre.co.uk
Thu Dec 1 07:01:29 EST 2005


On 30 Nov 2005, at 23:45, N Cueto wrote:

Good questions about Japanese and MySQL, for which I hoped someone
would chime in with some good answers. In the meantime, here's the
little I know.


>
> 2) at MySQL admin-level, does
> Japanese text require a special
> data type or data setting? It's
> varchar now.

"varchar" is the column type, which is separate from the character
set used. You can set the default character set (SJIS and utf8 are
both available) for the database, for individual tables, and for
columns within tables.

For more than you ever wanted to know:

    http://dev.mysql.com/doc/refman/5.0/en/charset.html

So I guess the first thing is to try and find out the character set  
of the database column or table in question. MySQL's default is Latin  
1, but I guess if your hosting service is in Japan, they may have set  
a different default.

There may also be other reasons for getting garbled text. If you're
also filling the database fields with data yourself, you'll need to be
sure that the method used for inserting the data is compatible with
how you retrieve it. So for example, if you were using utf8 in the
database, and the text to insert comes from a Rev field whose
textFont is set to "Osaka,Japanese", you might do something like this
to get the text for inserting into the database:

    put uniDecode(the unicodeText of field "data","UTF8") into tData

And then to put retrieved data back into a similar field:

    put revDataFromQuery(,,gDbId,tSql) into tData
    put uniEncode(tData,"UTF8") into tData
    set the unicodeText of field "data" to tData
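
For anyone who wants to check the conversion outside Rev, here is a
rough Python sketch of the same round trip. It assumes the field's
unicodeText is UTF-16 in little-endian byte order (the real byte order
depends on the platform), and the function names are made up for
illustration; they are not part of Rev or MySQL.

```python
def utf16_to_utf8(utf16_bytes: bytes) -> bytes:
    """Roughly what uniDecode(..., "UTF8") does: UTF-16 in, UTF-8 out."""
    # Assumption: little-endian UTF-16 without a byte order mark.
    return utf16_bytes.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(utf8_bytes: bytes) -> bytes:
    """Roughly what uniEncode(..., "UTF8") does: UTF-8 in, UTF-16 out."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


# Stand-in for the unicodeText of the field: the three characters
# U+65E5 U+672C U+8A9E ("Japanese language").
field_text = "\u65e5\u672c\u8a9e".encode("utf-16-le")

to_store = utf16_to_utf8(field_text)   # what would go into the database
restored = utf8_to_utf16(to_store)     # what would go back into the field

print(restored == field_text)
```

If the two conversions are not mirror images of each other, the
comparison at the end fails, which is the kind of mismatch that shows
up as garbled text.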


As a little experiment, I tried storing Japanese text in a default
Latin 1 field in MySQL. I had some success by storing and retrieving
it as UTF8, but not using Rev's UTF16 internal unicode format. The
reason for the UTF16 problem was that the unicode data included a
byte with the same value as a backslash, which MySQL treats specially
in a Latin 1 field. So what came out was different from what went in.
The UTF8 approach worked, but I did this with a limited amount of
text, so there's no guarantee that other pieces of UTF8 don't include
problem characters.
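
The backslash effect is easy to reproduce outside the database. The
Python sketch below is only an illustration: the sample character
(U+5C71, "mountain") is my own choice, not the text from the
experiment, and the helper name is hypothetical. It checks whether the
byte 0x5C (the ASCII backslash) turns up in each encoding.

```python
BACKSLASH = 0x5C  # byte value of the ASCII backslash


def contains_backslash_byte(data: bytes) -> bool:
    """Return True if any byte in data equals the ASCII backslash."""
    return BACKSLASH in data


text = "\u5c71"  # one kanji whose code point starts with 0x5C

utf16 = text.encode("utf-16-be")  # bytes 5C 71
utf8 = text.encode("utf-8")       # bytes E5 B1 B1

print(contains_backslash_byte(utf16))
print(contains_backslash_byte(utf8))
```

In UTF-8, every byte of a multi-byte character is 0x80 or higher, so a
0x5C byte can only come from a real backslash in the text. Real
backslashes and quote characters would still need escaping before
going into an SQL statement.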

Hopefully someone else will step in with something more lucid. :-)

By the way, did you compose your mail on some small and nifty  
Japanese technology? The narrow text lines suggest it's smaller than  
the clunky computer (made from granite) I use. Does it run Rev? :-)

Cheers
Dave




