OT - MySQL, PHP, and Japanese text
Dave Cragg
dcragg at lacscentre.co.uk
Thu Dec 1 07:01:29 EST 2005
On 30 Nov 2005, at 23:45, N Cueto wrote:
Good questions about Japanese and MySQL. I had hoped someone
would chime in with some good answers. In the meantime, here's the
little I know.
>
> 2) at MySQL admin-level, does
> Japanese text require a special
> data type or data setting? It's
> varchar now.
"varchar" is the column "type", which is different from the character
set used. You can set the default character set for the database, for
individual tables, and for columns within tables (SJIS and utf8 are
among the available character sets).
For more than you ever wanted to know:
http://dev.mysql.com/doc/refman/5.0/en/charset.html
So I guess the first thing is to try to find out the character set
of the database column or table in question. MySQL's default is Latin
1, but I guess if your hosting service is in Japan, they may have set
a different default.
There may also be other reasons for getting garbled text. If you're
also filling the database fields with data yourself, you'll need to be
sure that the method used for inserting the data is consistent with
how you retrieve it. So for example, if you were using utf8 in the
database, and the text to insert comes from a Rev field whose
textfont is set to "Osaka,Japanese", you might do something like this
to get the text for inserting into the database:
put uniDecode(the unicodeText of field "data","UTF8") into tData
And then to put retrieved data back into a similar field:
put revDataFromQuery(,,gDbId,tSql) into tData
put uniEncode(tData,"UTF8") into tData
set the unicodeText of field "data" to tData
As a little experiment, I tried storing Japanese text in a default
Latin 1 field in MySQL. I had some success by storing and retrieving
it as UTF8, but not using Rev's UTF16 internal unicode format. The
reason for the UTF16 problem was that the unicode data included a
backslash character, which MySQL treats specially in a Latin 1 field.
So what came out was different from what went in. The UTF8 approach
worked, but I did this with a limited amount of text, so there's no
guarantee that other pieces of UTF8 don't include problem characters.
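To illustrate the backslash issue, here is a small sketch in Python (not Rev) that looks at the raw bytes of one Japanese character. The sample character and the helper name has_backslash_byte are my own choices for illustration, not anything from the experiment above. It only checks for the backslash byte (0x5C); other special characters are not covered.

```python
# Sketch: does the encoded form of a character contain the byte 0x5C,
# which is the ASCII code for a backslash?

BACKSLASH = 0x5C


def has_backslash_byte(data: bytes) -> bool:
    """Return True if the byte string contains the byte 0x5C."""
    return BACKSLASH in data


# U+5C0F is a common kanji; its code point happens to start with 0x5C.
sample = "\u5c0f"

utf16_le = sample.encode("utf-16-le")  # UTF-16, little-endian byte order
utf16_be = sample.encode("utf-16-be")  # UTF-16, big-endian byte order
utf8 = sample.encode("utf-8")

for label, data in (("UTF-16LE", utf16_le), ("UTF-16BE", utf16_be), ("UTF-8", utf8)):
    print(label, data.hex(), "contains 0x5C:", has_backslash_byte(data))
```

In both UTF16 byte orders the character carries a 0x5C byte, so it can be mistaken for a backslash. In UTF8, every byte of a multi-byte character is 0x80 or higher, so 0x5C should only show up when the text contains a real backslash. Text that does contain a real backslash would still need escaping before it goes into an SQL statement.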
Hopefully someone else will step in with something more lucid. :-)
By the way, did you compose your mail on some small and nifty
Japanese technology? The narrow text lines suggest it's smaller than
the clunky computer (made from granite) I use. Does it run Rev? :-)
Cheers
Dave