More Unicode woes
Kjetil Rå Hauge
k.r.hauge at east.uio.no
Fri Jul 4 09:27:00 EDT 2003
I'm trying to script an application (on Mac OS X) that will enable
users to produce HTML pages in a number of non-latin-1 languages,
with utf-8. For this purpose, I obviously need to export Unicode text
to a file: "write the unicodetext of field "text" to file thefile".
The results are rather frustrating. Cyrillic seems to get translated
to two-byte sequences, but not the correct ones: the first letters of
the alphabet ("abvgd...") come out as "x0x1x2x3x4", where "x" is
ASCII 4. Characters from the CE fonts (Czech, Polish) and Greek cause
BBEdit to complain about a corrupted or malformed utf-8 file, while
Turkish special characters cause no such complaint (but are still
wrong).
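
The symptoms above are consistent with the file containing raw big-endian UTF-16 rather than UTF-8. That is an inference from the described output, not something confirmed in this post. The following Python sketch shows what the bytes would look like under that assumption; the sample strings and the helper name is_valid_utf8 are made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Sample text (illustrative only): Cyrillic a-b-v-g-d, Greek alpha-beta-gamma,
# and the Turkish letters g-breve and s-cedilla.
cyrillic = "\u0430\u0431\u0432\u0433\u0434"
greek = "\u03b1\u03b2\u03b3"
turkish = "\u011f\u015f"

# Assumption: the export writes big-endian UTF-16 code units.
cyrillic_utf16 = cyrillic.encode("utf-16-be")
greek_utf16 = greek.encode("utf-16-be")
turkish_utf16 = turkish.encode("utf-16-be")

# What a UTF-8 file should contain for the same Cyrillic text.
cyrillic_utf8 = cyrillic.encode("utf-8")

# Viewed byte by byte, each Cyrillic letter is 0x04 followed by 0x30..0x34,
# i.e. ASCII 4 followed by the digits "0".."4".
cyrillic_as_seen = cyrillic_utf16.decode("latin-1")

print(cyrillic_utf16.hex(" "))
print(cyrillic_utf8.hex(" "))
print(is_valid_utf8(cyrillic_utf16), is_valid_utf8(greek_utf16),
      is_valid_utf8(turkish_utf16))
```

Under this assumption the Cyrillic and Turkish bytes all fall in the ASCII range, so a UTF-8 reader accepts them without complaint but shows the wrong characters. The Greek letters produce bytes such as 0xB1 that cannot stand alone in UTF-8, which would explain a "malformed file" warning.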
I have tried to use the uniencode function instead, with the
",language" parameter, but with similar results. Using "binfile" also
does not change things.
If I export as htmltext instead of unicodetext, all four of these
types are recognised through their fonts ("font face = "Times CE"
/"Times CY") and the characters are translated to correct HTML
numeric entities, except for the CE fonts, which are translated as
if their font had been changed to plain Times.
Also, when I try to use the character palette, the button "Insert" is
not dimmed, but does not work. (Microsoft Word with its miserable
Unicode support at least has the decency to dim it.)
--
Kjetil Rå Hauge, U. of Oslo, PO Box 1030 Blindern, N-0315 Oslo, Norway
Tel. +47/22856710, fax +47/22854140