More Unicode woes

Kjetil Rå Hauge k.r.hauge at east.uio.no
Fri Jul 4 09:27:00 EDT 2003


I'm trying to script an application (on Mac OS X) that will let
users produce HTML pages, encoded as UTF-8, in a number of
non-Latin-1 languages. For this I obviously need to export Unicode
text to a file: "write the unicodetext of field "text" to file thefile".

The results are rather frustrating. Cyrillic seems to get converted
to two-byte sequences, but not the correct ones: the first letters of
the alphabet ("abvgd...") come out as "x0x1x2x3x4", where "x" is the
control character ASCII 4. Characters from the CE fonts (Czech,
Polish) and Greek make BBEdit complain about a corrupted or malformed
UTF-8 file, while Turkish special characters cause no such complaint
(but are still wrong).
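One possible reading of the Cyrillic symptom, offered as an assumption rather than a confirmed diagnosis: the pattern looks exactly like UTF-16 in big-endian byte order (native on a PowerPC Mac), i.e. the unicodetext being written out without any conversion to UTF-8. The Python sketch below (not Revolution code) compares the two encodings of "абвгд" and shows the UTF-16-to-UTF-8 re-encoding that would be needed:

```python
# Compare the bytes of the first five Cyrillic letters in UTF-16BE and UTF-8.
# Assumption: the field's unicodetext is UTF-16, big-endian on a PowerPC Mac.
text = "\u0430\u0431\u0432\u0433\u0434"  # "абвгд"

utf16be = text.encode("utf-16-be")
utf8 = text.encode("utf-8")

# In UTF-16BE each letter U+0430..U+0434 becomes byte 0x04 followed by
# ASCII "0".."4", matching the reported "x0x1x2x3x4" with x = ASCII 4.
print(utf16be)        # b'\x040\x041\x042\x043\x044'

# The UTF-8 bytes that BBEdit and browsers expect instead:
print(utf8.hex(" "))  # d0 b0 d0 b1 d0 b2 d0 b3 d0 b4

# Re-encoding: decode the UTF-16BE bytes, then encode them as UTF-8.
converted = utf16be.decode("utf-16-be").encode("utf-8")
print(converted == utf8)  # True
```

If this hypothesis holds, the Greek and CE "malformed UTF-8" complaints would follow too, since raw UTF-16 bytes are generally not valid UTF-8.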

I have tried the uniencode function instead, with the ",language"
parameter, but with similar results. Using "binfile" does not change
anything either.

If I export as htmltext instead of unicodetext, all four of these
scripts are recognised through their fonts (font face="Times CE" /
"Times CY") and the characters are translated to the correct HTML
numerical entities, except for the CE fonts, which are translated as
if their font had been changed to plain Times.
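The htmltext route works for Cyrillic and Greek because each non-ASCII character becomes a numeric character reference, which is valid in an HTML page regardless of its declared charset. As a sketch of that idea only (Python, not Revolution; the function name to_html_entities is hypothetical), the same references can be produced straight from the Unicode string with the standard "xmlcharrefreplace" error handler, bypassing the font mapping that goes wrong for the CE fonts:

```python
# Turn every non-ASCII character into a decimal numeric character reference,
# the same "&#NNNN;" form the htmltext export produces for Cyrillic and Greek.
def to_html_entities(text: str) -> str:
    # ASCII characters pass through unchanged; everything else becomes &#N;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_html_entities("абвгд"))  # &#1072;&#1073;&#1074;&#1075;&#1076;
print(to_html_entities("Łódź"))   # &#321;&#243;d&#378;
```

Note that this leaves "&", "<" and ">" untouched, so text containing them would still need ordinary HTML escaping.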

Also, when I try to use the character palette, the "Insert" button
is not dimmed, but it does not work. (Microsoft Word, with its
miserable Unicode support, at least has the decency to dim it.)
-- 
---
Kjetil Rå Hauge, U. of Oslo, PO Box 1030 Blindern, N-0315 Oslo, Norway
Tel. +47/22856710, fax +47/22854140



More information about the use-livecode mailing list