unicodeText problem with Japanese

Devin Asay devin_asay at byu.edu
Thu Mar 22 10:38:16 EDT 2007


On Mar 21, 2007, at 11:55 PM, Nicolas Cueto wrote:

> In trying to import Japanese text into a field using:
>
>   set the unicodeText of field "fKanji" to url "binfile:yek/AAA.txt"
>
> I am encountering a compound problem.
>
> The letters themselves of the Japanese text appear,
> but not the end of line markers, i.e., it's all one long
> line with cr's replaced by a blank -- which, using
> charToNumber seems to be the cr marker. (In
> addition, an unrecognizable character appears
> in the very first character position of the field, but
> that's a problem I can resolve with the "delete"
> function.)

I think what you are seeing here is a byte order mark (BOM). To my knowledge
Revolution doesn't handle BOMs, so for now your approach of simply deleting
it seems to be the right one.
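
For anyone who wants to see what is being deleted, here is a minimal sketch in Python (not Revolution; it is used only because it makes the raw bytes easy to inspect). It assumes the file is UTF-16, where the byte order mark is the two bytes FF FE (little-endian) or FE FF (big-endian) at the very start of the data. The function name `strip_utf16_bom` and the sample text are made up for this illustration.

```python
import codecs


def strip_utf16_bom(data):
    """Return (payload, byte_order), dropping a leading UTF-16 BOM if present."""
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return data[2:], "little"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return data[2:], "big"
    # No BOM found: leave the data alone and report that the order is unknown.
    return data, "unknown"


# Hypothetical sample: a little-endian BOM followed by two kanji and a line feed.
sample = codecs.BOM_UTF16_LE + "\u6f22\u5b57\n".encode("utf-16-le")
payload, order = strip_utf16_bom(sample)
```

If the stray first character in the field is indeed the BOM, the same idea applies inside Revolution: drop the first two bytes of the file data before handing it to the field.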
>
> I tried cr and lf with my Japanese text editor(s), as
> well as different unicode encodings, but neither
> end-of-line marker is handled properly by the
> unicodeText function.
>
> So, as a workaround solution, I first used a text editor
> to substitute all cr's (or lf's) in the original text file with
> a unique Japanese character. Then, after importing
> that modified text with the unicodeText function, I used
> the "replace" function to replace all instances of that
> unique character with cr.
>
> This almost worked, i.e. the text was divided into lines.
> Only, one of the Japanese characters in the original text
> disappears after the "replace with", and, in its stead,
> a blank line appears.
>
> I would try a workaround for that, but there might be
> other Japanese characters that are likewise deleted
> or altered.

Have you tried replacing the cr character with a unicode-encoded  
return char? Something like this (not tested):

get the unicodeText of fld "fKanji"
replace cr with uniencode(cr,"ANSI") in it
set the unicodeText of fld "fKanji" to it
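
One caveat with any replace done on the raw two-byte data: the byte for cr (0x0D) or lf (0x0A) can also occur as half of an ordinary character. This might be what makes a Japanese character vanish after the replace, though that is a guess and not something confirmed in this thread. The sketch below is in Python, used only to show the bytes, and assumes little-endian UTF-16. The helper name `replace_decoded` and the sample string are hypothetical.

```python
def replace_decoded(data, old, new, encoding="utf-16-le"):
    """Replace on decoded text, so only whole characters can match."""
    return data.decode(encoding).replace(old, new).encode(encoding)


# U+4E0D, a carriage return, then U+4E0A.
# In little-endian UTF-16 the bytes are: 0D 4E | 0D 00 | 0A 4E
raw = "\u4e0d\r\u4e0a".encode("utf-16-le")

# Byte-wise replace: also rewrites the 0x0D that belongs to U+4E0D,
# turning that character into U+4E0A.
naive = raw.replace(b"\r", b"\n")

# Character-wise replace: only the real carriage return is changed.
safe = replace_decoded(raw, "\r", "\n")
```

Here `naive` no longer contains the first kanji, while `safe` keeps both characters and only swaps the line ending.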

I've also sometimes had good luck with things like this by getting  
the htmlText of flds with unicode text in them and doing the replace  
on the htmlText.

Let us know what you find out.

Devin

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University



