Wanted: idiot's guide to using Unicode in Rev
Ben Rubinstein
benr_mc at cogapp.com
Wed Dec 8 09:11:57 EST 2004
Thanks to Ron Barber for some helpful responses to my previous mail. I'm
making very slow and stumbling progress - still, progress! Now I've hit
another area in which my incomprehension may be mixing with Rev bugs.
I'm working with a field which has inherited Geneva as its font. I set the
htmlText of the field to some text containing a Unicode character, e.g.
<p>Hell<font lang="el">ō</font>, world.</p><p>Goodbye.</p>
Expected is a single unicode character (o-macron), in otherwise English
text, ie
Hell<o-macron>, world.
Goodbye.
Result is that all the text from the unicode character to the end of the
paragraph is displayed as Japanese characters, ie
Hell<o-macron><Japanese characters>
Goodbye.
I can construct the desired behaviour by inserting the character after the
rest of the text has been set; that is, first I set the field to the text
Hello, world.
Goodbye.
then execute the statements
put "<font lang=" & quote & "el" & quote & ">ō</font>" into x
set the htmlText of char 5 of fld 1 to x
Then I get the desired appearance. If I then ask for the htmlText of the
whole field, I get the string I started with, that is
<p>Hell<font lang="el">ō</font>, world.</p><p>Goodbye.</p>
In other words, setting the htmlText of the field to the htmlText of the
field changes it (rendering all the characters after the o-macron as
Japanese characters). Is this a bug? Is it a known bug? Is it in Bugzilla?
Is there a workaround?
Note that the same does not occur with unicodeText - eg given a field
constructed as above, the statement
set the unicodeText of fld 1 to the unicodeText of fld 1
doesn't change the text (but of course it does change any style attributes,
so this isn't by itself a solution to my problem.)
Also note, attempted workaround: explicitly changing the font of the next
character works, ie setting the field to:
<p>Hell<font lang="el">ō</font><font language="en">,</font>
world.</p><p>Goodbye.</p>
so forcing the comma that immediately follows o-macron back to english
works; but this isn't a great solution in my general case, as the next
character might be anything - a plain character, another unicode entity, the
opening of another markup tag. Coding for the general case would be a real
PITA. I also tried just using the font tags without enclosing a character,
that is
<p>Hell<font lang="el">ō</font><font language="en"></font>,
world.</p><p>Goodbye.</p>
Sadly this didn't work!
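
For illustration only, here is a rough sketch of what the general-case
workaround might involve. It is written in Python rather than Rev script, and
the function name reset_after_lang_runs and its reset_lang parameter are
hypothetical. It assumes every Unicode run is wrapped in <font lang="...">
and that the run contains no nested font tags. When the item following a run
is a plain character or a numeric entity, it is wrapped in the same
<font language="en"> tag that worked above; when the next item is another
tag, the text is left alone, since it is unclear what Rev would need there.

```python
import re

# A complete <font lang="..."> ... </font> run (assumed to hold the Unicode text).
LANG_RUN = re.compile(r'<font lang="[^"]*">.*?</font>')
# The item right after a run: a numeric entity or one plain character, never a tag.
NEXT_ITEM = re.compile(r'&#\d+;|[^<]')


def reset_after_lang_runs(html, reset_lang="en"):
    """Wrap the item following each lang-tagged run in a resetting font tag."""
    out = []
    pos = 0
    for run in LANG_RUN.finditer(html):
        # Copy everything up to and including the tagged run unchanged.
        out.append(html[pos:run.end()])
        pos = run.end()
        nxt = NEXT_ITEM.match(html, pos)
        if nxt:
            # 'language' (not 'lang') mirrors the attribute used in the
            # workaround string that was reported to work.
            out.append('<font language="%s">%s</font>' % (reset_lang, nxt.group()))
            pos = nxt.end()
        # If the next item is a tag, nothing is inserted (case left unhandled).
    out.append(html[pos:])
    return "".join(out)
```

The returned string would then be assigned to the htmlText of the field.
Whether this covers every case in Rev is untested.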
Any help, tips, pointers to documentation, or answers to the specific
questions above would be very gratefully received.
Ben Rubinstein | Email: benr_mc at cogapp.com
Cognitive Applications Ltd | Phone: +44 (0)1273-821600
http://www.cogapp.com | Fax : +44 (0)1273-728866