Wanted: idiot's guide to using Unicode in Rev

Ben Rubinstein benr_mc at cogapp.com
Wed Dec 8 09:11:57 EST 2004


Thanks to Ron Barber for some helpful responses to my previous mail.  I'm
making very slow and stumbling progress - still, progress!  Now I've hit
another area in which my incomprehension may be mixing with Rev bugs.

I'm working with a field which has inherited Geneva as its font.  I set the
htmlText of the field to some text containing a Unicode character, e.g.
    <p>Hell<font lang="el">ō</font>, world.</p><p>Goodbye.</p>

Expected is a single unicode character (o-macron), in otherwise English
text, ie
        Hell<o-macron>, world.
        Goodbye.

Result is that all the text from the unicode character to the end of the
paragraph is displayed as Japanese characters, ie
        Hell<o-macron><Japanese characters>
        Goodbye.

I can construct the desired behaviour by inserting the character after the
rest of the text has been set; that is, first I set the field to the text
        Hello, world.
        Goodbye.

then execute the statements
    put "<font lang=" & quote & "el" & quote & ">ō</font>" into x
    set the htmlText of char 5 of fld 1 to x

Then I get the desired appearance.  If I then ask for the htmlText of the
whole field, I get the string I started with, that is
    <p>Hell<font lang="el">ō</font>, world.</p><p>Goodbye.</p>

In other words, setting the htmlText of the field to the htmlText of the
field changes it (rendering all the characters after the o-macron as
Japanese characters).  Is this a bug?  Is it a known bug?  Is it in bugzilla?
Is there a workaround?
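
For anyone wanting to reproduce this, the steps above could be collected
into a single handler along the following lines.  This is only a sketch: it
assumes a card with one field (fld 1), that the o-macron is written as the
numeric entity &#333; (decimal for U+014D), and the variable names tBefore
and tAfter are just illustrative.  Whether the two htmlText strings differ
when the display does is not established here, so the final comparison is
only a first check and the field should still be inspected by eye.

    on mouseUp
      -- Step 1: start from plain English text only
      set the htmlText of fld 1 to "<p>Hello, world.</p><p>Goodbye.</p>"
      -- Step 2: replace char 5 (the "o") with the tagged o-macron
      put "<font lang=" & quote & "el" & quote & ">&#333;</font>" into x
      set the htmlText of char 5 of fld 1 to x
      -- Step 3: round-trip the htmlText through the same field
      put the htmlText of fld 1 into tBefore
      set the htmlText of fld 1 to tBefore
      put the htmlText of fld 1 into tAfter
      -- Show whether the reported htmlText survived the round trip
      answer (tBefore is tAfter)
    end mouseUp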

Note that the same does not occur with unicodeText - eg given a field
constructed as above, the statement
    set the unicodeText of fld 1 to the unicodeText of fld 1

doesn't change the text (but of course it does change any style attributes,
so this isn't by itself a solution to my problem.)

Also note, attempted workaround: explicitly changing the font of the next
character works, ie setting the field to:

    <p>Hell<font lang="el">ō</font><font language="en">,</font>
world.</p><p>Goodbye.</p>

so forcing the comma that immediately follows the o-macron back to English
works; but this isn't a great solution in my general case, as the next
character might be anything - a plain character, another unicode entity, the
opening of another markup tag.  Coding for the general case would be a real
PITA.  I also tried just using the font tags without enclosing a character,
that is

    <p>Hell<font lang="el">ō</font><font language="en"></font>,
world.</p><p>Goodbye.</p>

Sadly this didn't work!

Any help, tips, pointers to documentation, or answers to the specific
questions above would be very gratefully received.

 
  Ben Rubinstein               |  Email: benr_mc at cogapp.com
  Cognitive Applications Ltd   |  Phone: +44 (0)1273-821600
  http://www.cogapp.com        |  Fax  : +44 (0)1273-728866


