Wanted: idiot's guide to using Unicode in Rev

Ben Rubinstein benr_mc at cogapp.com
Thu Dec 2 18:55:58 EST 2004


I've just fiddled (and part of my problem is that I don't know any non-roman
languages, so have very limited capability to understand if I'm doing
something right).

I have an app in which users edit text in a field; simple controls let them
apply simple styling (bold/italic/both) and some more perverse markup; a
popup stack menu assists in entering interesting non-roman characters.

To date these have all been characters accommodated in the 8-bit ISO 8859-1
character set - now I need to handle some which come from extended code
blocks, for example o-macron, that is a latin small letter o with a
horizontal bar above.

I set the unicodeText of a field to the hex bytes 59 01 4D, which should be
"Y", followed by o-macron.  I got some interesting Japanese characters in
the field.  What was going on there?  Should I have set the font to
something first, to avoid misinterpretation?  Surely the whole point of
saying that it's Unicode is that I'm specifying code points, so there's no
interpretation needed.
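
One plausible reading of the Japanese-looking result, sketched in Python
rather than Rev: this assumes the unicodeText property expects UTF-16, i.e.
two bytes for each of these characters, in the platform's byte order.  Under
that assumption "Y" would need the bytes 00 59 rather than 59 alone, and the
bytes 59 01 taken together name a single code point in the CJK range.  The
variable names are purely illustrative.

```python
# The three bytes from the post, and the two-character string that was intended.
raw = bytes.fromhex("59014D")
intended = "Y\u014d"  # "Y" followed by o-macron (U+014D)

# In UTF-16 each of these characters occupies two bytes.
utf16_be = intended.encode("utf-16-be")
utf16_le = intended.encode("utf-16-le")
print(utf16_be.hex(" "))  # 00 59 01 4d
print(utf16_le.hex(" "))  # 59 00 4d 01

# Reading the first two raw bytes as one big-endian UTF-16 unit gives U+5901,
# which lies in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
misread = raw[:2].decode("utf-16-be")
print(f"U+{ord(misread):04X}")  # U+5901
```

If that assumption holds, the remaining byte 4D is only half of a UTF-16
unit, so what happens to it depends on the engine.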

And if I do have to set a font, how do I know which one to specify?  The
documentation talks about "Unicode fonts", and mentions "Osaka,Japanese" as
an example.  How can I find out which Unicode fonts are available - on my
system, on my user's system?  And if I found them, would it tell me which
language they relate to?  And in any case, how should I know which language
o-macron relates to?
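
On the narrower question of what o-macron "relates to", the Unicode
character database describes a character by name and code point rather than
by language.  A minimal sketch using Python's standard unicodedata module
(not a Rev facility) shows that U+014D is a Latin-script letter in the Latin
Extended-A block, U+0100 to U+017F.  It says nothing about which installed
fonts actually contain a glyph for it.

```python
import unicodedata

ch = "\u014d"  # o-macron

# The official character name identifies the script as Latin.
print(unicodedata.name(ch))           # LATIN SMALL LETTER O WITH MACRON

# Canonical decomposition: plain "o" (U+006F) plus combining macron (U+0304).
print(unicodedata.decomposition(ch))  # 006F 0304

# Latin Extended-A spans U+0100 to U+017F.
in_latin_extended_a = 0x0100 <= ord(ch) <= 0x017F
print(in_latin_extended_a)            # True
```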

If I construct an HTML file to include the text
    Y&#333;
or
    Y&#x014D;

and open this in a browser, I see in each case a capital Y followed by an
o-macron.  If I set the htmlText of a Rev field to the first of these, it is
reflected exactly, i.e. "Yō".  If I set the htmlText of the field to
the second, it appears as "YM".
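
A guess at where "YM" might come from, sketched in Python.  It assumes the
two snippets were the decimal and hexadecimal numeric character references
for U+014D (the exact spelling used is an assumption), and that something
kept only the low byte of the hexadecimal value.  The truncation step is
hypothetical and is not confirmed Rev behaviour.

```python
import html

# Assumed forms of the two references; both name code point U+014D.
decimal_ref = "Y&#333;"
hex_ref = "Y&#x014D;"

# A standards-following parser decodes both to the same two characters.
decoded_decimal = html.unescape(decimal_ref)
decoded_hex = html.unescape(hex_ref)
print(decoded_decimal == decoded_hex)  # True

# Hypothetical failure mode: keep only the low 8 bits of 0x014D.
low_byte = 0x014D & 0xFF
truncated = "Y" + chr(low_byte)
print(truncated)  # YM
```

The low byte of 0x014D is 0x4D, which is the code for "M" in ASCII and
ISO 8859-1, so the observed result is at least consistent with this guess.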

If I copy the text from the browser ("Y<o-macron>"), and paste it into the
field, it appears correctly.  If I then get the htmlText of the field, it
is:
    <p>Y<font face="Arial" lang="el">ō</font></p>

or sometimes I've got
    <p>Y<font face="Lucida Grande" lang="el">ō</font></p>

Again, can some kind soul help me understand the relation of fonts to using
Unicode?  If I want to set a character in a field to one outside the first
255 Unicode code points, do I need to do so in the context of a font?

Another specific question: can I set the label of a button to a Unicode
string, or more generally to anything which requires more than the simple
roman character set?

TIA,
 
  Ben Rubinstein               |  Email: benr_mc at cogapp.com
  Cognitive Applications Ltd   |  Phone: +44 (0)1273-821600
  http://www.cogapp.com        |  Fax  : +44 (0)1273-728866


