Unicode from Variable?
Devin Asay
devin_asay at byu.edu
Tue Nov 25 10:21:06 EST 2008
Hi guys,
Wow, where to start? I completely understand your confusion and feel
your pain. When I started with unicode I felt lost, too. Here's a
short list of Aha! points that should help. (Caveat--I'm describing my
understanding purely from a developer perspective; I have little
understanding about how Rev implements unicode "under the hood").
- When we talk about unicode in Rev, we're talking about UTF-16, not
UTF-8 or UTF-32.
- The current implementation of unicode is not perfect, but it is
perfectly usable. (Right-to-left languages are still problematic,
especially if you need to support user input. Displaying right-to-left
text is usually fine.)
- The useUnicode property has very limited application. It only
affects the behavior of the charToNum and numToChar functions. If
useUnicode is false, these two functions behave as we're accustomed; if
true, they assume two-byte characters instead of one-byte characters.
- The byte order in which unicode files are stored depends on the
processor in the host machine (PowerPC is big-endian, Intel is
little-endian). That means that if you're transferring UTF-16 files
from, say, a PPC-based machine to an Intel-based one, the text will be
garbled unless you swap the two bytes of each character as you read it
in.
- In light of the above, it's usually best to store unicode text as
UTF-8 or even htmlText. These have been the most reliable transfer
formats for me.
- In a Rev field, unicode and ASCII text get mixed together all the
time. For instance, characters that normally fall within the ASCII
range, like space, return and common punctuation, are treated as ASCII.
While this can be confusing, it does ensure that normal Rev chunk
expressions work as expected.
- There is no 100% reliable way I know of to look at a file and
determine whether it's unicode, or what flavor of unicode it is; at
best you can make a heuristic guess.
- The section on unicode in the Rev User Guide (section 6.4) is pretty
good as far as it goes, but doesn't cover all the "gotchas".
- Dealing with unicode in text fields is different than in buttons and
menus.
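The useUnicode point is easiest to see at the byte level. Below is a rough Python model, not Rev code. The hypothetical `char_to_num` mimics the behavior described above: it reads one byte when `use_unicode` is false and one two-byte UTF-16 character, in an assumed byte order, when it is true.

```python
import sys


def char_to_num(data: bytes, use_unicode: bool, byteorder: str = sys.byteorder) -> int:
    """Hypothetical model of charToNum applied to the first character of data.

    use_unicode=False: the first byte is one character.
    use_unicode=True:  the first two bytes are one UTF-16 code unit, read in
    the given byte order (defaults to this machine's native order).
    """
    if use_unicode:
        if len(data) < 2:
            raise ValueError("need at least two bytes for a UTF-16 code unit")
        return int.from_bytes(data[:2], byteorder)
    if not data:
        raise ValueError("empty input")
    return data[0]


# "中" is U+4E2D; little-endian UTF-16 stores it as the bytes 2D 4E.
sample = "中".encode("utf-16-le")
print(hex(char_to_num(sample, use_unicode=True, byteorder="little")))  # 0x4e2d
print(hex(char_to_num(sample, use_unicode=False)))  # 0x2d
```

The same two bytes give two different numbers depending on the flag. That is why code written with one useUnicode setting can misbehave under the other.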
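The PPC-to-Intel byte order problem comes down to swapping the two bytes of every UTF-16 code unit. Here is a minimal Python sketch, outside Rev; `swap_utf16_bytes` is an illustrative name, not a Rev function. A leading byte order mark, if present, gets swapped along with everything else, which is what you want.

```python
def swap_utf16_bytes(data: bytes) -> bytes:
    """Convert UTF-16 between big- and little-endian by swapping each byte pair."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # odd-position bytes move to even positions
    swapped[1::2] = data[0::2]  # even-position bytes move to odd positions
    return bytes(swapped)


text = "Crème brûlée, 中文, Русский"
from_ppc = text.encode("utf-16-be")          # big-endian, as on a PPC machine
on_intel = swap_utf16_bytes(from_ppc)        # little-endian, native on Intel
print(on_intel.decode("utf-16-le") == text)  # True
```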
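UTF-8 makes a safer transfer format because it has no byte-order variants. The same text produces identical UTF-8 bytes no matter which machine wrote the original UTF-16. The small Python sketch below shows this. `utf16_to_utf8` is a hypothetical helper, and it assumes you already know the byte order of the BOM-less UTF-16 input.

```python
def utf16_to_utf8(data: bytes, big_endian: bool) -> bytes:
    """Re-encode BOM-less UTF-16 as UTF-8, given the source byte order."""
    codec = "utf-16-be" if big_endian else "utf-16-le"
    return data.decode(codec).encode("utf-8")


text = "Crème brûlée, 中文, Русский"
from_ppc = utf16_to_utf8(text.encode("utf-16-be"), big_endian=True)
from_intel = utf16_to_utf8(text.encode("utf-16-le"), big_endian=False)
print(from_ppc == from_intel)  # True: UTF-8 has a single byte order
```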
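When it comes to detecting the flavor of a file, the best you can do is guess. A common heuristic, sketched here in Python rather than anything Rev provides, is to look for a byte order mark first and then check whether the bytes decode as UTF-8. `guess_unicode_flavor` is a hypothetical name. It returns None when it can't tell, and it can still be fooled: plain ASCII, for instance, is indistinguishable from UTF-8.

```python
import codecs
from typing import Optional

# Longer BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so checking UTF-16 first would misreport it.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_unicode_flavor(data: bytes) -> Optional[str]:
    """Best-effort guess at a Unicode encoding; None means 'can't tell'."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name  # the name describes the flavor; the BOM is still in data
    # BOM-less UTF-16/32 of mostly Latin text is full of NUL bytes, which
    # would otherwise slip through the UTF-8 check below.
    if b"\x00" in data:
        return None
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not UTF-8; could be a legacy 8-bit encoding, etc.
    return "utf-8"  # pure ASCII also lands here
```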
Anyhow, those are some of the key points. For a more in-depth
discussion, see my Unicode presentation from RevLive if you've got the
DVD. Failing that, you're welcome to read my presentation notes at:
http://asay.byu.edu/revUnicode.pdf
The stack I used in that presentation, which shows lots of examples,
is at:
go url "http://asay.byu.edu/unicode-RevLive08.rev"
I'm happy to help if you still have specific issues after you look at
this stuff. Unicode is doable, once you learn the tricks and pitfalls.
Regards,
Devin
On Nov 24, 2008, at 6:45 PM, Scott Rossi wrote:
> Recently, Phil Davis wrote:
>
>> Thanks for asking the questions, Scott. I'm interested in clarity
>> here too since I'll be working with Arabic again in the next few
>> months, and am still a Unicode lightweight.
>
> You want questions? I got a truck-load of 'em...
>
> For instance... I have characters from several languages in the
> text I'm working with: Roman, French (accented), Chinese, and
> Russian. When I set the unicodeText of a field to the text, the
> accented French characters render incorrectly. Looking in the source
> text file, it appears the original French characters may have been
> reformatted when saving the file as UTF-16. Is there any way to keep
> the French characters intact within the unicode text?
>
> Thanks & Regards,
>
> Scott Rossi
> Creative Director
> Tactile Media, Multimedia & Design
>
>
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-revolution
Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
More information about the use-livecode mailing list