Unicode from Variable?

Devin Asay devin_asay at byu.edu
Tue Nov 25 10:21:06 EST 2008


Hi guys,

Wow, where to start? I completely understand your confusion and feel  
your pain. When I started with unicode I felt lost, too. Here's a  
short list of Aha! points that should help. (Caveat--I'm describing my  
understanding purely from a developer perspective; I have little
knowledge of how Rev implements unicode "under the hood.")

- When we talk about unicode in Rev, we're talking about UTF-16, not  
UTF-8 or UTF-32.

- The current implementation of unicode is not perfect, but it is  
perfectly usable. (Right-to-left languages are still problematic,  
especially if you need to support user input. Display of same is  
usually fine.)

- The useUnicode property has very limited application. It only
affects the behavior of the charToNum and numToChar functions. If
useUnicode is false, these two functions behave as we're accustomed;
if true, they assume two-byte characters instead of one-byte.

- The byte order in which UTF-16 files are stored depends on the
processor of the machine that wrote them. That means that if you're
transferring unicode files from, say, a PPC-based machine to an
Intel-based one, UTF-16 files will be scrambled unless you swap each
pair of bytes as you read them in.

- In light of the above, it's usually best to store unicode text as  
UTF-8 or even htmlText. These have been the most reliable transfer  
formats for me.

- In a Rev field, unicode and ascii get mixed together all the time.
For instance, characters that normally fall within the ascii range,
like space, return and common punctuation, are treated as ascii even
in the middle of unicode text. While this can be confusing, it does
ensure that normal Rev chunk expressions work as expected.

- There is no 100% reliable way I know of to look at a file and  
determine heuristically whether it's unicode, or what flavor of  
unicode it is.

- The section on unicode in the Rev User Guide (section 6.4) is pretty  
good as far as it goes, but doesn't cover all the "gotchas".

- Dealing with unicode in text fields is different than in buttons and
menus.
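
To make the byte-order and detection points above concrete, here is a
minimal sketch in Python rather than Rev. Nothing in it is Rev API; the
function names swap_byte_pairs and sniff_bom are made up for
illustration, and the sample text is just a stand-in for a mixed-language
file. It shows that big-endian and little-endian UTF-16 differ only by
swapping each pair of bytes, that a byte order mark (when a file happens
to have one) identifies the flavor, and that UTF-8 has no byte-order
issue at all.

```python
import codecs
from typing import Optional


def swap_byte_pairs(data: bytes) -> bytes:
    """Swap every pair of bytes: UTF-16BE becomes UTF-16LE and vice versa."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(data)
    # Even positions take the old odd bytes, odd positions the old even bytes.
    swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
    return bytes(swapped)


def sniff_bom(data: bytes) -> Optional[str]:
    """Name the encoding if the data starts with a known byte order mark.

    Returns None when there is no BOM, in which case the bytes alone do
    not say which encoding was used. (UTF-32 is not checked here.)
    """
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 little-endian"
    return None


# Sample text with accented Roman, Russian and Chinese characters.
text = "café Привет 中文"

big_endian = text.encode("utf-16-be")     # byte order a PPC machine would use
little_endian = text.encode("utf-16-le")  # byte order an Intel machine would use

# The same text, but the two files differ byte for byte until swapped.
repaired = swap_byte_pairs(big_endian)

# UTF-8 is a byte stream with no byte order, so it survives the move as-is.
utf8_bytes = text.encode("utf-8")
```

Here repaired is identical to little_endian, and
utf8_bytes.decode("utf-8") gives back the original text on any machine.
sniff_bom returns None for all three byte strings above because none of
them carries a byte order mark, which is the "no 100% reliable way"
problem in miniature.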

Anyhow, those are some of the key points. For a more in-depth
discussion, see my Unicode presentation from RevLive if you've got the  
DVD. Failing that, you're welcome to read my presentation notes at:

http://asay.byu.edu/revUnicode.pdf

The stack I used in that presentation, which shows lots of examples,  
is at:

go url "http://asay.byu.edu/unicode-RevLive08.rev"

I'm happy to help if you still have specific issues after you look at  
this stuff. Unicode is doable, once you learn the tricks and pitfalls.

Regards,

Devin


On Nov 24, 2008, at 6:45 PM, Scott Rossi wrote:

> Recently, Phil Davis wrote:
>
>> Thanks for asking the questions, Scott. I'm interested in clarity here
>> too since I'll be working with Arabic again in the next few months, and
>> am still a Unicode lightweight.
>
> You want questions?  I got a truck-load of 'em...
>
> For instance...  I have characters from several languages in the text I'm
> working with: Roman, French (accented), Chinese, and Russian.  When I set
> the unicodeText of a field to the text, the accented French characters
> render incorrectly.  Looking in the source text file, it appears the
> original French characters may have been reformatted when saving the file as
> UTF-16.  Is there any way to keep the French characters intact within the
> unicode text?
>
> Thanks & Regards,
>
> Scott Rossi
> Creative Director
> Tactile Media, Multimedia & Design
>
>
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your  
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-revolution

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University



