Unicode and chunk expressions

Dar Scott dsc at swcp.com
Wed May 18 15:50:41 EDT 2005

On May 18, 2005, at 12:30 PM, pkc wrote:

> but with the ability to move from unicode to html to ascii and all the 
> way back again, you can start and end with UTF-16 (or start with GB5 
> and end up with UTF-16 if you want), which seems to me to work very 
> well.  That is: input characters with your normal method, get 
> everything to ascii codes for your internal app operation, and have 
> unicode (UTF-16) come out again at the user's end.
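The round trip described in the quoted message can be sketched in Python as a minimal illustration. It is not Revolution's htmlText or unicodeText machinery; it assumes plain HTML numeric character references stand in for the ASCII form. UTF-16 input is decoded, and every non-ASCII character becomes a reference such as `&#20320;`. The function names are hypothetical.

```python
import html

def utf16_to_ascii_html(data: bytes) -> str:
    """Decode UTF-16 bytes and re-express the text as pure ASCII.

    &, < and > are escaped first so the trip back is lossless; every
    non-ASCII character then becomes a decimal reference like &#20320;.
    """
    text = data.decode("utf-16")  # honours a leading byte-order mark
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

def ascii_html_to_utf16(ascii_text: str) -> bytes:
    """Reverse the conversion: resolve references, then encode as UTF-16."""
    return html.unescape(ascii_text).encode("utf-16")

if __name__ == "__main__":
    original = "Kumusta, \u4f60\u597d <mundo>"
    as_ascii = utf16_to_ascii_html(original.encode("utf-16"))
    print(as_ascii)  # Kumusta, &#20320;&#22909; &lt;mundo&gt;
    restored = ascii_html_to_utf16(as_ascii).decode("utf-16")
    print(restored == original)  # True
```

Escaping `&`, `<` and `>` before the conversion keeps literal markup characters in the text from being confused with the references on the way back.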

It is possible to work with the htmlText.  However, that has its own 
problems.  Chunk expressions return pieces of elements you don't 
expect.  I ran a quick test with Tagalog, and getting the last word 
returned part of a font element and the trailing </p>.  Of course, since 
it is all ASCII, you can program around that.
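Here is a rough Python illustration of that effect and one way to program around it. The sample string is hypothetical; real htmlText from Revolution may be laid out differently. Splitting on whitespace only approximates Revolution's `word` chunk, which has its own rules, such as treating quoted text as one word. The workaround strips tags and resolves entities before taking the last word. It assumes no word boundary in the text is formed by a tag alone.

```python
import html
import re

# Hypothetical htmlText-like sample; not captured from Revolution.
SAMPLE_HTML = '<p><font face="Arial">Magandang umaga po</font></p>'

TAG = re.compile(r"<[^>]*>")

def naive_last_word(html_text: str) -> str:
    # Whitespace-delimited "words" taken straight from the raw markup.
    return html_text.split()[-1]

def last_word_of_text(html_text: str) -> str:
    # Remove tags, turn references such as &#233; back into characters,
    # then take the last whitespace-delimited word of the plain text.
    plain = html.unescape(TAG.sub("", html_text))
    words = plain.split()
    return words[-1] if words else ""

if __name__ == "__main__":
    print(naive_last_word(SAMPLE_HTML))    # po</font></p>
    print(last_word_of_text(SAMPLE_HTML))  # po
```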

As far as using UTF-8 goes, there might be some gotchas with the high 
codes (bytes above 127).  I don't remember under what circumstances 
Revolution tries to do character conversions.  I ran a test on OS X, and 
high codes don't seem to be involved in case insensitivity.
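The following Python sketch shows what that kind of comparison means for UTF-8 data. It is an analogy built on Python's `bytes.lower()`, not a model of Revolution's own comparison, and the function names are hypothetical. Folding only A-Z never rewrites a byte of a multibyte UTF-8 character, so it cannot corrupt the text. The trade-off is that accented capitals such as É are not matched with their lowercase forms.

```python
def ascii_casefold_equal(a: bytes, b: bytes) -> bool:
    """Case-insensitive comparison that folds only the ASCII letters A-Z.

    bytes.lower() leaves every byte >= 0x80 untouched, so the lead and
    continuation bytes of a multibyte UTF-8 character are never rewritten.
    """
    return a.lower() == b.lower()

def unicode_casefold_equal(a: bytes, b: bytes) -> bool:
    """Decode UTF-8 first, then apply full Unicode case folding."""
    return a.decode("utf-8").casefold() == b.decode("utf-8").casefold()

if __name__ == "__main__":
    upper = "\u00c9COLE".encode("utf-8")  # "ÉCOLE"
    lower = "\u00e9cole".encode("utf-8")  # "école"
    print(ascii_casefold_equal(b"HELLO", b"hello"))  # True
    print(ascii_casefold_equal(upper, lower))        # False: É and é differ
    print(unicode_casefold_equal(upper, lower))      # True
```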

One advantage of using htmlText is the ability to easily display values 
in debugging and in other tests.

However, since htmlText is a proprietary format that might change, I 
would lean toward using UTF-8 in scripts.
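In Python terms, the UTF-8 approach might look like the sketch below. Revolution's own conversion functions are not shown, and the helper names are hypothetical. Incoming UTF-16 is converted to UTF-8, and word operations run on the UTF-8 bytes. The result is converted back to UTF-16 for display. Working on bytes is safe here because every byte of a multibyte UTF-8 character is 0x80 or above, so splitting on ASCII whitespace cannot cut a character in half.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Convert incoming UTF-16 (with byte-order mark) to UTF-8."""
    return data.decode("utf-16").encode("utf-8")

def utf8_to_utf16(data: bytes) -> bytes:
    """Convert UTF-8 back to UTF-16 for display at the user's end."""
    return data.decode("utf-8").encode("utf-16")

def last_word_utf8(data: bytes) -> bytes:
    """Return the last whitespace-delimited word of a UTF-8 byte string.

    Every byte of a multibyte UTF-8 character is >= 0x80, so splitting on
    ASCII whitespace can never cut a character in half.
    """
    words = data.split()  # splits on ASCII whitespace only
    return words[-1] if words else b""

if __name__ == "__main__":
    incoming = "Magandang umaga, \u4f60\u597d".encode("utf-16")
    internal = utf16_to_utf8(incoming)
    word = last_word_utf8(internal)
    print(utf8_to_utf16(word).decode("utf-16"))  # 你好
```

Unlike the htmlText route, the last word here never contains markup. The cost is that the UTF-8 bytes are not directly readable when you print them while debugging.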


     DSC (Dar Scott Consulting & Dar's Lab)
     Programming and software
