Unicode Chinese Mac

Dar Scott dsc at swcp.com
Tue May 10 19:12:30 EDT 2005


On May 10, 2005, at 1:53 PM, Dar Scott wrote:

> You can't use =, is a number, contains, line, item, foundchunk, filter 
> (except for a trick), find, +, -, /, *, add, subtract, offset (except 
> with extra scripting), and just about anything.

But as was pointed out earlier, you get some gain by using htmlText 
instead of unicodeText.

Also, UTF8 will work OK for words (usually), items and lines.  Not 
chars, though; you have to remember that every character outside the 
ASCII range is represented by multiple bytes.  The cool thing is that 
ASCII byte values never occur inside those multibyte sequences.  All of 
the syntactically significant characters for words, items and lines 
(the delimiters) are ASCII, so a delimiter can never turn up embedded 
in the encoding of some other character.
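
To illustrate the point about delimiters, here is a minimal sketch in 
Python rather than in a stack script. The sample string and the variable 
names are made up for illustration. It only shows that every byte of a 
multibyte UTF8 sequence is 0x80 or above, so splitting the raw bytes on 
an ASCII comma gives the same items as splitting the decoded text.

```python
# Assumed sample data: a comma-separated line mixing Chinese and ASCII text.
text = "中文,abc,漢字 test"
data = text.encode("utf-8")

# Collect the bytes that belong to non-ASCII characters.
multibyte = [b for ch in text if ord(ch) > 0x7F for b in ch.encode("utf-8")]

# Every one of those bytes has its high bit set, so none of them can be
# mistaken for an ASCII delimiter such as comma, space or return.
all_high = all(b >= 0x80 for b in multibyte)

# Splitting the raw bytes on an ASCII comma gives the same items as
# splitting the decoded text.
byte_items = [item.decode("utf-8") for item in data.split(b",")]
text_items = text.split(",")

print(all_high, byte_items == text_items)
```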

You can use (null-free) UTF8 as a key in arrays.  You can use it with 
'=', offset and 'contains', I think, as long as the strings are correct 
UTF8; keep in mind that offset then counts bytes, not characters.  If 
caseSensitive affects only ASCII characters, then it is safe to set it 
to either true or false.
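
A rough sketch of the same idea in Python, with byte strings standing in 
for UTF8 held in ordinary variables. The helper name utf8 and the sample 
strings are hypothetical. It shows UTF8 used as an array (dict) key, a 
byte-level 'contains', and a byte-level offset that has to be converted 
if you want a character position.

```python
def utf8(s):
    """Hypothetical helper: encode text as UTF8 bytes."""
    return s.encode("utf-8")

haystack = utf8("我喜欢中文编程")
needle = utf8("中文")

# UTF8 bytes work as a key in an associative array (a dict here).
counts = {}
counts[utf8("中文")] = 1
key_found = needle in counts

# Equality and 'contains' can be done directly on the bytes.
same = utf8("中文") == needle
contains = needle in haystack

# The offset is counted in bytes (0-based in Python), not characters.
byte_offset = haystack.find(needle)

# Extra work is needed to turn it into a character offset.
char_offset = len(haystack[:byte_offset].decode("utf-8"))

print(key_found, same, contains, byte_offset, char_offset)
```

Because valid UTF8 is self-synchronizing, a valid needle can only match 
on a character boundary, which is why the byte-level search is safe here.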

But since each char is 1 to 4 bytes, the easiest way to get the char 
count is to assume BMP (no surrogates), convert to UTF16 and halve 
the length in bytes.
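
Here is a small sketch of that counting trick, again in Python and with 
made-up function names. It also shows one possible alternative, counting 
the bytes that are not continuation bytes, which does not need the BMP 
assumption. Treat both as illustrations, not as the only way to do it.

```python
def count_via_utf16(data):
    """Convert UTF8 bytes to UTF16 (no BOM) and halve the byte length.
    Only correct if every character is in the BMP (no surrogates)."""
    utf16 = data.decode("utf-8").encode("utf-16-le")
    return len(utf16) // 2

def count_via_lead_bytes(data):
    """Count bytes that are not continuation bytes (10xxxxxx).
    Assumes the input is valid UTF8."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)

bmp = "中文 abc".encode("utf-8")   # 6 characters, all in the BMP
astral = "😀".encode("utf-8")      # 1 character outside the BMP

print(count_via_utf16(bmp), count_via_lead_bytes(bmp))
print(count_via_utf16(astral), count_via_lead_bytes(astral))
```

The second print line shows why the BMP assumption matters: the UTF16 
method counts a surrogate pair as two.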

UTF8 has no byte-order, so it can move among OSes without BOM 
consideration.
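
As a quick illustration of the byte-order point, this Python sketch 
encodes one sample character (chosen arbitrarily) three ways. The two 
UTF16 forms differ by byte order, while there is only one UTF8 form.

```python
text = "中"  # U+4E2D

utf16_be = text.encode("utf-16-be")  # big-endian, no BOM
utf16_le = text.encode("utf-16-le")  # little-endian, no BOM
utf8 = text.encode("utf-8")          # the same bytes on any platform

print(utf16_be.hex(), utf16_le.hex(), utf8.hex())
```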

So, for some types of processing, using UTF8 might be better than host 
UTF16.

Dar

-- 
**********************************************
     DSC (Dar Scott Consulting & Dar's Lab)
     http://www.swcp.com/dsc/
     Programming and software
**********************************************


