Help with Unicode Text

Dar Scott dsc at swcp.com
Mon Mar 28 14:30:09 EST 2005


On Mar 28, 2005, at 12:06 PM, Dan Friedman wrote:

> Anyone know how to replace a return char in a unicode string?

There are two problems with your method.

The first is only a potential problem, depending on what you want to 
do.  Do you mean the ASCII carriage return?  Or the Revolution newline 
character (coded the same as the ASCII line feed)?

The second is that the test character is a single-byte character.  
However, each character in a unicode string is two bytes: a 16-bit 
value in host byte order, that is, UTF-16.  Even then you can't just 
convert the character to two bytes for the platform and search.  The 
match might start in the middle of a character, so you would match the 
second half of one character and the first half of the next.
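
To make the half-and-half match concrete, here is a small sketch.  It 
assumes a little-endian host (so the line feed U+000A is the bytes 
0A 00) and an engine where a char is one byte.  The handler and the 
variable uLF are made up for illustration.

```livecode
on mouseUp
   -- U+0A05 followed by U+0100 is the four bytes 05 0A 00 01
   -- (little-endian assumed); there is no line feed in this string
   put numToChar(5) & numToChar(10) & numToChar(0) & numToChar(1) into sBMP
   -- the two bytes of U+000A on the same host
   put numToChar(10) & numToChar(0) into uLF
   get offset(uLF, sBMP)
   -- a real character always starts on an odd byte, so an even offset
   -- means the hit straddles two characters
   if it > 0 and it mod 2 is 0 then
      answer "False match starting at byte" && it
   end if
end mouseUp
```

The search finds 0A 00 starting at byte 2, which is the second half of 
the first character plus the first half of the second.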

The pattern for repeating for each unicode character is like this:

   -- for each unicode char uc in sBMP
   repeat with i = 1 to length(sBMP)-1 step 2
     -- chars i and i+1 are the two bytes of one unicode character
     put char i to i+1 of sBMP into uc
     -- body
   end repeat

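That assumes there are no surrogate pairs, that is, every character is 
in the Basic Multilingual Plane (hence the name sBMP) and takes exactly 
two bytes.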

One way to convert your ASCII test char to its two-byte form is this:

   get uniEncode(c,"UTF8")

So, you can go through each unicode character, accumulating values, but 
replacing those that need replacing.
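
Putting those pieces together, a minimal sketch might look like this.  
It assumes the string has no surrogates and that the old and new 
characters are plain ASCII.  The function name uniReplaceChar and its 
parameter names are made up for illustration.

```livecode
function uniReplaceChar sBMP, pOldChar, pNewChar
   local uOld, uNew, uc, tResult
   -- compare the byte pairs exactly, not case-insensitively
   set the caseSensitive to true
   -- two-byte forms of the ASCII characters, in host byte order
   put uniEncode(pOldChar) into uOld
   put uniEncode(pNewChar) into uNew
   put empty into tResult
   -- step through the string one two-byte character at a time
   repeat with i = 1 to length(sBMP)-1 step 2
      put char i to i+1 of sBMP into uc
      if uc is uOld then
         put uNew after tResult
      else
         put uc after tResult
      end if
   end repeat
   return tResult
end uniReplaceChar
```

For example, with two hypothetical fields named "Source" and "Dest", 
this replaces each Revolution newline with a space:

```livecode
put uniReplaceChar(the unicodeText of field "Source", return, space) into tText
set the unicodeText of field "Dest" to tText
```

Because the loop only ever compares whole two-byte characters, it 
cannot match across a character boundary.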

Dar

-- 
**********************************************
     DSC (Dar Scott Consulting & Dar's Lab)
     http://www.swcp.com/dsc/
     Programming Services and Software
**********************************************


