character sets- missing feature?

Dar Scott dsc at swcp.com
Wed Oct 1 17:08:00 EDT 2003


On Wednesday, October 1, 2003, at 12:05 PM, Alex Rice wrote:

>> I suspect the mapping to unicode is straight-forward.  You might be 
>> able to do this yourself.  One of my references suggested that for 
>> many encodings all you have to do is change the high byte.  I am 
>> pretty sure this is an oversimplification, but it does indicate that 
>> it may not require a table for random mapping.
>
> Really? I am kind of a character encoding neophyte- can you hook me up 
> with a reference or a couple of lines of code if you have something  
> in mind?

I was wrong.  You will need a small table.

This is a handy reference:

    http://www.cs.tut.fi/~jkorpela/iso8859/


Note: there, "upper half" means A0 (hex) and up; 80 through 9F are C1 
control codes, not printable characters. A mapping table is available.

I'd use useUnicode and numToChar() to get the host byte order right: 
with useUnicode set to true, numToChar() returns a two-byte character.

It might be almost as easy to make a conversion function for the entire 
ISO family.

Off the top of my head, typed directly into mail:

function ISO8859ToUnicode s, member
    local unicodeResult, upperHalfTable, code
    -- buildISO8859Table must declare its first parameter with @ to fill upperHalfTable
    buildISO8859Table upperHalfTable, member
    repeat for each char c in s
       set useUnicode to false
       put charToNum(c) into code  -- one-byte charToNum()
       if code <= 127 then
          set useUnicode to true
          put numToChar(code) after unicodeResult  -- two-byte numToChar()
       else
          if code >= 160 then
             put upperHalfTable[c] after unicodeResult
          else
             throw "bad ISO8859 char!"  -- 128..159 are C1 control codes
          end if
       end if
    end repeat
    return unicodeResult
end ISO8859ToUnicode

Just looking at this, I can see better ways to do this, but you get the 
idea.

I left buildISO8859Table to you.  Remember both keys and elements can 
be binary, so if you want (as I implied above) you can have one-byte 
keys and two-byte elements instead of numeric code keys and elements.  
You might let empty (no entry) mean an error.  If you do that, you can 
move those 'if's above to the table-making handler.  The table building 
will need to get the host byte order right if codes are not used.
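One way the table-making handler could look, if you take the "empty means error" route with numeric code keys and elements, is sketched below. It is only a sketch under assumptions: the names buildISO8859CodeTable and ISO8859CodesToUnicode are hypothetical, only ISO-8859-1 and ISO-8859-15 are covered, and it assumes a LiveCode version where functions can return arrays and useUnicode still switches numToChar() to two-byte output. Because the elements are codes rather than two-byte strings, numToChar() takes care of the host byte order.

```livecode
-- Hypothetical sketch: a code-keyed table for ISO-8859 parts 1 and 15 only.
-- Keys are byte values; elements are Unicode code points (numbers).
-- Codes 128..159 (C1 controls) are deliberately left without an entry.
function buildISO8859CodeTable pMember
   local tTable, tCode
   if pMember is not among the items of "1,15" then
      throw "only ISO-8859-1 and ISO-8859-15 are sketched here"
   end if
   -- the ASCII half maps straight across
   repeat with tCode = 0 to 127
      put tCode into tTable[tCode]
   end repeat
   -- ISO-8859-1 upper half also maps straight across: A0..FF -> U+00A0..U+00FF
   repeat with tCode = 160 to 255
      put tCode into tTable[tCode]
   end repeat
   if pMember is 15 then
      -- ISO-8859-15 replaces eight ISO-8859-1 positions
      put 8364 into tTable[164] -- A4 -> U+20AC euro sign
      put 352 into tTable[166]  -- A6 -> U+0160 S with caron
      put 353 into tTable[168]  -- A8 -> U+0161 s with caron
      put 381 into tTable[180]  -- B4 -> U+017D Z with caron
      put 382 into tTable[184]  -- B8 -> U+017E z with caron
      put 338 into tTable[188]  -- BC -> U+0152 OE ligature
      put 339 into tTable[189]  -- BD -> U+0153 oe ligature
      put 376 into tTable[190]  -- BE -> U+0178 Y with diaeresis
   end if
   return tTable
end buildISO8859CodeTable

-- Hypothetical converter: one table lookup per byte; no entry means an error,
-- so the range 'if's live in the table instead of here.
function ISO8859CodesToUnicode pText, pMember
   local tTable, tResult, tCode
   put buildISO8859CodeTable(pMember) into tTable
   repeat for each char tChar in pText
      set the useUnicode to false
      put charToNum(tChar) into tCode  -- one-byte charToNum()
      if tTable[tCode] is empty then
         throw "bad ISO8859 char!"
      end if
      set the useUnicode to true
      put numToChar(tTable[tCode]) after tResult  -- two-byte numToChar(), host order
   end repeat
   return tResult
end ISO8859CodesToUnicode
```

With part 15, an A4 byte comes out as the euro sign, while with part 1 it stays U+00A4. The table is rebuilt on every call; keeping it in a script-local variable would be an easy improvement.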

I am just learning about Unicode and ISO-8859, so watch out.

Dar Scott

More information about the use-livecode mailing list