character sets- missing feature?
Dar Scott
dsc at swcp.com
Wed Oct 1 17:08:00 EDT 2003
On Wednesday, October 1, 2003, at 12:05 PM, Alex Rice wrote:
>> I suspect the mapping to unicode is straight-forward. You might be
>> able to do this yourself. One of my references suggested that for
>> many encodings all you have to do is change the high byte. I am
>> pretty sure this is an oversimplification, but it does indicate that
>> it may not require a table for random mapping.
>
> Really? I am kind of a character encoding neophyte- can you hook me up
> with a reference or a couple of lines of code if you have something
> in mind?
I was wrong. You will need a small table.
This is a handy reference:
http://www.cs.tut.fi/~jkorpela/iso8859/
Note: "Upper half" means A0 (hex) and up. A mapping table is
available.
I'd use useUnicode and numToChar() to get the host byte order right.
It might be almost as easy to make one conversion function for the entire
ISO 8859 family.
Off the top of my head, typed directly into mail:
function ISO8859ToUnicode s, member
  local unicodeResult, upperHalfTable
  buildISO8859Table upperHalfTable, member
  repeat for each char c in s
    set useUnicode to false
    put charToNum(c) into code -- one-byte charToNum()
    if code <= 127 then
      set useUnicode to true
      put numToChar(code) after unicodeResult -- two-byte numToChar()
    else
      if code >= 160 then
        put upperHalfTable[c] after unicodeResult
      else
        throw "bad ISO8859 char!"
      end if
    end if
  end repeat
  return unicodeResult
end ISO8859ToUnicode
Just looking at this, I can see better ways to do this, but you get the
idea.
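For comparison, here is the same idea as a rough Python sketch rather than Transcript. It is only an illustration under some assumptions: the function names are made up, the upper-half table is taken from Python's built-in "iso-8859-N" codecs instead of being typed in by hand, and the output is two-byte code units in host byte order (via struct's "=" native-order flag), which is what the handler above is after.

```python
import struct

def build_upper_half_table(part):
    """Map bytes 0xA0..0xFF of ISO 8859-<part> to Unicode code points.

    Hypothetical helper: it leans on Python's own codec tables.
    """
    codec = "iso-8859-%d" % part
    table = {}
    for byte in range(0xA0, 0x100):
        try:
            table[byte] = ord(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            pass  # no entry for this position; treated as an error later
    return table

def iso8859_to_unicode(data, part):
    """Convert ISO 8859-<part> bytes to two-byte units in host byte order."""
    table = build_upper_half_table(part)
    out = bytearray()
    for byte in data:
        if byte <= 0x7F:
            code_point = byte            # lower half is plain ASCII
        elif byte >= 0xA0 and byte in table:
            code_point = table[byte]     # upper half needs the table
        else:
            raise ValueError("bad ISO 8859 byte: 0x%02X" % byte)
        out += struct.pack("=H", code_point)  # "=" means native byte order
    return bytes(out)
```

Calling iso8859_to_unicode(b"A\xa1", 2) gives two two-byte units, one for "A" and one for U+0104, the character at A1 (hex) in ISO 8859-2.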
I left buildISO8859Table to you. Remember that both keys and elements can
be binary, so if you want (and as the code above implies) you can have
one-byte keys and two-byte elements instead of numeric codes for keys and
elements. You might let empty (no entry) mean an error. If you do that,
you can move the 'if' tests above into the table-building handler. If you
store characters rather than codes, the table building will need to get
the host byte order right.
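A sketch of that table layout, again in Python rather than Transcript and with made-up names: the keys are one-byte strings, the elements are two-byte strings already in host byte order, and a missing entry means an error. The range tests live in the table builder, so the conversion loop is just a lookup. Taking the mapping from Python's "iso-8859-N" codecs is my assumption; a hand-built table from the reference above would work the same way.

```python
import struct

def build_iso8859_table(part):
    """One-byte keys mapped to two-byte elements in host byte order."""
    codec = "iso-8859-%d" % part
    table = {}
    # Only 00..7F and A0..FF get entries; 80..9F is left out on purpose.
    for byte in list(range(0x00, 0x80)) + list(range(0xA0, 0x100)):
        key = bytes([byte])
        try:
            char = key.decode(codec)
        except UnicodeDecodeError:
            continue  # leave the entry empty; empty means error
        table[key] = struct.pack("=H", ord(char))  # native byte order
    return table

def convert(data, table):
    """Look up each byte; no range tests are needed here."""
    out = bytearray()
    for i in range(len(data)):
        element = table.get(data[i:i + 1])
        if element is None:
            raise ValueError("bad ISO 8859 byte at offset %d" % i)
        out += element
    return bytes(out)
```

The table only has to be built once per ISO 8859 part and can then be reused for every string.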
I am just learning about Unicode and ISO 8859, so watch out.
Dar Scott