formattedText and Unicode
Richard Gaskin
ambassador at fourthworld.com
Sun Aug 9 22:06:27 EDT 2009
Phil Davis wrote:
> Don't know if this will help, but Klaus posted a response to Ken Ray in
> "Re: Detecting UTF-8 Encoded Files" on 7 Aug. It contains helpful hints
> about detecting what Unicode file format you're dealing with - I don't
> know if the tips work universally, but maybe that's a starting place.
That was just what I needed. Well, mostly, anyway. Thanks to Mark
Waddingham, Klaus, and Mark Smith (for his swapBytes function), I've
made some progress here.
The code posted below is as far as I've gotten. It displays every test
file on my drive almost perfectly, including UTF-8, plus UTF-16 in both
big- and little-endian byte order.
Two challenges remain:
First, while the glyphs look right, the line spacing is way off. In
TextEdit the same files show many blank lines, but in the Rev field
those lines are all bunched together.
Second, I've found no way to get the formattedText in any form that
looks usable. :(
Any tips on those would be much appreciated. Thanks again for the code
examples that got me this far.
--
Richard Gaskin
Fourth World
Revolution training and consulting: http://www.fourthworld.com
Webzine for Rev developers: http://www.revjournal.com
----------------------------------------------------------
on mouseUp
answer file "Select a file:"
if it is empty then exit to top
put url ("binfile:"&it) into tData
set the unicodeText of fld 1 to RawDataToUTF16(tData)
end mouseUp
function RawDataToUTF16 pData
-- Examine the data to determine encoding:
switch
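case charToNum(byte 1 of pData) = 0
-- No BOM, but a leading NUL byte suggests big-endian UTF-16
-- text that starts with an ASCII character:
put "UTF16BE" into tTextEncoding
break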
case charToNum(byte 1 of pData) = 0xFE and charToNum(byte 2 of pData) = 0xFF
-- Big-endian UTF-16 byte order mark; strip it:
delete byte 1 to 2 of pData
put "UTF16BE" into tTextEncoding
break
case charToNum(byte 1 of pData) = 0xFF and charToNum(byte 2 of pData) = 0xFE
-- Little-endian UTF-16 byte order mark; strip it:
delete byte 1 to 2 of pData
put "UTF16LE" into tTextEncoding
break
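case charToNum(byte 1 of pData) = 0xEF and charToNum(byte 2 of pData) = 0xBB and charToNum(byte 3 of pData) = 0xBF
-- UTF-8 byte order mark; strip it so it isn't displayed as a stray character:
delete byte 1 to 3 of pData
put "UTF8" into tTextEncoding
break
default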
put "UTF8" into tTextEncoding
break
end switch
--
if tTextEncoding begins with "UTF16" then
-- Check byte order, swapping if needed:
if the processor is "x86" then
put "LE" into tHostByteOrder
else
put "BE" into tHostByteOrder
end if
if byte -2 to -1 of tTextEncoding <> tHostByteOrder then
put swapbytes(pData) into pData
end if
-- Already utf16, so nothing more needs to be done:
put pData into tFieldData
else
-- Convert from utf8 to Rev's native utf16:
put uniEncode(pData, "UTF8") into tFieldData
end if
--
return tFieldData
end RawDataToUTF16
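function swapBytes pString
-- Swap each pair of bytes to convert between UTF-16BE and UTF-16LE.
-- A trailing odd byte, if any, is dropped.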
repeat with n = 1 to length(pString) - 1 step 2
put byte n+1 of pString & byte n of pString after swappedString
end repeat
return swappedString
end swapBytes
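On the line-spacing problem, one possible cause (an untested assumption, not a diagnosis) is line endings. Reading through binfile: does no line-ending conversion, so files saved with CR or CRLF endings keep them, while a Rev field breaks lines only on LF (numToChar(10)). The sketch below, NormalizeUTF16LineEndings (a hypothetical name, not a Rev built-in), walks the host-byte-order UTF-16 returned by RawDataToUTF16 two bytes at a time and rewrites each CR or CRLF as a single LF. It assumes that uniEncode with no language argument produces UTF-16 in host byte order, which lets it build the CR and LF code units without checking the processor.

```livecode
function NormalizeUTF16LineEndings pData
   -- Hypothetical helper (not a Rev built-in). Assumes pData is UTF-16
   -- in host byte order, e.g. the result of RawDataToUTF16.
   -- Rewrites CRLF and lone CR as LF, the line break a field uses.
   local tCR, tLF, tUnit, tOut, tPrevWasCR
   set the caseSensitive to true -- compare code units exactly
   -- Assumption: uniEncode with no language returns host-order UTF-16,
   -- so these code units match pData without a processor check:
   put uniEncode(numToChar(13)) into tCR
   put uniEncode(numToChar(10)) into tLF
   put false into tPrevWasCR
   -- Walk one 2-byte code unit at a time so a CR or LF is never
   -- matched across a code-unit boundary. A trailing odd byte is
   -- dropped, as in swapBytes.
   repeat with n = 1 to length(pData) - 1 step 2
      put byte n to n + 1 of pData into tUnit
      if tUnit is tCR then
         put tLF after tOut
         put true into tPrevWasCR
      else if tUnit is tLF and tPrevWasCR then
         -- Second half of a CRLF pair: its LF was already written.
         put false into tPrevWasCR
      else
         put tUnit after tOut
         put false into tPrevWasCR
      end if
   end repeat
   return tOut
end NormalizeUTF16LineEndings
```

To try it, the set line in mouseUp would become: set the unicodeText of fld 1 to NormalizeUTF16LineEndings(RawDataToUTF16(tData)). Walking code units is slower than a plain replace on large files, but a byte-level replace could match a CR pattern that straddles two unrelated characters.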