formattedText and Unicode

Richard Gaskin ambassador at fourthworld.com
Sun Aug 9 22:06:27 EDT 2009


Phil Davis wrote:
> Don't know if this will help, but Klaus posted a response to Ken Ray in 
> "Re: Detecting UTF-8 Encoded Files" on 7 Aug. It contains helpful hints 
> about detecting what Unicode file format you're dealing with - I don't 
> know if the tips work universally, but maybe that's a starting place.

That was just what I needed.  Well, mostly anyway.  Thanks to Mark 
Waddingham, Klaus, and Mark Smith (for his swapBytes function), I've now 
made some progress here.

The code posted below is as far as I've gotten.  It displays every test 
file on my drive almost perfectly, including UTF8 and UTF16 in both big- 
and little-endian.

Two challenges remain:

First, while the glyphs look right, the line spacing is way off. 
TextEdit shows the same files with a lot of blank lines, but in the 
Rev field those lines are all bunched up together.

Second, I haven't found any way to get the formattedText in a form 
that looks usable. :(

Any tips on those would be much appreciated.  Thanks again for the code 
examples that got me this far.

--
  Richard Gaskin
  Fourth World
  Revolution training and consulting: http://www.fourthworld.com
  Webzine for Rev developers: http://www.revjournal.com

----------------------------------------------------------

on mouseUp
   answer file "Select a file:"
   if it is empty then exit to top
   -- binfile: reads the raw bytes, with no line-ending conversion
   put url ("binfile:" & it) into tData
   set the unicodeText of fld 1 to RawDataToUTF16(tData)
end mouseUp


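function RawDataToUTF16 pData
   local tTextEncoding, tHostByteOrder, tFieldData
   -- Examine the leading bytes to determine the encoding:
   switch
   case charToNum(byte 1 of pData) = 0
      -- No BOM, but a leading zero byte suggests big-endian UTF-16
      -- (an ASCII character stored as 00 xx); this is only a heuristic
      put "UTF16BE" into tTextEncoding
      break
   case charToNum(byte 1 of pData) = 0xFE and charToNum(byte 2 of pData) = 0xFF
      -- UTF-16 big-endian byte order mark (FE FF): strip it
      delete byte 1 to 2 of pData
      put "UTF16BE" into tTextEncoding
      break
   case charToNum(byte 1 of pData) = 0xFF and charToNum(byte 2 of pData) = 0xFE
      -- UTF-16 little-endian byte order mark (FF FE): strip it
      delete byte 1 to 2 of pData
      put "UTF16LE" into tTextEncoding
      break
   case charToNum(byte 1 of pData) = 0xEF and charToNum(byte 2 of pData) = 0xBB and charToNum(byte 3 of pData) = 0xBF
      -- UTF-8 byte order mark (EF BB BF): strip it so it doesn't reach the field
      delete byte 1 to 3 of pData
      put "UTF8" into tTextEncoding
      break
   default
      -- Anything else is treated as UTF-8 (plain ASCII is valid UTF-8)
      put "UTF8" into tTextEncoding
      break
   end switch
   --
   if tTextEncoding begins with "UTF16" then
      -- unicodeText expects UTF-16 in the host's byte order, so swap
      -- bytes if the file's order differs (non-x86 hosts are assumed
      -- to be big-endian PowerPC):
      if the processor begins with "x86" then
         put "LE" into tHostByteOrder
      else
         put "BE" into tHostByteOrder
      end if
      if char -2 to -1 of tTextEncoding <> tHostByteOrder then
         put swapBytes(pData) into pData
      end if
      -- Now native UTF-16, so nothing more needs to be done:
      put pData into tFieldData
   else
      -- Convert from UTF-8 to Rev's native UTF-16:
      put uniEncode(pData, "UTF8") into tFieldData
   end if
   --
   return tFieldData
end RawDataToUTF16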


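function swapBytes pString
   local swappedString
   -- Exchange each pair of bytes (UTF-16 BE <-> LE);
   -- a trailing odd byte, if any, is dropped
   repeat with n = 1 to length(pString) - 1 step 2
      put byte n+1 of pString & byte n of pString after swappedString
   end repeat
   return swappedString
end swapBytes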
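One possible lead on the line-spacing problem, offered as an untested guess: binfile: reads skip line-ending conversion, so a file saved with CR LF or bare CR line endings reaches the field with those characters unchanged, while Rev fields use LF as their line delimiter. The NormalizeLineEndings function below is hypothetical (not part of the script above) and is only a minimal sketch, assuming its input is host-order UTF-16 such as RawDataToUTF16 returns. It round-trips through UTF-8 because CR and LF bytes can never occur inside a multi-byte UTF-8 sequence, so a byte-level replace there cannot split a character.

```livecode
-- Hypothetical helper, not part of the original script: converts CR LF and
-- bare CR line endings in host-order UTF-16 text to LF, the line delimiter
-- Rev fields use.
function NormalizeLineEndings pUTF16
   local tUTF8
   -- Decode to UTF-8 so CR (13) and LF (10) can be replaced byte-wise
   -- without risk of matching half of some other UTF-16 character
   put uniDecode(pUTF16, "UTF8") into tUTF8
   -- Windows-style CR LF first, then any remaining classic-Mac bare CR
   replace numToChar(13) & numToChar(10) with numToChar(10) in tUTF8
   replace numToChar(13) with numToChar(10) in tUTF8
   -- Back to Rev's native UTF-16 for use with unicodeText
   return uniEncode(tUTF8, "UTF8")
end NormalizeLineEndings
```

To try it, the last line of mouseUp would become `set the unicodeText of fld 1 to NormalizeLineEndings(RawDataToUTF16(tData))`. If the blank lines still don't appear, the files may use some other separator, and this guess doesn't apply.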


