formattedText and Unicode

Richard Gaskin ambassador at fourthworld.com
Mon Aug 10 00:35:02 EDT 2009


In Mark Wieder's example I had mistakenly assumed he was converting 
Unicode line endings only for his PC-specific storage needs.  It turns 
out the conversion is needed for display as well (though I'm still not 
sure why it should be necessary).

So here's the latest function for putting binary data from a file into a 
form suitable for tucking into the unicodeText of a field:


function RawDataToUTF16 pData
   -- Examine the data to determine encoding:
   switch
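   -- No BOM, but a leading null byte suggests big-endian UTF-16:
   case charToNum(byte 1 of pData) = 0
     put "UTF16BE" into tTextEncoding
     break
   -- BOM FE FF marks big-endian UTF-16; strip it:
   case charToNum(byte 1 of pData) = 0xFE and charToNum(byte 2 of pData) = 0xFF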
     delete byte 1 to 2 of pData
     put "UTF16BE" into tTextEncoding
     break
   -- BOM FF FE marks little-endian UTF-16; strip it:
   case charToNum(byte 1 of pData) = 0xFF and charToNum(byte 2 of pData) = 0xFE
     delete byte 1 to 2 of pData
     put "UTF16LE" into tTextEncoding
     break
   default
     put "UTF8" into tTextEncoding
     break
   end switch
   --
   if tTextEncoding begins with "UTF16" then
     -- Check byte order, swapping if needed:
     if the processor is "x86" then
       put "LE" into tHostByteOrder
     else
       put "BE" into tHostByteOrder
     end if
     if byte -2 to -1 of tTextEncoding <> tHostByteOrder then
       -- swapbytes() is a custom handler rather than a built-in function:
       put swapbytes(pData) into pData
     end if
     -- Already utf16, so nothing more needs to be done:
     #put uniEncode(uniDecode(pData, utf16),16) into tFieldData
     put pData into tFieldData
   else
     -- Convert from utf8 to Rev's native utf16:
     put uniEncode(pData, "UTF8") into tFieldData
   end if
   --
   -- Normalize line endings to cr, which the field needs for display:
   replace CRLF with cr in tFieldData
   replace numToChar(13) with cr in tFieldData
   return tFieldData
end RawDataToUTF16
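
As a usage sketch, the function might be called like this from a button 
script.  It assumes the file is read through a binfile: URL so the bytes 
arrive untouched, and that the card has a field named "Display" (a 
hypothetical name, not from the original code):

```livecode
on mouseUp
   -- Let the user pick a file; stop quietly if they cancel:
   answer file "Select a text file:"
   if it is empty then exit mouseUp
   -- Read as binary so no character or line-ending translation
   -- happens before RawDataToUTF16 examines the bytes:
   put url ("binfile:" & it) into tRawData
   -- "Display" is a hypothetical field name used for illustration:
   set the unicodeText of field "Display" to RawDataToUTF16(tRawData)
end mouseUp
```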

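Note that swapbytes() is not defined in this message and does not appear 
to be a built-in function, so it presumably comes from a handler 
elsewhere in the thread.  If it is missing, a minimal stand-in could look 
like the following.  This is only an assumption about what that handler 
does (exchange the two bytes of every UTF-16 code unit), not the code 
actually used in the thread:

```livecode
function swapbytes pData
   -- Hypothetical stand-in: flip UTF-16 byte order by exchanging
   -- the two bytes of each 16-bit code unit.
   -- A trailing odd byte, if any, is dropped.
   put empty into tSwapped
   repeat with i = 1 to length(pData) - 1 step 2
      put (byte (i + 1) of pData) & (byte i of pData) after tSwapped
   end repeat
   return tSwapped
end swapbytes
```

Building the result one pair at a time keeps the sketch short, but it 
will likely be slow on large files.
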

I still don't have a solution for using formattedText with Unicode, but 
I may be able to find an algorithm for what I'm doing that bypasses it.

Many thanks are due to Devin Asay.  His summary notes here are a helpful 
introduction to working with Unicode in Rev:
<http://revolution.byu.edu/unicode/unicodeInRev.php>


PS: While having this solution is cool, and seems to reliably handle a 
wider range of files than even TextEdit does in its automatic mode, I 
find myself thinking there should be an easier way to do something as 
simple as putting text into a field.  Maybe Rev 5.0? :)

--
  Richard Gaskin
  Fourth World
  Revolution training and consulting: http://www.fourthworld.com
  Webzine for Rev developers: http://www.revjournal.com



