LiveCode and UTF-8

Simon Knight simon at smknight.co.uk
Tue Jul 28 20:29:59 CEST 2015


Hi,

I have an app that works with old-fashioned text, i.e. characters with a
code value of less than 128. Recently I enabled cut and paste, and the
app gets confused if text is pasted in with character values > 127. I
have done the obvious and am filtering out all characters with an ASCII
value > 127, but in the longer term I want to convert a few high-bit
characters to low-bit versions, e.g. smart quotes to dumb quotes.
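
The filtering step described above might look like the following minimal sketch. It assumes LiveCode 6.x with the useUnicode property left false (the default), so that each char is one byte and charToNum returns its byte value; stripHighBitChars is a hypothetical helper name, not part of LiveCode.

```livecode
-- Hypothetical helper: keep only characters whose byte value is
-- below 128, dropping every high-bit byte.
-- Assumes useUnicode is false, so charToNum returns a byte value.
function stripHighBitChars pText
   local tResult
   repeat for each char tChar in pText
      if charToNum(tChar) < 128 then
         put tChar after tResult
      end if
   end repeat
   return tResult
end stripHighBitChars
```

Because every byte of a multi-byte UTF-8 sequence is above 127, this removes a pasted smart quote entirely rather than converting it.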

Some of the text that gets pasted from my email client is in UTF-8. I
have done some web research and now know a little about UTF-8. I have
written a routine that captures any UTF-8 byte patterns and passes each
UTF-8 string to a routine for conversion.
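
A routine that captures UTF-8 patterns has to know how many bytes each sequence spans, and that can be read from the bit pattern of the first (lead) byte. The sketch below assumes LiveCode 6.x with useUnicode false; utf8SequenceLength is a hypothetical helper name.

```livecode
-- Hypothetical helper: number of bytes in a UTF-8 sequence,
-- judged from the bit pattern of its lead byte.
-- Returns 0 for continuation bytes (10xxxxxx) and for 11111xxx bytes.
function utf8SequenceLength pByteValue
   if pByteValue < 128 then return 1 -- 0xxxxxxx: plain ASCII
   if pByteValue < 192 then return 0 -- 10xxxxxx: continuation byte
   if pByteValue < 224 then return 2 -- 110xxxxx
   if pByteValue < 240 then return 3 -- 1110xxxx
   if pByteValue < 248 then return 4 -- 11110xxx
   return 0 -- 11111xxx: not used by UTF-8
end utf8SequenceLength
```

For example, utf8SequenceLength(226) returns 3 for the E2-hex lead byte of a smart quote, so a scanning loop at position i would take char i to i + 2 as one sequence.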

A UTF-8 sequence may be between one and four bytes long. A one-byte
sequence is plain ASCII; in the two- to four-byte forms every byte has
a value greater than 127. I wish to extract the Unicode character value
and use that value to do the conversion. My question is: does LiveCode
have any method of converting a UTF-8 character string to either a
UTF-16 string or to the numeric value of the character, which I believe
is the same for characters up to FFFF-hex if leading zeros are ignored?
For instance, a smart open quote appears in my data as a series of three
bytes: [E2-hex, 80-hex, 9C-hex]. The numeric value of the character is
encoded within the bits of the three bytes and will take some bit
shifting to extract: E2 = 1110 0010, 80 = 10 000000 and 9C = 10 011100,
so the payload bits 0010, 000000 and 011100 join to form
0010 0000 0001 1100, i.e. the UTF-8 string decodes to 201C-hex, or 8220
base 10.
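
The bit shifting described above can be done with bitAnd and multiplication, since multiplying by 64 plays the role of a 6-bit left shift. This is a minimal sketch, assuming LiveCode 6.x with useUnicode false and that the argument holds exactly one well-formed UTF-8 sequence; utf8CodePoint is a hypothetical helper name.

```livecode
-- Hypothetical helper: decode one UTF-8 sequence (1 to 4 bytes)
-- into its numeric code point.
-- Assumes useUnicode is false and pBytes is one well-formed sequence.
function utf8CodePoint pBytes
   local tLead, tValue, i
   put charToNum(char 1 of pBytes) into tLead
   if tLead < 128 then return tLead -- 0xxxxxxx: ASCII, value as-is
   if tLead >= 240 then
      put tLead bitAnd 7 into tValue -- 11110xxx: keep low 3 bits
   else if tLead >= 224 then
      put tLead bitAnd 15 into tValue -- 1110xxxx: keep low 4 bits
   else
      put tLead bitAnd 31 into tValue -- 110xxxxx: keep low 5 bits
   end if
   -- each continuation byte (10xxxxxx) adds 6 more low bits
   repeat with i = 2 to the number of chars in pBytes
      put tValue * 64 + (charToNum(char i of pBytes) bitAnd 63) into tValue
   end repeat
   return tValue
end utf8CodePoint
```

For the smart open quote the steps are 226 bitAnd 15 = 2, then 2 * 64 + 0 = 128, then 128 * 64 + 28 = 8220. The parentheses around the bitAnd matter, because LiveCode evaluates bitAnd after + and *.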

At present I am working with LiveCode 6.7 and have read about and tried
the uniEncode and uniDecode functions. The descriptions of these
functions make no sense to me, as they seem to be about adding or
removing every other byte, which can't work with UTF-8.
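
For comparison, here is a sketch of the built-in route. It rests on an assumption worth checking against the 6.7 dictionary: that uniEncode(text, "UTF8") converts from UTF-8 to UTF-16, with uniDecode going the other way, and that charToNum reads a two-byte character while useUnicode is true. firstUTF16CodeUnit is a hypothetical helper name.

```livecode
-- Hypothetical helper: first UTF-16 code unit of a UTF-8 string.
-- Assumes LiveCode 6.x, where chunk expressions count bytes and
-- uniEncode(pUTF8, "UTF8") converts UTF-8 to native-order UTF-16.
function firstUTF16CodeUnit pUTF8
   local tUTF16, tValue
   put uniEncode(pUTF8, "UTF8") into tUTF16
   -- with useUnicode true, charToNum treats its argument as a
   -- two-byte character and returns a value from 0 to 65535
   set the useUnicode to true
   put charToNum(char 1 to 2 of tUTF16) into tValue
   set the useUnicode to false
   return tValue
end firstUTF16CodeUnit
```

The input bytes should be built with numToChar while useUnicode is still false, so that each call yields a single byte.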

I have tried various versions of the following button code, attempting
to get a result of 8220 base 10:

on mouseUp
   put numToChar(226) into tString -- E2 hex
   put numToChar(128) after tString -- 80 hex
   put numToChar(156) after tString -- 9C hex

   set the useUnicode to true

   put "source string :" & tString

   put uniDecode(tString,"UTF8") into tResult

   put charToNum(tResult) into tNumberResult -- seeking value 8220 in base 10

end mouseUp

So do I have to knuckle down and start bit shifting?

thanks for reading
Simon


