Livecode and UTF8
simon at smknight.co.uk
Tue Jul 28 20:29:59 CEST 2015
I have an app that works with old-fashioned text, i.e. characters with a
code value of less than 128. Recently I enabled cut and paste, and the
app gets confused if text with character values > 127 is pasted in. I
have done the obvious and am filtering out all characters with an ASCII
value > 127, but in the longer term I want to convert a few high-bit
characters to low-bit versions, e.g. smart quotes to dumb quotes.
Some of the text that gets pasted from my email client is in UTF8. I
have done some web research and now know a little about UTF8. I have
written a routine that captures any UTF8 byte patterns and passes each
UTF8 sequence to a routine for conversion.
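A minimal sketch of the dumb-quote conversion, assuming LiveCode 6.x (where each char of pasted text is one byte) and assuming the curly quotes arrive as the UTF8 byte sequences E2 80 98 / E2 80 99 (single quotes) and E2 80 9C / E2 80 9D (double quotes). The function name dumbDownQuotes is hypothetical.

```livecode
-- Hypothetical helper: replace the UTF8 byte sequences for curly quotes
-- with their ASCII equivalents. Assumes LiveCode 6.x, one byte per char.
function dumbDownQuotes pText
   local tPrefix
   -- all four quotes share the lead bytes E2 80 (226, 128)
   put numToChar(226) & numToChar(128) into tPrefix
   replace (tPrefix & numToChar(152)) with "'" in pText -- U+2018 left single quote
   replace (tPrefix & numToChar(153)) with "'" in pText -- U+2019 right single quote
   replace (tPrefix & numToChar(156)) with quote in pText -- U+201C left double quote
   replace (tPrefix & numToChar(157)) with quote in pText -- U+201D right double quote
   return pText
end dumbDownQuotes
```

Replacing whole byte sequences avoids decoding anything; any other high-bit bytes can still be filtered out afterwards as before.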
A UTF8 character is encoded as a sequence of one to four bytes; a
one-byte sequence is plain ASCII, and in a multi-byte sequence every byte
has a value greater than 127. I wish to extract the character value from
each sequence and use that value to do the conversion. My question is:
does LiveCode have any method of converting a UTF8 sequence to either a
UTF16 string or to the numeric value of the character, which I believe is
the same if leading zeros are ignored? For instance, a smart open quote
appears in my data as a series of three bytes: [E2-hex, 80-hex, 9C-hex].
The numeric value of the character is encoded within the bits of the
three bytes and will take some bit shifting to extract: the UTF8 sequence
decodes to 201C-hex, or 8220 base 10.
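A minimal sketch of that bit shifting in LiveCode 6.x script, assuming each char holds one byte and that pBytes contains exactly one well-formed UTF8 sequence (no validation is done). The function name utf8ToCodePoint is hypothetical. LiveCode has no shift operator, so multiplying by 64 stands in for a 6-bit left shift, and bitAnd masks off the UTF8 marker bits.

```livecode
-- Hypothetical helper: decode one UTF8 sequence to its code point.
-- Assumes LiveCode 6.x (one byte per char) and a well-formed sequence.
function utf8ToCodePoint pBytes
   local tLead, tCount, tValue
   put charToNum(char 1 of pBytes) into tLead
   if tLead < 128 then
      return tLead -- 0xxxxxxx: plain ASCII, one byte
   else if tLead < 224 then
      put 2 into tCount
      put tLead bitAnd 31 into tValue -- 110xxxxx: keep low 5 bits
   else if tLead < 240 then
      put 3 into tCount
      put tLead bitAnd 15 into tValue -- 1110xxxx: keep low 4 bits
   else
      put 4 into tCount
      put tLead bitAnd 7 into tValue -- 11110xxx: keep low 3 bits
   end if
   repeat with i = 2 to tCount
      -- continuation byte 10xxxxxx: shift left 6 bits, add its low 6 bits
      put tValue * 64 + (charToNum(char i of pBytes) bitAnd 63) into tValue
   end repeat
   return tValue
end utf8ToCodePoint
```

For [E2, 80, 9C] the steps are: 226 bitAnd 15 = 2, then 2 * 64 + (128 bitAnd 63) = 128, then 128 * 64 + (156 bitAnd 63) = 8220.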
At present I am working with LiveCode 6.7 and have read about and tried
the uniEncode and uniDecode functions. The description of these
functions does not make any sense to me, as they seem to be about adding
or removing every other byte, which can't work with UTF8.
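A hedged sketch, based on my reading of the LiveCode 6.x dictionary: uniEncode converts text *from* the named encoding *to* UTF-16, and uniDecode goes the other way, so converting UTF8 bytes to UTF-16 would use uniEncode(tString, "UTF8"); with useUnicode set to true, charToNum then reads one two-byte UTF-16 unit. The function name firstUTF8CharToNum is hypothetical, and the behaviour described should be checked against the 6.7 dictionary.

```livecode
-- Hypothetical helper: return the UTF-16 value of the first character
-- of a UTF8 string, assuming LiveCode 6.x uniEncode/useUnicode semantics.
function firstUTF8CharToNum pUTF8
   local tUTF16, tNum
   -- assumption: uniEncode converts FROM the named encoding TO UTF-16
   put uniEncode(pUTF8, "UTF8") into tUTF16
   -- with useUnicode true, charToNum treats two bytes as one character
   set the useUnicode to true
   put charToNum(char 1 to 2 of tUTF16) into tNum
   set the useUnicode to false
   -- note: characters above FFFF-hex would yield half of a surrogate pair
   return tNum
end firstUTF8CharToNum
```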
I have tried various versions of the following button code attempting to
get a result of 8220 base 10:
on mouseUp
   put numToChar(226) into tString -- E2 hex
   put numToChar(128) after tString -- 80 hex
   put numToChar(156) after tString -- 9C hex
   set the useUnicode to true
   put "source string: " & tString
   put uniDecode(tString, "UTF8") into tResult
   put charToNum(tResult) into tNumberResult -- seeking value 8220 in base 10
end mouseUp
So do I have to knuckle down and start bit shifting?
Thanks for reading.