LiveCode and UTF8

Wed Jul 29 20:52:16 EDT 2015

Simon

I did not reply sooner as I’m not a very experienced LiveCoder, and I am concentrating on learning LiveCode 7 onwards, where handling Unicode is much, much easier.

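In the code you posted, you appear to have used uniDecode where you needed uniEncode. uniEncode converts text from the encoding you name (here "UTF8") into Unicode, whereas uniDecode converts Unicode back out to that encoding.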

I ran this in the message box in LiveCode 8:

put numToChar(226) into tString -- E2 hex
put numToChar(128) after tString -- 80 hex
put numToChar(156) after tString -- 9C hex
put unicode uniEncode(tString, "UTF8")
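In LiveCode 7 and later the code point can also be reached without uniEncode. The handler below is an untested sketch that assumes LiveCode 7 or later, where numToByte builds binary data, textDecode converts binary UTF-8 into text, and codepointToNum returns the Unicode code point of a single character.

```livecode
on mouseUp
   local tBytes, tText
   -- build the three raw UTF-8 bytes as binary data (not text)
   put numToByte(226) into tBytes -- E2 hex
   put numToByte(128) after tBytes -- 80 hex
   put numToByte(156) after tBytes -- 9C hex
   -- decode the binary UTF-8 into LiveCode's internal text
   put textDecode(tBytes, "UTF-8") into tText
   -- show the code point of the decoded character (expecting 8220)
   put codepointToNum(tText)
end mouseUp
```

Keeping the bytes as binary data avoids any dependence on the platform's native text encoding.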

It successfully displayed this:

“

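I am not familiar with using Unicode in the older versions, but I noticed that you typed “UseUniCode” instead of “useUnicode” in the code example in your message. LiveCode property names are not case-sensitive, though, so that alone should not be the problem.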
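For LiveCode 6.7, a corrected version of the button code might look like the untested sketch below. It assumes, per the 6.x dictionary, that uniEncode(..., "UTF8") converts UTF-8 text into the engine's UTF-16, and that charToNum reads one two-byte character while useUnicode is true. A character above FFFF-hex would come back as only the first half of a surrogate pair.

```livecode
on mouseUp
   local tString, tUTF16, tNumberResult
   -- build the string while useUnicode is still false (one byte per char)
   put numToChar(226) into tString -- E2 hex
   put numToChar(128) after tString -- 80 hex
   put numToChar(156) after tString -- 9C hex
   -- convert FROM UTF-8 TO the engine's UTF-16 representation
   put uniEncode(tString, "UTF8") into tUTF16
   -- now make charToNum treat two bytes as one character
   set the useUnicode to true
   put charToNum(tUTF16) into tNumberResult
   put "code point: " & tNumberResult -- expecting 8220
end mouseUp
```

useUnicode is set only after the string has been built, because with it set to true numToChar would produce two-byte characters.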

I hope this helps to get you started.

Peter

> On 29 Jul 2015, at 02:29, Simon Knight <simon at smknight.co.uk> wrote:
> 
> Hi,
> 
> I have an app that works with old-fashioned text, i.e. characters with a code value of less than 128. Recently I enabled cut and paste, and the app gets confused if text is pasted in with character values > 127. I have done the obvious and am filtering out all characters with a value > 127, but in the longer term I want to convert a few high-bit characters to low-bit versions, e.g. smart quotes to dumb quotes.
> 
> Some of the text that gets pasted from my email client is in UTF8. I have done some web research and now know a little about UTF8. I have written a routine that captures any UTF8 code patterns and passes the UTF8 string to a routine for conversion.
> 
> A UTF8 character is encoded as one to four bytes, and in a multi-byte sequence every byte has a value greater than 127. I wish to extract the character's value and use it to do the conversion. My question is: does LiveCode have any method of converting a UTF8 character string either to a UTF16 string or to the numeric value of the character, which I believe is the same if leading zeros are ignored (true for characters up to FFFF-hex)? For instance, a smart open quote appears in my data as a series of three bytes, [E2-hex, 80-hex, 9C-hex]. The numeric value of the character is encoded within the bits of those three bytes and will take some bit shifting to extract: the UTF8 string decodes to 201C-hex, or 8220 base 10.
> 
> At present I am working with LiveCode 6.7 and have read about and tried the uniEncode and uniDecode functions. The description of these functions does not make any sense to me, as they seem to be about adding or removing every other byte, which can't work with UTF8.
> 
> I have tried various versions of the following button code attempting to get a result of 8220 base 10:
> 
> on mouseUp
>     put numToChar(226) into tString -- E2 hex
>     put numToChar(128) after tString -- 80 hex
>     put numToChar(156) after tString -- 9C hex
> 
>     Set the UseUniCode to true
> 
>     put "source string :" &  tstring
> 
>     put uniDecode(tString,"UTF8") into tResult
> 
>     put CharToNum(tResult) into tNumberResult  -- seeking value 8220 in base 10
> 
> end mouseUp
> 
> So do I have to knuckle down and start bit shifting?
> 
> thanks for reading
> Simon
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode at lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
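If bit shifting is wanted after all (for example in 6.7 without relying on uniEncode), it can be done with plain arithmetic: multiplying by 64 is the same as shifting left by six bits, and bitAnd strips the UTF-8 marker bits. utf8SequenceToCodepoint is a hypothetical helper, not a built-in LiveCode function. This is an untested sketch that assumes the sequence arrives as single-byte chars, as in the button code above, and it does not check that the continuation bytes are valid.

```livecode
-- hypothetical helper: decode ONE UTF-8 sequence (held as single-byte
-- chars) into its Unicode code point; returns empty for a bad lead byte
function utf8SequenceToCodepoint pSeq
   local tLead, tCount, tValue, i
   put charToNum(char 1 of pSeq) into tLead
   -- bitAnd binds more loosely than "=", hence the parentheses
   if tLead < 128 then
      return tLead -- plain ASCII: a single byte
   else if (tLead bitAnd 224) = 192 then
      put 2 into tCount -- 110xxxxx: keep the low 5 bits
      put tLead bitAnd 31 into tValue
   else if (tLead bitAnd 240) = 224 then
      put 3 into tCount -- 1110xxxx: keep the low 4 bits
      put tLead bitAnd 15 into tValue
   else if (tLead bitAnd 248) = 240 then
      put 4 into tCount -- 11110xxx: keep the low 3 bits
      put tLead bitAnd 7 into tValue
   else
      return empty -- continuation byte or invalid lead byte
   end if
   repeat with i = 2 to tCount
      -- each continuation byte (10xxxxxx) adds its low 6 bits;
      -- "* 64" shifts the bits collected so far left by six
      put tValue * 64 + (charToNum(char i of pSeq) bitAnd 63) into tValue
   end repeat
   return tValue
end utf8SequenceToCodepoint
```

For example, utf8SequenceToCodepoint(numToChar(226) & numToChar(128) & numToChar(156)) should return 8220, and the two-byte sequence numToChar(198) & numToChar(146) (C6 92) should return 402, i.e. 192-hex.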

“
LEFT DOUBLE QUOTATION MARK
Unicode: U+201C, UTF-8: E2 80 9C

ƒ
LATIN SMALL LETTER F WITH HOOK
Unicode: U+0192, UTF-8: C6 92
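The byte values listed above can be checked by hand. In a three-byte sequence the lead byte keeps its low four bits (mask 0F hex), in a two-byte sequence its low five bits (mask 1F hex), and each continuation byte contributes its low six bits (mask 3F hex), with every earlier group shifted left by six bits per following byte:

```latex
\begin{aligned}
\text{U+201C:}\quad & (\mathtt{E2} \mathbin{\&} \mathtt{0F}) \cdot 2^{12} + (\mathtt{80} \mathbin{\&} \mathtt{3F}) \cdot 2^{6} + (\mathtt{9C} \mathbin{\&} \mathtt{3F}) \\
 &= 2 \cdot 4096 + 0 \cdot 64 + 28 = 8220 = \mathtt{201C}_{16} \\
\text{U+0192:}\quad & (\mathtt{C6} \mathbin{\&} \mathtt{1F}) \cdot 2^{6} + (\mathtt{92} \mathbin{\&} \mathtt{3F}) \\
 &= 6 \cdot 64 + 18 = 402 = \mathtt{0192}_{16}
\end{aligned}
```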