working with unicodeFormattedText

Dar Scott dsc at swcp.com
Mon Jun 10 16:52:15 EDT 2013


I neglected to explain why.

The short "why" is that what you get from unicodeText is UTF-16 (16-bit code units, so mostly one unit per character) in native byte order, that is, the order the computer likes.  Those same characters can be represented in UTF-8, which is nice for text that is mostly ASCII, is robust against byte-order issues, is efficient in memory (though not compressed), and yet can represent all of Unicode.  LiveCode strings (in the current version) are really just byte sequences that we interpret as characters.  Each Unicode character we rip out of a field is two bytes (four for the rare character outside the Basic Multilingual Plane), so the data has to be converted with uniDecode() before it is UTF-8.
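
To make the byte counts concrete, here is a minimal sketch in Python rather than LiveCode, only because the raw bytes are easy to inspect there.  The helper names and the sample string are made up for illustration; the two helpers just mirror the direction of the uniEncode(x,"UTF8") and uniDecode(x,"UTF8") calls quoted below, and are not a description of how LiveCode implements them.

```python
import sys

# Codec for UTF-16 in this machine's byte order, with no byte-order mark.
NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def utf8_to_native_utf16(utf8_bytes):
    """UTF-8 bytes in, native-order UTF-16 bytes out (hypothetical helper)."""
    return utf8_bytes.decode("utf-8").encode(NATIVE_UTF16)


def native_utf16_to_utf8(utf16_bytes):
    """Native-order UTF-16 bytes in, UTF-8 bytes out (hypothetical helper)."""
    return utf16_bytes.decode(NATIVE_UTF16).encode("utf-8")


# Sample text: "A", space, e-acute, space, euro sign (5 characters).
sample = "A \u00e9 \u20ac".encode("utf-8")

as_utf16 = utf8_to_native_utf16(sample)
print(len(sample))    # 8 bytes in UTF-8: 1 + 1 + 2 + 1 + 3
print(len(as_utf16))  # 10 bytes in UTF-16: 5 characters x 2 bytes

# The round trip gives back the original UTF-8 bytes.
print(native_utf16_to_utf8(as_utf16) == sample)  # True

# NUL-SPACE is valid UTF-8 (two characters), but read as UTF-16BE
# it is a single ordinary space.
nul_space = b"\x00\x20"
print(len(nul_space.decode("utf-8")))      # 2
print(nul_space.decode("utf-16-be") == " ")  # True
```

The last two lines show why a NUL byte in front of each ASCII character suggests UTF-16 data that was never converted.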

Dar

On Jun 10, 2013, at 2:45 PM, Dar Scott wrote:

> Try this.
> 
> To put a UTF-8 string into the field...
>  set the unicodeText of field "unicodeText" to uniEncode(UTF8String,"UTF8")
> 
> To get a UTF-8 string from the field...
>  put uniDecode(the unicodeText of field "unicodeText", "UTF8") into UTF8String
> 
> I combined two operations into single lines; I hope that doesn't obscure things.
> 
> Dar
> 
> 
> 
> On Jun 10, 2013, at 2:30 PM, Dr. Hawkins wrote:
> 
>> On Mon, Jun 10, 2013 at 12:37 PM, Dar Scott <dsc at swcp.com> wrote:
>> 
>>> That sequence is not really invalid UTF-8.  NUL-SPACE is valid in a strict
>>> sense, just unlikely.  However, it does look very much like UTF-16BE.
>>> 
>> 
>> It came, at some point, from a Mac keyboard, hung around in an OpenOffice
>> spreadsheet, and now I'm cutting & pasting into a field that processes it.
>> 
>> 
>>> You need to convert this to UTF-8 using uniDecode().  The property
>>> unicodeFormattedText will give you UTF-16 in native ordering.
>>> 
>> 
>> I tried
>>    put unidecode(fld "newAbrevs", "UTF8") into theData
>> 
>> and get the same error.  Similarly for
>> 
>>        put unidecode(fld "newAbrevs") into theData
>> 
>> 
>>> (And unicodeFormattedText will insert extra line-ends.  If you don't want
>>> that, use unicodeText.)
>>> 
>> 
>> All I really want to do is stay UTF-8 from start to finish :)
>> 
>> And what's in the DB needs to be directly usable by OpenOffice and the like
>> without any pre-processing.
>> 
>> Is there some way that everything pasted in would automatically be
>> converted from the host system character set (Mac/Windows/Linux) to UTF-8?
>> 
>> -- 
>> Dr. Richard E. Hawkins, Esq.
>> (702) 508-8462
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 
> 




