Unicode

Fraser Gordon fraser.gordon at livecode.com
Mon Jan 26 06:42:25 EST 2015


On 26 Jan 2015, at 02:15, Peter Haworth <pete at lcsql.com> wrote:

> Thanks Peter.  If that's the case, I'm not seeing much in the way of a
> coding advantage over pre 7.0.  Sounds like using textEncode/textDecode
> instead of uniencode/unidecode?

Assuming you have UTF-8 encoded data from a source outside LiveCode:

local tUTF8Data   -- This is binary data
local tString     -- This is a textual string
put textDecode(tUTF8Data, "UTF-8") into tString

The important difference is that uniEncode becomes textDecode, because you are decoding some binary data into text.
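The decode/encode direction described above is not specific to LiveCode. As a rough illustration only, here is the same idea in Python rather than LiveCode script; the sample bytes and variable names are made up for the example.

```python
# Illustration in Python, not LiveCode: bytes in, text out.
# The sample data below is invented for this sketch.

# Binary data as it might arrive from a file, socket or database driver.
utf8_data = b"caf\xc3\xa9 \xe2\x82\xac5"

# Decoding turns binary data into text (the textDecode direction).
text = utf8_data.decode("utf-8")

# Encoding turns text back into binary data (the textEncode direction).
round_trip = text.encode("utf-8")

print(len(utf8_data), "bytes ->", len(text), "characters")
```

The byte count and the character count differ because the accented letter and the euro sign each occupy more than one byte in UTF-8.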

The big difference between 7.0 and previous versions is that Unicode text works everywhere - you don’t need to use special Unicode properties or commands any more.

> 
> That does answer another question I had though which is what is needed if
> the database is UTF-16 encoded.  Sounds like nothing needs to be done.  I
> guess I'll have to set up some tests.

If your external data is UTF-16 you still need to textDecode it. If you don’t, LiveCode will treat the data as 8-bit text and you’ll get corrupted text back. This 8-bit default is necessary for backwards compatibility: if we changed it to accept UTF-16 by default, anyone who gets text from an external source and doesn’t textDecode it would suddenly find that their stacks no longer work.
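To show what "corrupted text" looks like in practice, here is a small sketch in Python (not LiveCode). It assumes little-endian UTF-16 without a byte order mark and uses ISO-8859-1 to stand in for an 8-bit interpretation; both choices are assumptions made for the example.

```python
# Illustration in Python, not LiveCode: UTF-16 bytes misread as 8-bit text.
# Assumes UTF-16 little-endian with no byte order mark.

original = "Hi \u00e9"                       # "Hi é"
utf16_data = original.encode("utf-16-le")   # two bytes per character here

# Wrong: treat every byte as one 8-bit character.
wrong = utf16_data.decode("latin_1")

# Right: decode with the encoding the data was actually written in.
right = utf16_data.decode("utf-16-le")

print(repr(wrong))   # contains a NUL character after every letter
print(repr(right))
```

The misread string is twice as long as the original because each 16-bit code unit is split into two separate characters.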

One way of looking at things is that all external interfaces (files, processes, etc.) return binary data, so you need to turn that into text (textDecode) when reading from them and turn your text back into binary data (textEncode) when writing to them. Using something like UTF-8 as the encoding also avoids the problems that occur because the “native” encoding differs between our platforms: it is MacRoman on OS X, CP1252 on Windows and ISO-8859-1 on Linux.
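The platform difference can be seen with a single accented character. The sketch below is in Python (not LiveCode) and uses Python's codec names for the three native encodings mentioned above; the scenario of a file written on one platform and read on another is hypothetical.

```python
# Illustration in Python, not LiveCode: the same text has different bytes
# in each platform's "native" 8-bit encoding, but only one UTF-8 form.

text = "\u00e9"  # "é"

as_mac = text.encode("mac_roman")    # MacRoman (OS X)
as_win = text.encode("cp1252")       # CP1252 (Windows)
as_linux = text.encode("latin_1")    # ISO-8859-1 (Linux)
as_utf8 = text.encode("utf-8")       # identical on every platform

# Hypothetical mix-up: bytes written as MacRoman, read back as CP1252.
misread = as_mac.decode("cp1252")

print(as_mac, as_win, as_linux, as_utf8)
print(repr(misread))
```

Agreeing on UTF-8 at the boundary means the bytes no longer depend on which platform wrote them.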

Regards,
Fraser