Unicode

Devin Asay devin_asay at byu.edu
Mon Jan 26 12:01:07 EST 2015


On Jan 26, 2015, at 4:42 AM, Fraser Gordon <fraser.gordon at livecode.com> wrote:

> 
> On 26 Jan 2015, at 02:15, Peter Haworth <pete at lcsql.com> wrote:
> 
>> Thanks Peter.  If that's the case, I'm not seeing much in the way of a
>> coding advantage over pre 7.0.  Sounds like using textEncode/textDecode
>> instead of uniencode/unidecode?
> 
> Assuming you have UTF-8 encoded data from a source outside LiveCode:
> 
> local tUTF8Data -- This is binary data
> local tString   -- This is a textual string
> put textDecode(tUTF8Data, "UTF-8") into tString
> 
> The important difference is that uniEncode becomes textDecode - because you are decoding some binary data to text. 
> 
> The big difference between 7.0 and previous versions is that Unicode text works everywhere - you don’t need to use special Unicode properties or commands any more.
> 
>> 
>> That does answer another question I had though which is what is needed if
>> the database is UTF-16 encoded.  Sounds like nothing needs to be done.  I
>> guess I'll have to set up some tests.
> 
> If your external data is UTF-16 you still need to textDecode it - if you don’t, LiveCode will treat the data as 8-bit text and you’ll get corrupted text back. This 8-bit default is necessary from a backwards compatibility point of view - if we changed it to accept UTF-16 by default, anyone who gets text from an external source and doesn’t textDecode it would suddenly find that their stacks don’t work.
> 
> One way of looking at things is that all external interfaces (files, processes, etc) return binary data and you need to do something to turn that into text (textDecode) and you need to turn your text into binary data when writing to them (textEncode). Using something like UTF-8 as the encoding also avoids the problems that occur because the “native” encoding differs between our platforms - it is MacRoman on OS X, CP1252 on Windows and ISO-8859-1 on Linux.
> 
> Regards,
> Fraser


It would be great if there were a stack property we could set that would specify the encoding used for output text. The default could be “native”; i.e., the native encoding for the platform, but then we could set it to things like “utf8” or “utf16” or “ISO”. It would essentially do the textEncode/textDecode for us.
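
In the meantime, a pair of small wrapper handlers can approximate this for file I/O. What follows is only a rough sketch under some assumptions: readTextFile and writeTextFile are made-up handler names, not built-in commands, and pEncoding is assumed to be an encoding name that textEncode/textDecode accept (for example "UTF-8", "UTF-16" or "native"). There is no error checking.

```livecode
-- Sketch only: readTextFile and writeTextFile are hypothetical handler names,
-- not part of LiveCode. pEncoding is assumed to be a name accepted by
-- textEncode/textDecode, e.g. "UTF-8", "UTF-16" or "native".

function readTextFile pPath, pEncoding
   local tBinary
   -- binfile: returns the raw bytes of the file with no conversion applied
   put URL ("binfile:" & pPath) into tBinary
   -- binary data -> text
   return textDecode(tBinary, pEncoding)
end readTextFile

command writeTextFile pPath, pText, pEncoding
   -- text -> binary data, written out unchanged via binfile:
   put textEncode(pText, pEncoding) into URL ("binfile:" & pPath)
end writeTextFile
```

With these in a stack script, a script would read with something like readTextFile(tPath, "UTF-16") and write with writeTextFile tPath, tText, "UTF-8", so the encoding is named once per call instead of being handled by hand at every read and write.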

Is this an idea that appeals to folks here?

Devin


Devin Asay
Office of Digital Humanities
Brigham Young University




