Unicode

Fri Jan 30 07:44:42 EST 2015

Leaving the Caribbean alone for a bit ... Fraser wrote a couple of blog posts
on Unicode back in March and April 2014:
http://livecode.com/blog/2014/03/31/examining-unicode-part-i-the-dissection/
http://livecode.com/blog/2014/04/02/examining-unicode-part-ii-digesting-text/

The second of these posts finishes with this information on identifying what
encoding has been used:

"..Unfortunately, the URL syntax does not offer the same convenience. It
can, however, auto-detect the correct encoding to use in some circumstances:
when reading from a file URL, the beginning of the file is examined for a
“byte order mark” that specifies the encoding of the text. It also uses the
encoding returned by the web server when HTTP URLs are used. If the encoding
is not recognised, it assumes the platform’s native text encoding is used.
As the native encodings do not support Unicode, it is usually better to be
explicit when writing to files, etc. 

As an aside, we are hoping to improve the URL syntax in order to allow for
the same auto-conversion but have not yet settled on what it will be."
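
To make the byte order mark idea concrete, here is a minimal sketch of BOM sniffing, written in Python rather than LiveCode so it can be run anywhere. It is not the engine's actual code: the function name decode_with_bom, the list of BOMs checked and the cp1252 fallback are all my own assumptions, since the post does not say which marks LiveCode recognises.

```python
import codecs

# (BOM bytes, codec name) pairs. This list is an assumption for illustration.
# UTF-32 comes first because the UTF-16 LE BOM is a prefix of the UTF-32 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data, fallback="cp1252"):
    """Decode bytes using a leading BOM if there is one.

    `fallback` stands in for the platform's native encoding and is used
    when no BOM is found (hypothetical default, chosen for the example).
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            # Strip the BOM itself so it does not end up in the text.
            return data[len(bom):].decode(name)
    return data.decode(fallback)
```

A file without a BOM falls through to the fallback, which is why the quoted advice is to be explicit about the encoding whenever you can.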

Fraser Gordon-3 wrote
> On 26 Jan 2015, at 02:15, Peter Haworth <pete@...> wrote:
> 
>> Thanks Peter.  If that's the case, I'm not seeing much in the way of a
>> coding advantage over pre-7.0.  Sounds like using textEncode/textDecode
>> instead of uniencode/unidecode?
> 
> Assuming you have UTF-8 encoded data from a source outside LiveCode:
> 
> local tUTF8Data  -- This is binary data
> local tString    -- This is a textual string
> put textDecode(tUTF8Data, "UTF-8") into tString
> 
> The important difference is that uniEncode becomes textDecode - because
> you are decoding some binary data to text. 
> 
> The big difference between 7.0 and previous versions is that Unicode text
> works everywhere - you don’t need to use special Unicode properties or
> commands any more.
> 
>> 
>> That does answer another question I had though which is what is needed if
>> the database is UTF-16 encoded.  Sounds like nothing needs to be done.  I
>> guess I'll have to set up some tests.
> 
> If your external data is UTF-16 you still need to textDecode it - if you
> don’t, it will treat the data as 8-bit text and you’ll get corrupted text
> back. This 8-bit default is necessary from a backwards compatibility point
> of view - if we changed it to accept UTF-16 by default, anyone who gets
> text from an external source and doesn’t textDecode it will suddenly find
> that their stacks don’t work.
> 
> One way of looking at things is that all external interfaces (files,
> processes, etc) return binary data and you need to do something to turn
> that into text (textDecode) and you need to turn your text into binary
> data when writing to them (textEncode). Using something like UTF-8 as the
> encoding also avoids the problems that occur because the “native” encoding
> differs between our platforms: it is MacRoman on OS X, CP1252 on Windows
> and ISO-8859-1 on Linux.
> 
> Regards,
> Fraser
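
For anyone who wants to see the three points in Fraser's reply in action, here is a rough sketch in Python (not LiveCode). Python's bytes.decode and str.encode play the roles of textDecode and textEncode; the helper names decode_external and encode_external and the sample word are mine, not anything from LiveCode.

```python
def decode_external(data, encoding):
    """Bytes from a file, process or database -> text (the textDecode role)."""
    return data.decode(encoding)


def encode_external(text, encoding):
    """Text -> bytes ready to write to the outside world (the textEncode role)."""
    return text.encode(encoding)


def misread_utf16_as_8bit(text):
    """Show what happens when UTF-16 data is treated as 8-bit text."""
    utf16_bytes = encode_external(text, "utf-16-le")
    # Every 16-bit code unit becomes two 8-bit "characters", so a NUL
    # appears after each ASCII-range character: the corruption described above.
    return decode_external(utf16_bytes, "latin-1")


def native_bytes_per_platform(text):
    """Encode the same text with each platform's native 8-bit encoding."""
    platforms = {"OS X": "mac_roman", "Windows": "cp1252", "Linux": "latin-1"}
    return {name: encode_external(text, codec) for name, codec in platforms.items()}


sample = "caf\u00e9"  # "café"

# Round trip through an explicit encoding gives the text back unchanged.
assert decode_external(encode_external(sample, "utf-8"), "utf-8") == sample

print(repr(misread_utf16_as_8bit(sample)))
print(native_bytes_per_platform(sample))
```

The second print shows that the accented letter is stored as a different byte under MacRoman than under CP1252 and ISO-8859-1, which is the cross-platform problem an explicit UTF-8 encode/decode sidesteps.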

-----
"Some are born coders, some achieve coding, and some have coding thrust upon them." - William Shakespeare & Hugh Senior
