Unicode

Peter Haworth pete at lcsql.com
Mon Jan 26 19:35:39 EST 2015


Well I guess I spoke too soon :-)  When I said I had things working, I
meant I could successfully get data from a UTF8 database and display it
correctly.

I'm now trying to get input from field controls and get it into the
database.  I found a lorem ipsum generator that would create text in
various languages, so I got some Russian text from it and pasted it into an
LC field.

In my handler, I need to put the contents of the field into a variable and
then hand it off from there to an INSERT statement. I've tried every
combination of unicodeText, uniencode, unidecode, or none of the above to
get the correct value into the variable but it either ends up as question
marks or something that looks nothing like the characters in the field.

This is all with pre 7.0.  I think I'm beginning to understand why 7.0 is a
lot better to use than pre 7.0 when heavy unicode handling is needed!

But in the meantime, how should I be handling the above situation in pre
7.0?

Pete
lcSQL Software <http://www.lcsql.com>
Home of lcStackBrowser <http://www.lcsql.com/lcstackbrowser.html> and
SQLiteAdmin <http://www.lcsql.com/sqliteadmin.html>

On Mon, Jan 26, 2015 at 9:01 AM, Devin Asay <devin_asay at byu.edu> wrote:

>
> On Jan 26, 2015, at 4:42 AM, Fraser Gordon <fraser.gordon at livecode.com>
> wrote:
>
> >
> > On 26 Jan 2015, at 02:15, Peter Haworth <pete at lcsql.com> wrote:
> >
> >> Thanks Peter.  If that's the case, I'm not seeing much in the way of a
> >> coding advantage over pre 7.0.  Sounds like using textEncode/textDecode
> >> instead of uniencode/unidecode?
> >
> > Assuming you have UTF-8 encoded data from a source outside LiveCode:
> >
> > local tUTF8Data       -- This is binary data
> > local tString         -- This is a textual string
> > put textDecode(tUTF8Data, "UTF-8") into tString
> >
> > The important difference is that uniEncode becomes textDecode - because
> > you are decoding some binary data to text.
> >
> > The big difference between 7.0 and previous versions is that Unicode
> > text works everywhere - you don’t need to use special Unicode properties or
> > commands any more.
> >
> >>
> >> That does answer another question I had though which is what is needed if
> >> the database is UTF-16 encoded.  Sounds like nothing needs to be done.  I
> >> guess I'll have to set up some tests.
> >
> > If your external data is UTF-16 you still need to textDecode it - if you
> > don’t, it will treat the data as 8-bit text and you’ll get corrupted text
> > back. This 8-bit default is necessary from a backwards compatibility point
> > of view - if we changed it to accept UTF-16 by default, anyone who gets
> > text from an external source and doesn’t textDecode it will suddenly find
> > that their stacks don’t work.
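
A rough sketch of the UTF-16 case described above, assuming LiveCode 7.0 or later. The handler name readUTF16 and its pPath parameter are hypothetical and not from the thread; if the byte order of the data is known, "UTF-16LE" or "UTF-16BE" may be a better choice than "UTF-16".

```livecode
-- Hypothetical helper (not from the thread): read a UTF-16 file as text.
-- Assumes LiveCode 7.0+, where textDecode is available.
function readUTF16 pPath
   local tBinary
   -- binfile: returns the raw bytes with no conversion applied
   put URL ("binfile:" & pPath) into tBinary
   -- Without this step the bytes would be treated as 8-bit native text
   return textDecode(tBinary, "UTF-16")
end readUTF16
```
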
> >
> > One way of looking at things is that all external interfaces (files,
> > processes, etc) return binary data and you need to do something to turn
> > that into text (textDecode) and you need to turn your text into binary data
> > when writing to them (textEncode). Using something like UTF-8 as an
> > encoding also avoids the problems that occur because the “native”
> > encoding differs between our platforms - it is MacRoman on OSX, CP1252 on
> > Windows and ISO-8859-1 on Linux.
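
The "binary at the boundary" view above could be sketched like this, again assuming LiveCode 7.0 or later. The handler names writeUTF8 and readUTF8 and their parameters are hypothetical, chosen only for illustration; a file stands in for any external interface.

```livecode
-- Hypothetical helpers (not from the thread); assume LiveCode 7.0+.
command writeUTF8 pPath, pText
   -- Text -> binary on the way out
   put textEncode(pText, "UTF-8") into URL ("binfile:" & pPath)
end writeUTF8

function readUTF8 pPath
   -- Binary -> text on the way in
   return textDecode(URL ("binfile:" & pPath), "UTF-8")
end readUTF8
```

Because both directions name UTF-8 explicitly, a file written on one platform should read back the same on another, regardless of which native encoding each platform uses.
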
> >
> > Regards,
> > Fraser
>
>
> It would be great if there were a stack property we could set that would
> specify what format outputted text would be. The default could be “native”;
> i.e., the native encoding for the platform, but then we could set it to
> things like “utf8” or “utf16” or “ISO”. It would essentially do the
> textEncode/decode for us.
>
> Is this an idea that appeals to folks here?
>
> Devin
>
>
> Devin Asay
> Office of Digital Humanities
> Brigham Young University
>
>
