Unicode (was Getting Kanji from a .csv file)

Dar Scott dsc at swcp.com
Sat Jun 8 19:14:08 EDT 2013


I'm guessing that most installations, maybe the vast majority, would have encoding set up for UTF-8.  (You will have to query SQLite experts on how many people really use an encoding other than UTF-8.)
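
For anyone who wants to check what a given database actually uses, here is a minimal sketch, not LiveCode, just Python's standard sqlite3 module for illustration. It relies on SQLite's "PRAGMA encoding" statement; the helper name database_encoding and the in-memory test databases are my own assumptions for the example.

```python
import sqlite3

def database_encoding(conn):
    """Return the text encoding SQLite reports for this database."""
    return conn.execute("PRAGMA encoding").fetchone()[0]

# A fresh database keeps SQLite's default encoding.
default_db = sqlite3.connect(":memory:")

# The encoding can only be chosen before the database has any content,
# so the PRAGMA is issued ahead of the first CREATE TABLE.
utf16_db = sqlite3.connect(":memory:")
utf16_db.execute('PRAGMA encoding = "UTF-16le"')
utf16_db.execute("CREATE TABLE t (x TEXT)")

print(database_encoding(default_db))
print(database_encoding(utf16_db))
```

A tool could run the same query once per connection and refuse, or convert, when the answer is not UTF-8.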

When (and if) you decide to support UTF-16 (native endian), UTF-16BE and UTF-16LE, it should be straightforward.  Maybe by then uniEncode and uniDecode will have endian options and it will be even easier.  This will save the need for a dialog box.  
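
The "swap if native doesn't match the target" idea can be sketched as follows. This is Python rather than LiveCode, purely as an illustration under my own assumptions: the function names swap_utf16_bytes and to_target are hypothetical, and the input is assumed to be native-endian UTF-16 bytes such as what unicodeText hands back.

```python
import sys

def swap_utf16_bytes(data: bytes) -> bytes:
    """Swap every pair of bytes, turning UTF-16LE into UTF-16BE or back."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each pair moves first
    swapped[1::2] = data[0::2]  # first byte of each pair moves second
    return bytes(swapped)

def to_target(native_utf16: bytes, target: str) -> bytes:
    """Convert native-endian UTF-16 bytes to 'UTF-16LE' or 'UTF-16BE'."""
    native = "UTF-16LE" if sys.byteorder == "little" else "UTF-16BE"
    if target.upper() == native:
        return native_utf16  # already in the right byte order
    return swap_utf16_bytes(native_utf16)

# Example input: two kanji encoded in this machine's native byte order.
text = "漢字"
native_codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
native_bytes = text.encode(native_codec)
big_endian = to_target(native_bytes, "UTF-16BE")
```

Swapping twice returns the original bytes, so the same routine works in both directions.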

Dar

On Jun 8, 2013, at 4:41 PM, Peter Haworth wrote:

> Thanks Dar, I think I get the picture now.  I'll stick with UTF8 for now.
> 
> 
> 
> Pete
> lcSQL Software <http://www.lcsql.com>
> 
> 
> On Sat, Jun 8, 2013 at 11:11 AM, Dar Scott <dsc at swcp.com> wrote:
> 
>> I encourage you to go to full Unicode.
>> 
>> That means (for now) using the unicodeText to get text into and out of a
>> field.
>> 
>> Then convert that to UTF8 and back for a database with UTF-8 encoding.
>> 
>> (And at this point you can say that your program only works with UTF-8
>> encoding set for SQLite.)
>> 
>> It is my understanding that SQLite does not have any lossy encodings.
>> That is, you don't lose anything by saving to the db.  That is, all
>> encodings are Unicode.  My comments are based on that.
>> 
>> You probably can't reliably move a db with UTF-16 encoding from one
>> machine to another.
>> 
>> Since your program is general, you probably want to accommodate db with
>> UTF-16, UTF-16LE, and UTF-16BE.
>> 
>> I'm guessing you can store the Unicode you get from the unicodeText
>> directly for UTF-16.
>> 
>> For the others you might have to byte swap.
>> 
>> Long ago I made an enhancement suggestion to include UTF-16LE and UTF-16BE
>> in uniEncode and uniDecode.  I don't think it is there, so you will have to
>> do it yourself.
>> 
>> Essentially, you see if what is native for your machine matches your
>> target encoding.  If not, swap.  To see if the chars are stored little
>> endian or not ...
>> 
>> Gotta run.
>> 
>> Dar
>> 
>> 
>> On Jun 8, 2013, at 11:20 AM, Peter Haworth wrote:
>> 
>>> I apologize up front for being particularly clueless on this whole
>>> character encoding concept.  I'm still trying to adjust to speaking
>>> American English as opposed to the Queen's English, so it's not too
>>> surprising I'm not grasping Unicode too well!
>>> 
>>> I understand the concepts and the use of uniEncode and uniDecode, but I
>>> don't understand when I need to care.
>>> 
>>> I'll use my SQLiteAdmin program as an example.  It provides schema
>>> maintenance and data browsing/update features for SQLite databases and
>>> uses most of the standard LC controls, including datagrids.  Users can
>>> enter data into it and have it used to INSERT, UPDATE, or DELETE rows.
>>> They can also type in SELECT criteria and have the qualifying data
>>> displayed in field and datagrid controls.  Currently, there is no attempt
>>> to do any encoding or decoding of data.
>>> 
>>> On my computers here in the USA, I've never had any issues using it on
>>> any of my databases, but I've never tried to access one whose contents
>>> weren't in American English.
>>> 
>>> Now let's say someone in a country whose language requires the use of
>>> Unicode encoding purchases the program.  Will it work OK for that person
>>> in terms of entering data into the controls and displaying data in the
>>> controls from their database, assuming that the database contains UTF-8
>>> encoded data?  Or do I have to uniEncode/uniDecode to ensure things work
>>> right?
>>> 
>>> Now let's say the database is using UTF-16 encoding, or anything other
>>> than UTF-8.  I can detect that situation in the database, and I think I
>>> would need to use uniEncode/uniDecode to deal with it?
>>> 
>>> Now the user takes his UTF-8 database and puts it on a colleague's
>>> computer here in the USA with the computer's language settings set to
>>> American English.  I would then need to decode/encode... I think.
>>> 
>>> From the original thread, it seems clear that when I import data into
>>> the database via SQLiteAdmin, I do need to be aware of the encoding in
>>> the imported file, and that there may be a way to detect that within the
>>> file depending on how it was produced.  Conversely, when I export data, I
>>> should try to create the same marker in the file.
>>> 
>>> And finally, is the simplest way to take care of this to simply
>>> uniEncode/uniDecode everything using the database's encoding, without
>>> regard as to whether that's necessary or not?
>>> 
>>> Pete
>>> lcSQL Software <http://www.lcsql.com>
>>> _______________________________________________
>>> use-livecode mailing list
>>> use-livecode at lists.runrev.com
>>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>> 
>> 
>> 




