Unicode (was Getting Kanji from a .csv file)

Dar Scott dsc at swcp.com
Sat Jun 8 14:11:18 EDT 2013


I encourage you to go to full Unicode.

That means (for now) using the unicodeText property to get text into and out of a field.

Then convert that to UTF-8 and back for a database with UTF-8 encoding.
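
A minimal sketch of that round trip, written in Python as a stand-in rather than LiveCode. It assumes the field hands you UTF-16 bytes in the machine's native byte order with no byte order mark; the function names are made up for illustration.

```python
import sys

# Assumption: the text from the field is UTF-16 in native byte order, no BOM.
NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"

def field_to_db(field_bytes: bytes) -> bytes:
    """Convert native UTF-16 field text to UTF-8 for a UTF-8 database."""
    return field_bytes.decode(NATIVE_UTF16).encode("utf-8")

def db_to_field(db_bytes: bytes) -> bytes:
    """Convert UTF-8 text read from the database back to native UTF-16."""
    return db_bytes.decode("utf-8").encode(NATIVE_UTF16)
```

Both directions are lossless because UTF-8 and UTF-16 encode the same set of Unicode characters.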

(And at this point you can say that your program only works with UTF-8 encoding set for SQLite.)

It is my understanding that SQLite does not have any lossy encodings.  That is, all of its encodings are Unicode, so you don't lose anything by saving to the db.  My comments are based on that.

You probably can't reliably move a db with UTF-16 encoding from one machine to another.  

Since your program is general, you probably want to accommodate dbs with UTF-16, UTF-16LE, and UTF-16BE.
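
One way to find out which of these a given db uses is SQLite's `PRAGMA encoding`. The sketch below uses Python's sqlite3 module as a stand-in for whatever database calls your program makes; the helper names are hypothetical.

```python
import sqlite3

def database_encoding(conn: sqlite3.Connection) -> str:
    """Return the text encoding SQLite reports for this database."""
    return conn.execute("PRAGMA encoding").fetchone()[0]

def new_memory_db(encoding: str) -> sqlite3.Connection:
    """Create an in-memory database with the requested text encoding.

    The encoding can only be chosen before the database has any content,
    so the PRAGMA is issued before the first table is created.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(f'PRAGMA encoding = "{encoding}"')
    conn.execute("CREATE TABLE words (word TEXT)")
    return conn
```

A program could branch on the reported value: do the UTF-8 conversion when it reports UTF-8, and the UTF-16 handling otherwise.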

I'm guessing you can store the Unicode you get from the unicodeText directly for UTF-16.

For the others you might have to byte swap.  

Long ago I made an enhancement suggestion to include UTF-16LE and UTF-16BE in uniEncode and uniDecode.  I don't think it is there, so you will have to do it yourself.

Essentially, you see if what is native for your machine matches your target encoding.  If not, swap.  To see if the chars are stored little endian or not ...
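
The check-and-swap idea can be sketched as follows, again in Python as a stand-in rather than LiveCode. It assumes the input is UTF-16 in the machine's native byte order with no byte order mark, and the function names are invented for this example. Here the byte order comes from `sys.byteorder`; in another environment you would need whatever equivalent it offers.

```python
import sys

def swap_utf16_bytes(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)

def to_target_order(native_utf16: bytes, target: str) -> bytes:
    """Return native-order UTF-16 bytes in the byte order the target needs."""
    native = "UTF-16LE" if sys.byteorder == "little" else "UTF-16BE"
    target = target.upper()
    if target not in ("UTF-16", "UTF-16LE", "UTF-16BE"):
        raise ValueError("unsupported target encoding: " + target)
    # Plain "UTF-16" is treated here as native order, so no swap is needed.
    if target in ("UTF-16", native):
        return native_utf16
    return swap_utf16_bytes(native_utf16)
```

Swapping works on whole code units, so surrogate pairs survive unchanged: each half of the pair is swapped on its own.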

Gotta run.

Dar


On Jun 8, 2013, at 11:20 AM, Peter Haworth wrote:

> I apologize up front for being particularly clueless on this whole
> character encoding concept.  I'm still trying to adjust to speaking
> American English as opposed to the Queen's English so not too surprising I'm
> not grasping unicode too well!
> 
> I understand the concepts and the use of uniencode and unidecode but I
> don't understand when I need to care.
> 
> I'll use my SQLiteAdmin program as an example.  It provides schema
> maintenance and data browsing/update features for SQLite databases and uses
> most of the standard LC controls, including datagrids.  Users can enter
> data into it and have it used to INSERT, UPDATE, or DELETE rows.  They can
> also type in SELECT criteria and have the qualifying data displayed in
> field and datagrid controls. Currently, there is no attempt to do any
> encoding or decoding of data.
> 
> On my computers here in the USA, I've never had any issues using it on any
> of my databases, but I've never tried to access one whose contents weren't
> in American English.
> 
> Now let's say someone in a country whose language requires the use of
> unicode encoding purchases the program.  Will it work OK for that person in
> terms of entering data into the controls and displaying data in the
> controls from their database, assuming that the database contains UTF8
> encoded data?  Or do I have to uniencode/decode to ensure things work right?
> 
> Now let's say the database is using UTF16 encoding, or anything other than
> UTF8.  I can detect that situation in the database and I think I would need
> to use uniencode/decode to deal with it?
> 
> Now the user takes his UTF8 database and puts it on a colleague's computer
> here in the USA with the computer's language settings set to American
> English.  I would then need to decode/encode.... I think.
> 
> From the original thread, it seems clear that when I import data into the
> database via SQLiteAdmin, I do need to be aware of the encoding in the
> imported file and that there may be a way to detect that within the file
> depending on how it was produced. Conversely, when I export data, I should
> try to create the same marker in the file.
> 
> And finally, is the simplest way to take care of this to simply
> uniencode/decode everything using the database's encoding without regard as
> to whether that's necessary or not?
> 
> Pete
> lcSQL Software <http://www.lcsql.com>