Unicode (was Getting Kanji from a .csv file)

Richmond richmondmathewson at gmail.com
Sat Jun 8 13:47:08 EDT 2013


Why do I have a funny feeling you are over-complicating things?

Let's start with a few questions:

1. "Queen's English": well, as Queen Mary IV and III
   (http://www.jacobite.ca/kings/mary4.htm) couldn't get her head round any
   English to save her life, that must be a serious problem . . . LOL.

2. Here, in Bulgaria, everybody who works with computers has English (of
   some sort), and a lot of databases function in a Latinised form of
   Bulgarian (not very linguistically satisfactory, but functional as far
   as databases go).

3. If you are going to target users whose language uses characters that
   require double-byte encoding, that is when the "fun" starts; and as
   LiveCode seems to function only with Unicode characters in the first
   plane (the Basic Multilingual Plane), some characters are going to be
   forever inaccessible regardless of what you do.

4. Work out which language groups you are going to target first, instead
   of working unnecessarily to provide something that will be "all things
   to all men" when, possibly, only a subset of "all men" are likely to be
   using it.
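
To illustrate point 3 above, here is a minimal sketch in Python (not LiveCode) of the difference between characters inside and outside the first plane. The helper names in_bmp and utf16_code_units are made up for this example. Whether LiveCode itself can handle the second case is the observation made above, not something this sketch tests.

```python
def in_bmp(ch):
    """Return True if the character lies in the first Unicode plane
    (code points U+0000 to U+FFFF)."""
    return ord(ch) <= 0xFFFF


def utf16_code_units(ch):
    """Count the 16-bit units needed to store one character in UTF-16.

    Characters in the first plane need one unit; characters beyond it
    need two (a surrogate pair).
    """
    return len(ch.encode("utf-16-le")) // 2


# A Devanagari letter (decimal 2339) sits in the first plane ...
devanagari = chr(2339)
# ... while U+20000, a rarely used CJK ideograph, does not.
rare_cjk = chr(0x20000)

for ch in (devanagari, rare_cjk):
    print(ord(ch), in_bmp(ch), utf16_code_units(ch))
```

Any software that assumes "one character = one 16-bit value" will mishandle the second kind of character.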

Having been messing around with Unicode for about 5 years, I would have
thought the safest way to handle data is quite different: as charToNum can
find the unique Unicode address of each character and numToChar can find
the character again, why not just convert all your data into a series of
delimited Unicode addresses?

Data can then be stored in exactly the same fashion regardless of which
language it was entered in.

e.g. 2339,2337,2351 can be stored and manipulated just as easily as
41,37,62, where the first set is 3 Hindi (Devanagari) chars, while the
second is a set of chars from the basic Latin (ASCII) range. You could
'pad' small numbers like this: 002339,002337,002351 and
000041,000037,000062 to aid searching.
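
A minimal sketch of this idea, written in Python rather than LiveCode, with ord and chr standing in for charToNum and numToChar. The function names, the comma delimiter and the default padding width of 6 are assumptions made for illustration.

```python
def encode_codepoints(text, width=6, sep=","):
    """Turn text into a delimited list of zero-padded decimal code points.

    A width of 6 matches the padding shown above; code points above
    999999 simply produce a longer field rather than being truncated.
    """
    return sep.join(f"{ord(ch):0{width}d}" for ch in text)


def decode_codepoints(data, sep=","):
    """Rebuild the original text from the delimited code points."""
    if not data:
        return ""
    return "".join(chr(int(field)) for field in data.split(sep))


# The three Devanagari characters from the example above.
hindi = chr(2339) + chr(2337) + chr(2351)
stored = encode_codepoints(hindi)
print(stored)
print(decode_codepoints(stored) == hindi)
```

Because every field has the same width, a search for one padded number cannot accidentally match part of a longer one.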

Richmond.

On 06/08/2013 08:20 PM, Peter Haworth wrote:
> I apologize up front for being particularly clueless on this whole
> character encoding concept.  I'm still trying to adjust to speaking
> American English as opposed to the Queen's English so not too surprising I'm
> not grasping unicode too well!
>
> I understand the concepts and the use of uniencode and unidecode but I
> don't understand when I need to care.
>
> I'll use my SQLiteAdmin program as an example.  It provides schema
> maintenance and data browsing/update features for SQLite databases and uses
> most of the standard LC controls, including datagrids.  Users can enter
> data into it and have it used to INSERT, UPDATE, or DELETE rows.  They can
> also type in SELECT criteria and have the qualifying data displayed in
> field and datagrid controls. Currently, there is no attempt to do any
> encoding or decoding of data.
>
> On my computers here in the USA, I've never had any issues using it on any
> of my databases, but I've never tried to access one whose contents weren't
> in American English.
>
> Now let's say someone in a country whose language requires the use of
> unicode encoding purchases the program.  Will it work OK for that person in
> terms of entering data into the controls and displaying data in the
> controls from their database, assuming that the database contains UTF8
> encoded data?  Or do I have to uniencode/decode to ensure things work right?
>
> Now let's say the database is using UTF16 encoding, or anything other than
> UTF8.  I can detect that situation in the database and I think I would need
> to use uniencode/decode to deal with it?
>
> Now the user takes his UTF8 database and puts it on a colleague's computer
> here in the USA with the computer's language settings set to American
> English.  I would then need to decode/encode.... I think.
>
> From the original thread, it seems clear that when I import data into the
> database via SQLiteAdmin, I do need to be aware of the encoding in the
> imported file and that there may be a way to detect that within the file
> depending on how it was produced. Conversely, when I export data, I should
> try to create the same marker in the file.
>
> And finally, is the simplest way to take care of this to simply
> uniencode/decode everything using the database's encoding without regard as
> to whether that's necessary or not?
>
> Pete
> lcSQL Software <http://www.lcsql.com>