Unicode (was Getting Kanji from a .csv file)
pete at lcsql.com
Sat Jun 8 18:39:52 EDT 2013
On Sat, Jun 8, 2013 at 10:47 AM, Richmond <richmondmathewson at gmail.com>wrote:
> Why do I have a funny feeling you are over-complicating things?
I don't know but I hope you're right!
> Let's start with a few questions:
> 1. "Queen's English": well, as Queen Mary IV and III (…)
> couldn't get her head round any English to save her life, that must be a
> serious problem . . . LOL.
Did she speak a unicode dialect of Bavarian?
> 2. Here, in Bulgaria, everybody who works with computers has English (of
> some sort), and a lot of
> databases function in a Latinised form of Bulgarian (not very
> linguistically satisfactory, but functional as per databases).
> 3. If you are going to target users whose language uses characters that
> require double-byte encoding
> that is when the "fun" starts; and as LiveCode seems only to function with
> Unicode characters in the
> first linguistic plane some are going to be forever inaccessible
> regardless of what you do.
OK, I won't worry about that then.
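To make the "first linguistic plane" limit concrete, here is a sketch in Python (my choice of language, not the thread's): characters in the Basic Multilingual Plane fit in one UTF-16 code unit, while supplementary-plane characters need a surrogate pair, which is exactly what an engine restricted to the first plane cannot represent.

```python
# BMP characters (U+0000..U+FFFF) take one UTF-16 code unit; characters
# beyond the BMP take two (a surrogate pair).
bmp_char = "\u0434"         # CYRILLIC SMALL LETTER DE, inside the BMP
astral_char = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

def utf16_code_units(s):
    """Number of 16-bit code units needed to store s in UTF-16."""
    return len(s.encode("utf-16-be")) // 2

print(utf16_code_units(bmp_char))     # 1
print(utf16_code_units(astral_char))  # 2 (a surrogate pair)
```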
> 4. Work out which language groups you are going to target first, instead
> of working unnecessarily
> to provide something that will be "all things to all men" when, possibly,
> only a subset of "all men" are
> likely to be using it.
Well that's the problem. This isn't an end user application, it's a
general purpose utility that could be purchased by anyone in any country.
> Having been messing around with Unicode for about 5 years I would have
> thought the safest way to handle data is quite different;
> as charToNum can find the unique Unicode address of each character and
> numToChar can find the character again, why not just
> convert all your data into a series of delimited Unicode addresses?
> Data can then be stored in exactly the same fashion regardless of which
> language it was entered into.
> e.g. 2339,2337,2351 can be stored and manipulated just as easily as
> 41,37,62, where the first set is 3 Hindi chars, while the second
> is a set of Latin chars. You could 'pad' small numbers like this
> 002339,002337,002351 and 000041,000037,000062 to aid searching.
Yeah, but this isn't data that I have any control over; it's other people's
databases used in their applications, so it's already in the database in
whatever encoding they've chosen.
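The numbering scheme quoted above can be sketched in Python, where `ord` and `chr` stand in for charToNum and numToChar (the language choice is mine, not the thread's): every character becomes a zero-padded decimal code point, so text from any script serializes the same way.

```python
# Serialize text as comma-delimited, zero-padded Unicode code points,
# and reverse the process to recover the original string.
def to_codepoints(s, width=6):
    return ",".join(str(ord(ch)).zfill(width) for ch in s)

def from_codepoints(encoded):
    return "".join(chr(int(n)) for n in encoded.split(","))

hindi = "\u0923\u0921\u092f"   # the three Devanagari characters from the example
packed = to_codepoints(hindi)
print(packed)                  # 002339,002337,002351
assert from_codepoints(packed) == hindi
```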
> On 06/08/2013 08:20 PM, Peter Haworth wrote:
>> I apologize up front for being particularly clueless on this whole
>> character encoding concept. I'm still trying to adjust to speaking
>> American English as opposed to the Queen's English, so it's not too
>> surprising that I'm not grasping Unicode too well!
>> I understand the concepts and the use of uniEncode and uniDecode, but I
>> don't understand when I need to care.
>> I'll use my SQLiteAdmin program as an example. It provides schema
>> maintenance and data browsing/update features for SQLite databases and
>> uses most of the standard LC controls, including datagrids. Users can enter
>> data into it and have it used to INSERT, UPDATE, or DELETE rows. They can
>> also type in SELECT criteria and have the qualifying data displayed in
>> field and datagrid controls. Currently, there is no attempt to do any
>> encoding or decoding of data.
>> On my computers here in the USA, I've never had any issues using it on any
>> of my databases, but I've never tried to access one whose contents weren't
>> in American English.
>> Now let's say someone in a country whose language requires the use of
>> Unicode encoding purchases the program. Will it work OK for that person in
>> terms of entering data into the controls and displaying data in the
>> controls from their database, assuming that the database contains UTF-8
>> encoded data? Or do I have to uniEncode/uniDecode to ensure things work?
>> Now let's say the database is using UTF-16 encoding, or anything other than
>> UTF-8. I can detect that situation in the database, and I think I would need
>> to use uniEncode/uniDecode to deal with it?
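The UTF-8/UTF-16 question above comes down to one rule, sketched here in Python (whose `bytes.decode`/`str.encode` play the role LiveCode's uniDecode/uniEncode play): decode to one internal representation on read, encode back on write, and use the codec the database actually declares.

```python
# The same text stored under two different database encodings.
text = "München"
utf8_bytes = text.encode("utf-8")       # what a UTF-8 SQLite database stores
utf16_bytes = text.encode("utf-16-le")  # what a UTF-16LE database stores

# Reading: pick the decoder that matches the database's declared encoding.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text

# Decoding with the wrong codec is what produces garbled display ("mojibake"):
print(utf8_bytes.decode("latin-1"))  # MÃ¼nchen
```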
>> Now the user takes his UTF-8 database and puts it on a colleague's computer
>> here in the USA with the computer's language settings set to American
>> English. I would then need to decode/encode... I think.
>> From the original thread, it seems clear that when I import data into the
>> database via SQLiteAdmin, I do need to be aware of the encoding in the
>> imported file and that there may be a way to detect that within the file
>> depending on how it was produced. Conversely, when I export data, I should
>> try to create the same marker in the file.
>> And finally, is the simplest way to take care of this simply to
>> uniEncode/uniDecode everything using the database's encoding, without
>> regard to whether that's necessary or not?
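One common form of the "marker in the file" mentioned above is a byte-order mark (BOM). A minimal sniffer, sketched in Python as an illustration rather than as the thread's own solution, looks like this:

```python
import codecs

# 4-byte BOMs must be tested before 2-byte ones: the UTF-32-LE BOM
# begins with the same bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(data, default="utf-8"):
    """Guess an encoding from a leading BOM; fall back to a default,
    since a BOM is optional and many UTF-8 files have none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return default

print(sniff_encoding(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))  # utf-16-le
print(sniff_encoding(b"plain ASCII, no BOM"))                          # utf-8
```

On export, writing the matching BOM at the start of the file is the "same marker" the paragraph above describes.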
>> lcSQL Software <http://www.lcsql.com>
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences: