Unicode (was Getting Kanji from a .csv file)

Peter Haworth pete at lcsql.com
Sat Jun 8 18:39:52 EDT 2013


On Sat, Jun 8, 2013 at 10:47 AM, Richmond <richmondmathewson at gmail.com> wrote:

> Why do I have a funny feeling you are over-complicating things?
>

I don't know but I hope you're right!


>
> Let's start with a few questions:
>
> 1. "Queen's English": well as Queen Mary IV and III (
> http://www.jacobite.ca/kings/mary4.htm
> )
>   couldn't get her head round any English to save her life, that must be a
> serious problem . . . LOL.
>

Did she speak a Unicode dialect of Bavarian?

>
> 2. Here, in Bulgaria, everybody who works with computers has English (of
> some sort), and a lot of
> databases function in a Latinised form of Bulgarian (not very
> linguistically satisfactory, but functional as per databases).
>
> 3. If you are going to target users whose language uses characters that
> require double-byte encoding,
> that is when the "fun" starts; and as LiveCode seems only to function with
> Unicode characters in the
> Basic Multilingual Plane, some are going to be forever inaccessible
> regardless of what you do.
>

OK, I won't worry about that then.

>
> 4.  Work out which language groups you are going to target first, instead
> of working unnecessarily
> to provide something that will be "all things to all men" when, possibly,
> only a subset of "all men" are
> likely to be using it.
>

Well that's the problem.  This isn't an end user application, it's a
general purpose utility that could be purchased by anyone in any country.

>
> Having been messing around with Unicode for about 5 years, I would have
> thought the safest way to handle data is quite different:
> as charToNum can find the unique Unicode address of each character and
> numToChar can find the character again, why not just
> convert all your data into a series of delimited Unicode addresses?
>
> Data can then be stored in exactly the same fashion regardless of which
> language it was entered into.
>
> e.g.  2339,2337,2351 can be stored and manipulated just as easily as
> 41,37,62, where the first set is 3 Hindi chars, while the second
> is a set of Latin chars. You could 'pad' small numbers like this
> 002339,002337,002351 and 000041,000037,000062 to aid searching.
>
Yeah but this isn't data that I have any control over; it's other people's
databases used in their applications, so it's already in the database in
whatever encoding they've chosen.
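
Richmond's codepoint scheme above can be sketched as follows (in Python
rather than LiveCode, purely to illustrate the round-trip behind
charToNum/numToChar; the function names are my own):

```python
def encode_codepoints(text, width=6):
    """Replace each character with its zero-padded Unicode codepoint."""
    return ",".join(str(ord(ch)).zfill(width) for ch in text)

def decode_codepoints(encoded):
    """Reverse the transformation, recovering the original string."""
    return "".join(chr(int(n)) for n in encoded.split(","))

# Latin and Devanagari text round-trip identically:
print(encode_codepoints("A%>"))                    # -> 000065,000037,000062
print(decode_codepoints("002339,002337,002351"))   # three Devanagari chars
```

As Richmond notes, the fixed-width padding means the stored form can be
searched and compared the same way regardless of the source language.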

> Richmond.
>
>
> On 06/08/2013 08:20 PM, Peter Haworth wrote:
>
>> I apologize up front for being particularly clueless on this whole
>> character encoding concept.  I'm still trying to adjust to speaking
>> American English as opposed to the Queen's English, so it's not too
>> surprising that I'm not grasping Unicode too well!
>>
>> I understand the concepts and the use of uniencode and unidecode but I
>> don't understand when I need to care.
>>
>> I'll use my SQLiteAdmin program as an example.  It provides schema
>> maintenance and data browsing/update features for SQLite databases and
>> uses
>> most of the standard LC controls, including datagrids.  Users can enter
>> data into it and have it used to INSERT, UPDATE, or DELETE rows.  They can
>> also type in SELECT criteria and have the qualifying data displayed in
>> field and datagrid controls. Currently, there is no attempt to do any
>> encoding or decoding of data.
>>
>> On my computers here in the USA, I've never had any issues using it on any
>> of my databases, but I've never tried to access one whose contents weren't
>> in American English.
>>
>> Now let's say someone in a country whose language requires the use of
>> Unicode encoding purchases the program.  Will it work OK for that person
>> in terms of entering data into the controls and displaying data in the
>> controls from their database, assuming that the database contains UTF8
>> encoded data?  Or do I have to uniencode/decode to ensure things work
>> right?
>>
>> Now let's say the database is using UTF16 encoding, or anything other than
>> UTF8.  I can detect that situation in the database, and I think I would
>> need to use uniencode/decode to deal with it?
>>
>> Now the user takes his UTF8 database and puts it on a colleague's computer
>> here in the USA with the computer's language settings set to American
>> English.  I would then need to decode/encode.... I think.
>>
>> From the original thread, it seems clear that when I import data into the
>> database via SQLiteAdmin, I do need to be aware of the encoding in the
>> imported file and that there may be a way to detect that within the file
>> depending on how it was produced. Conversely, when I export data, I should
>> try to create the same marker in the file.
>>
>> And finally, is the simplest way to take care of this simply to
>> uniencode/decode everything using the database's encoding, without regard
>> to whether that's necessary or not?
>>
>> Pete
>> lcSQL Software <http://www.lcsql.com>
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode at lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your
>> subscription preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>
>
>
>
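
On Pete's question of detecting the encoding in use in the database: this
is detectable because SQLite records its text encoding in the file header.
A minimal sketch (in Python for illustration; the helper name is my own):

```python
import struct

def sqlite_text_encoding(path):
    """Read the text-encoding field from a SQLite database header.

    Per the SQLite file format, bytes 56-59 of the 100-byte header hold a
    big-endian integer: 1 = UTF-8, 2 = UTF-16le, 3 = UTF-16be.
    """
    with open(path, "rb") as f:
        header = f.read(100)
    code = struct.unpack(">I", header[56:60])[0]
    return {1: "UTF-8", 2: "UTF-16le", 3: "UTF-16be"}.get(code, "unknown")
```

The equivalent check from SQL is `PRAGMA encoding;`, which reports the same
information once you have a connection open.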

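And on the "marker" for imported files mentioned above: that marker is a
byte-order mark (BOM) at the start of the file, which many exporters write
and which can be sniffed before decoding. A sketch, again in Python for
illustration:

```python
import codecs

# Check 4-byte BOMs before 2-byte ones: the UTF-32le BOM begins with the
# same two bytes as the UTF-16le BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32le"),
    (codecs.BOM_UTF32_BE, "UTF-32be"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16le"),
    (codecs.BOM_UTF16_BE, "UTF-16be"),
]

def sniff_bom(first_bytes):
    """Return the encoding named by a leading BOM, or None if absent."""
    for bom, name in BOMS:
        if first_bytes.startswith(bom):
            return name
    return None
```

Note that a BOM is optional, so its absence proves nothing; a file with no
BOM may still be UTF-8, which is why writing the marker on export (as Pete
suggests) is the considerate thing to do.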

