Sort, Find, RawKeyDown ... / diacritical problems.

Emmanuel Companys mcompanys at mac.com
Wed Mar 5 05:01:00 EST 2003


On Monday, 3 March 2003, at 18:45 Europe/Paris, Pierre wrote:

> From: Pierre <pierre.bernaert>
> Date: Sat 1 March 2003  09:20:33 Europe/Paris
> To: Liste Révolution <use-revolution at lists.runrev.com>
> Subject: Sort, Find, RawKeyDown ... / diacritical problems.
>
> Hi everyone,
>
> I am French, and I am in the process of converting major (for me)
> applications written with HyperCard to RR 1.1.
>
> The main tools of some of them rely mainly on a "HyperText" approach.
>
> I found, using RR, that the SORT and FIND commands do not work with
> words containing diacritics (accented letters).
>
> I sent a mail to Kevin Miller; his reply is below.
>
> -----------------------------------------------------------------------
> My mail and the answer from Kevin
>
> From: Kevin Miller <kevin at runrev.com>
> Date: Thu 20 Feb 2003  20:39:03 Europe/Paris
> To: <pierre.bernaert at wanadoo.fr>
> Subject: Re: Find, Sort and more generally speaking "Diacritical" ...
>
> On 19/2/03 8:25 am, Pierre <pierre.bernaert at www.runrev.com> wrote:
>
> As you surely know, in RR 1.1 diacritics do not work properly:
>
> • At least the FIND and SORT commands are affected.
> • I believe "RawKeyDown" is affected too.
>
> In both cases you get wrong answers (this worked fine in HC
> using "International" for sorting).
>
> I am not familiar with Unicode; as far as I know it will be
> implemented in version 2.
>
> Will Unicode solve this major problem for "HyperText" applications?
>
> Please tell me what the plans are on this topic.
>
> You can script around these issues, relatively easily.  Unicode
> supports entering and display international text.  We don't plan to
> make any further changes to these functions for 2.0, but could
> consider revisiting this (at the scripting level at least) for 2.1.
> In the mean time, try asking on the use-revolution mailing list,
> someone there is bound to be able to help write a script to overcome
> these issues.
>
> Kind regards,
>
> Kevin
> -----------------------------------------------------------------------
>
> As far as I am concerned, I don't see how to deal with this, so my
> question to the members of this list is:
>
> Does any member have an idea of how to solve this or, better, has
> someone already made it work?

I encountered the same problem a year ago when I was working on my
program Polylexis, which you may download from my iDisk.

HyperCard had "sort ascii", which sorted according to the ASCII value
of each character, AND "sort international", which ignored both the
upper/lower case difference AND any diacritics. This was a "pis-aller",
a more or less acceptable solution for French. José Ileras, an RR user
from Barcelona, told me that he had corresponded with the R-R staff
about the sorting problem and that it was supposed to be taken care of
in version 1.1; but I did not find that it was really implemented.

Besides, the problem is not solved by just ignoring diacritics:

FIRST: Where should special characters be sorted, such as the German
"SZ" (ß), the "bar-o" (ø), the "edh", "thorn", "bar-D", the "medium
dot" (·), etc.?

SECOND: Letters with diacritics have a special sorting behavior
depending on the language:
a) ñ is sorted as a separate letter in Spanish dictionaries, between N
and O; and so are ç and many other letters with diacritics in several
languages. Ignoring the diacritic then totally changes the alphabetical
presentation.
b) Even when the letter with a diacritic is not considered a separate
letter, the typographic rules of the language may assign it a special
place: for instance, in French "Macon" may come before "Maçon", "Lez"
before "Lés", "Prés" before "Près" and so on. By simply ignoring the
diacritics, the relative order of such pairs is left to chance.

THIRD: Digraphs may have a special behavior too:
a) Ligatures such as æ or œ will be equated to their upper case
counterparts by "international sorting"; but where will they be
listed? After the Z? As a separate letter (between A and B, or O and
P)? As if they were normal digraphs (ae, oe)? Equated to "ä" or "ö"
(and "ø")?
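
To make the first two points concrete, here is a minimal Python sketch
of a sort key that simply ignores case and diacritics. It is only an
illustration, not HyperCard's or Revolution's actual implementation;
the function names strip_diacritics and international_key are
hypothetical. It assumes that removing Unicode combining marks after
NFD decomposition is an acceptable stand-in for "ignoring diacritics".

```python
import unicodedata

def strip_diacritics(word):
    # Decompose each character into base letter + combining marks (NFD),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def international_key(word):
    # Hypothetical key: ignore both diacritics and case.
    return strip_diacritics(word).lower()

# "Prés" and "Près" get the same key, so their relative order in the
# sorted output depends only on the order they had in the input.
pair_a = sorted(["Prés", "Près"], key=international_key)
pair_b = sorted(["Près", "Prés"], key=international_key)

# Characters such as ß, ø and æ have no base-letter decomposition,
# so this key leaves them untouched and says nothing about their place.
untouched = [strip_diacritics(ch) for ch in "ßøæ"]
```

With such a key, Spanish ñ also collapses into n, which is exactly the
behavior point a) above objects to.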

Kevin Miller says:

"We don't plan to make any further changes to these functions for 2.0,  
but could consider revisiting this (at the scripting level at least)  
for 2.1."
I hope many R-R users will encourage him to do so (although I don't
understand what "at the scripting level at least" means). The use of
Unicode by itself will not solve all the sorting problems and, besides,
it still has many drawbacks, mostly practical and some technical. They
will certainly be overcome, but having to use this two-byte system now,
for characters that have been sorted correctly on any computer for 20
years, is simply illogical.

The workaround I used in Polylexis is less elegant than Jan
Schenkels's scripts
(http://lists.runrev.com/pipermail/use-revolution/2002-September/008173.html),
and I don't know which one is faster. But mine takes care of all the
points above, while Jan's simply causes the diacritics and the case to
be ignored.

My script puts the field to be sorted into a local variable, then reads
the first word of each line of the local and puts its "fake form" into
a second item. This "fake form" is obtained by replacing the letter
with a diacritic by a digraph (or trigraph) that will ensure the
correct location in the list: for instance, for Spanish it will replace
"ñ" by "nzz", "ch" by "cz" and "ll" by "lzz". Then I just have to
"sort by item 2". Afterwards item 2 is deleted, and the contents of the
local replace those of the original field. I wrote a different "make
fake" function for each of the 9 languages used in my program.
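
The steps above can be sketched roughly as follows. This is a Python
illustration of the idea, not the Polylexis script itself (which is
written for Revolution and is not reproduced here). The names
SPANISH_FAKE, make_fake_spanish and sort_lines are hypothetical; the
table contains only the three Spanish replacements mentioned above, and
lower-casing the word first is an added assumption.

```python
# Hypothetical "make fake" table for Spanish, limited to the three
# replacements mentioned in the text (assumption: nothing else is mapped).
SPANISH_FAKE = (("ch", "cz"), ("ll", "lzz"), ("ñ", "nzz"))

def make_fake_spanish(word):
    # Build the "fake form" used only as a sort key.
    # Lower-casing first is an assumption, so that case is ignored.
    fake = word.lower()
    for original, replacement in SPANISH_FAKE:
        fake = fake.replace(original, replacement)
    return fake

def sort_lines(text, make_fake=make_fake_spanish):
    # 1. Take the non-empty lines of the text (the "local").
    lines = [line for line in text.split("\n") if line.strip()]
    # 2. Pair each line with the fake form of its first word ("item 2").
    decorated = [(make_fake(line.split()[0]), line) for line in lines]
    # 3. Sort on the fake form only.
    decorated.sort(key=lambda pair: pair[0])
    # 4. Drop the fake form and return the reordered lines.
    return "\n".join(line for _, line in decorated)

words = "nube\nñandú\nnadar\nllama\nluz\nchico\ncosa\ncurva"
result = sort_lines(words)
```

Here "chico" ends up after "curva", "llama" after "luz" and "ñandú"
after "nube", as in a traditional Spanish dictionary. A different
make_fake function can be passed in for each language.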

Of course, this is slow on old computers. R-R should have at least: a)
a "sort international" option, and b) externals or plug-ins for at
least the most widely used languages that fit in a one-byte encoding.
It would be nice if we had a "sort system selected language" option
using the sorting rules selected in the operating system; but I don't
know how difficult this would be in a cross-platform program...

Manuel
