Finding and sorting by diacriticals again

Wouter Abraham wouter.abraham at pi.be
Tue Mar 11 20:13:01 EST 2003


> Re: Finding and sorting by diacriticals
>
>     * From: Ben Rubinstein
>     * Subject: Re: Finding and sorting by diacriticals
>     * Date: Wed, 05 Mar 2003 06:46:09 -0800
>
> on 3/5/03 11:18 AM, [EMAIL PROTECTED] wrote
>
> > I wrote a HyperTalk handler to perform an offset function that's
> > sensitive to both diacriticals and capitalization:
>
> Out of interest, can you explain in what ways the 'caseSensitive'
> property falls short in this regard?
>
> I've had some problems with that property myself, but (without having
> tried it) I would have expected that it would do what was required
> here.  If we can establish what it's not doing, perhaps we can press
> for some changes or an additional function.  I think this is a very
> common need.
>
>   Ben Rubinstein               |  Email: [EMAIL PROTECTED]
>   Cognitive Applications Ltd   |  Phone: +44 (0)1273-821600
>   http://www.cogapp.com        |  Fax  : +44 (0)1273-728866
>
>
> Sort, Find, RawKeyDown ... / diacritical problems.
>
>     * From: Pierre
>     * Subject: Sort, Find, RawKeyDown ... / diacritical problems.
>     * Date: Tue, 04 Mar 2003 02:47:18 -0800
>

Hi all,

I couldn't find it in the MetaCard Reference or in the Transcript
Language Dictionary of Revolution, but the sort command also respects
the caseSensitive property. That is, setting the caseSensitive to true
changes the sort order, though not always in a consistent way.
A little investigation into how the sort order varies gives a clue
about how the sort algorithm was conceived and when to use it or not.
For example:

the string
"a,b,c,d,e,f,g,h,i,j,k,l,A,B,C,D,E,F,G,H,I,J,K,L,ä,å,à,ã,â,á,æ,Ä,Å,À,Ã,Â,Á,Æ"

sort in rev 1.1.1 gives
"a,A,b,B,c,C,d,D,e,E,f,F,g,G,h,H,i,I,j,J,k,K,l,L,á,Á,à,À,â,Â,ä,Ä,ã,Ã,å,Å,æ,Æ"
caseSensitive sort in rev 1.1.1 gives
"A,B,C,D,E,F,G,H,I,J,K,L,a,b,c,d,e,f,g,h,i,j,k,l,Ä,Å,á,à,â,ä,ã,å,Æ,æ,À,Ã,Â,Á"
sort in mc 2.4.3 gives
"a,A,b,B,c,C,d,D,e,E,f,F,g,G,h,H,i,I,j,J,k,K,l,L,Ä,Å,á,à,â,ä,ã,å,Æ,æ,À,Ã,Â,Á"
caseSensitive sort in mc 2.4.3 gives
"A,B,C,D,E,F,G,H,I,J,K,L,a,b,c,d,e,f,g,h,i,j,k,l,Ä,Å,á,à,â,ä,ã,å,Æ,æ,À,Ã,Â,Á"
international sort in rev 1.1.1 gives
"Æ,Á,Â,ã,Ã,à,À,å,Å,ä,Ä,æ,á,â,L,K,J,I,H,G,F,E,D,C,B,A,l,k,j,i,h,g,f,e,d,c,b,a"
caseSensitive international sort in rev 1.1.1 gives
"Æ,Á,Â,Ã,À,Å,Ä,æ,á,â,ã,à,å,ä,L,K,J,I,H,G,F,E,D,C,B,A,l,k,j,i,h,g,f,e,d,c,b,a"
international sort in mc 2.4.3 gives
"a,A,á,Á,à,À,â,Â,ä,Ä,ã,Ã,å,Å,æ,Æ,b,B,c,C,d,D,e,E,f,F,g,G,h,H,i,I,j,J,k,K,l,L"
caseSensitive international sort in mc 2.4.3 gives
"A,Á,À,Â,Ä,Ã,Å,a,á,à,â,ä,ã,å,Æ,æ,B,b,C,c,D,d,E,e,F,f,G,g,H,h,I,i,J,j,K,k,L,l"
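
For anyone who wants to repeat the comparison on their own version, here
is a minimal sketch of how such a test could be run. The handler, the
shortened test list and the variable names are assumptions for
illustration, not the exact script used for the results above.

```livecode
on mouseUp
  #### hypothetical short test list; substitute the full string above
  put "a,b,c,A,B,C,ä,Ä,á,Á" into tList
  repeat for each item tCase in "false,true"
    set the caseSensitive to tCase
    put tList into tText
    sort items of tText
    put tList into tIntl
    sort items of tIntl international
    put "caseSensitive" && tCase & ":" && tText & cr after tReport
    put "caseSensitive" && tCase && "international:" && tIntl & cr after tReport
  end repeat
  put tReport    #### shows the four results in the message box
end mouseUp
```

The results depend on the engine version and platform, so they may not
match the lists above.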

Often the built-in sort order is not what we would like to see after a
sort, which suggests a little function.
The example is written for comma-delimited lists, but is easily adapted
to other cases.
To make it work, take a button and give it a custom property called
SortOrder.
Put the desired sort order into it, one group of characters per line as
below; this is easier to edit (the script joins the lines into one long
string).
For example:

0123456789
AaÁáÀàÂâÄäÃãÅåÆæ
Bb
CcÇç
Dd
EeÉéÊêËëÈè
Ff
Gg
Hh
IiÍíÎîÏïÌì
Jj
Kk
Ll
Mm
NnÑñ
OoÓóÒòÔôÖöÕõØøŒœ
Pp
Qq
Rr
Ss
Tt
UuÜüÚúÙùÛû
Vv
Ww
Xx
YyŸÿ
Zz
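
If typing the list into the property editor is awkward, the property
could also be filled by script. This is only a sketch: the handler name
setUpSortOrder is made up for the example, it shows just the first few
groups, and it assumes the script sits in the same button that holds the
property.

```livecode
on setUpSortOrder
  #### one group per line, in the order the characters should sort
  put "0123456789" & cr & \
      "AaÁáÀàÂâÄäÃãÅåÆæ" & cr & \
      "Bb" & cr & \
      "CcÇç" into tOrder
  #### ...continue with the remaining groups in the same way
  set the SortOrder of me to tOrder
end setUpSortOrder
```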


function ctnSort pTextToSort
   get the SortOrder of me    #### assumes the custom property is in the same object as the script
   repeat for each line kk in it
     put kk after so
   end repeat
   set the caseSensitive to true    #### important to make the offset function work correctly
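   repeat for each item ii in pTextToSort
     put empty into tempVar
     repeat for each char cc in ii
       #### zero-pad each position so that whole keys compare correctly as text
       #### (three digits assumes fewer than 1000 chars in the sort order)
       put format("%03d", offset(cc,so)) after tempVar
     end repeat
     put tempVar & "," & ii & cr after theSorter
   end repeat
   #### a numeric sort cannot compare keys made of several numbers,
   #### so sort the fixed-width keys as text instead
   sort lines of theSorter by first item of each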
   repeat for each line ll in theSorter
     put item 2 of ll & "," after theResult
   end repeat
   delete last char of theResult
   return theResult
end ctnSort
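
A possible way to call it, as a sketch: the word list and the field name
"Result" are assumptions for the example, and the handler is meant to go
in the same button script as ctnSort and the SortOrder property.

```livecode
on mouseUp
  #### hypothetical test data; items must not contain commas themselves
  put "Zoë,éclair,Eclair,ananas,Ángel" into tWords
  put ctnSort(tWords) into field "Result"
end mouseUp
```

Characters that do not appear in SortOrder get an offset of 0, so items
starting with them end up at the front of the list.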

By the way, if the caseSensitive is set to true, the find command will
find the right "Élephant" among
"œuf,Elephant,Eléphant,Élephant,Éléphant,èléphant,Tigre,"

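As a sketch of that last point, assuming a field named "Animals" (a
made-up name) that holds the list above:

```livecode
on mouseUp
  set the caseSensitive to true
  find string "Élephant" in field "Animals"
  if the result is "Not found" then
    answer "No exact match."
  else
    answer the foundText
  end if
end mouseUp
```

The caseSensitive property goes back to false when the handler ends, so
it has to be set in each handler that needs it.
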
Have a nice night,
WA



