Sorting strangeness

Paul Dupuis paul at researchware.com
Wed Sep 18 10:43:16 EDT 2019


Again, thank you Mark - this helped immensely.

In making a commercial text analytics application, we do a lot of 
sorting of user-entered textual data. With the move from LC6.7.11 to 
LC9.0.5rc1 we discovered the sort text bug, but with many dozens of 
sort statements throughout our code, we wanted to be sure that 
switching all text sorts to international was the right way to go. All 
are for user-facing data. The alternative was to abstract them into a new 
function that used one of the workarounds you identified for the sort 
text issue.

Thanks again!

On 9/18/2019 2:57 AM, Mark Waddingham via use-livecode wrote:
> On 2019-09-16 21:54, Paul Dupuis via use-livecode wrote:
>> IF sort lines of <var> ascending text was working correctly for lines
>> of mixed ASCII and Unicode, for someone sorting lines of text that can
>> be Native text, Unicode text (both RTL and LTR), or mixtures of both,
>> is it better to use SORT ... TEXT or SORT ... INTERNATIONAL? I don't
>> know enough about what the "international" (using the system locale
>> settings)  and Unicode may mean in relation to one another?
>
> You should use 'sort international' when you are displaying a sorted list
> to a user who is looking through it manually.
>
> The ordering provided by 'sort text' is purely by unicode codepoint, which
> has no direct relation to 'expected' order when read by a human and instead
> is determined by technical considerations (structuring a large 21-bit space,
> frequency of use and, most importantly, round-tripping to legacy encodings
> and standards).
>
> The core of the sort order provided by 'international' sorting is the
> Unicode Collation Algorithm - which provides (at its core) a
> locale-independent order for all the languages/scripts present in Unicode.
> e.g. Latin European languages are generally expected to come before Greek
> which is expected to come before Cyrillic.
>
> This core order is then tailored by locale to enable account to be taken
> of the individual expectations of the user of the sorted list. For example,
> different languages have different sort orders for what you might consider
> the 'same letters' due to using the same glyphs. For example, a Swedish user
> would expect 'z' to sort before 'ö'; whereas a German user would expect
> 'ö' to sort before 'z'.
>
> The engine uses ICU's implementation of Unicode collation, and supports a
> wide range of locales - the locale used is read on engine startup from the
> user's system settings.
>
> Hope this helps,
>
> Mark.
>
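
A rough illustration of the distinction drawn in the quoted reply, written in Python rather than LiveCode so it can be run standalone. The first sort is plain codepoint order, analogous to what 'sort text' is described as doing. The two key functions (swedish_key, german_key) and the word list are hypothetical, toy stand-ins for locale tailoring; they are not ICU or the Unicode Collation Algorithm, and they only cover a handful of letters.

```python
# Toy illustration only: real collation (ICU / UCA) handles far more
# than this sketch, including accents, case levels and other scripts.

words = ["zon", "öl", "apa", "Zebra"]  # made-up sample data

# 1) Codepoint order: uppercase 'Z' (U+005A) sorts before lowercase
#    letters, and 'ö' (U+00F6) sorts after 'z' (U+007A).
by_codepoint = sorted(words)

# 2) Toy "Swedish-like" tailoring: å, ä, ö are separate letters placed
#    after z. Raises ValueError for characters outside this alphabet.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(word):
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]

# 3) Toy "German-like" tailoring: umlauted vowels sort with their base
#    letter, so 'ö' lands next to 'o' and therefore before 'z'.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def german_key(word):
    return word.lower().translate(GERMAN_FOLD)

by_swedish = sorted(words, key=swedish_key)
by_german = sorted(words, key=german_key)

print(by_codepoint)  # ['Zebra', 'apa', 'zon', 'öl']
print(by_swedish)    # ['apa', 'Zebra', 'zon', 'öl']
print(by_german)     # ['apa', 'öl', 'Zebra', 'zon']
```

The codepoint result puts 'Zebra' ahead of 'apa', which is the kind of ordering a person scanning a list would not expect; the two tailored results differ only in where 'ö' falls relative to 'z', matching the Swedish and German expectations described in the quoted reply.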
