Sorting strangeness

Paul Dupuis paul at researchware.com
Mon Sep 16 16:54:42 EDT 2019


Mark,

Thank you, as always, for your incredible depth of knowledge of the engine!

Bug filed: https://quality.livecode.com/show_bug.cgi?id=22378 with 
sample stack and your comments.

If I can impose on you, I have one more question related to this topic:

Assuming sort lines of <var> ascending text were working correctly for
lines of mixed ASCII and Unicode: for someone sorting lines of text that
can be native text, Unicode text (both RTL and LTR), or mixtures of both,
is it better to use SORT ... TEXT or SORT ... INTERNATIONAL? I don't know
enough about how "international" (which uses the system locale settings)
and Unicode relate to one another.

Thank you again,

Paul Dupuis
Researchware


On 9/16/2019 2:45 PM, Mark Waddingham via use-livecode wrote:
> On 2019-09-16 19:01, Paul Dupuis via use-livecode wrote:
>> Thanks Bob for being one of the folks on the list who always tries to
>> offer solutions for people.
>>
>> That said, I have solutions aplenty. My real question is for
>> LIVECODE, LTD or perhaps someone like Mark Waddingham who could
>> actually tell whether this is the expected behavior (not a BUG, but
>> probably should be documented) or an aberrant behavior (a BUG and
>> should be reported).
>
> It's definitely a bug - sorting a field with that content works
> correctly, but sorting a variable doesn't.
>
> After staring at the string for a while it occurred to me that the 
> line which is sorting incorrectly is all ASCII - "Norwegian Norsk" - 
> indeed the following causes the string to sort correctly again:
>
>   sort <original text> ascending text by (each & 
> (numToCodepoint(0xFFEF)))
>
> When a sort is performed, it first splits the input string into
> separate strings - one for each line. In this case the "Norwegian
> Norsk" line becomes a native string, whereas all the others are
> unicode. The 'by' expression above forces every sort key to unicode,
> so the bug doesn't manifest.
>
> Poking around some more, this also seems to work correctly:
>
>   set the caseSensitive to true
>   sort <original text> ascending text
>
> So there appears to be a difference between the sort keys being 
> generated for unicode and native strings - at least when caseSensitive 
> is false.
>
> The field case works because the field coerces all content to unicode 
> (as the text APIs on all platforms take UTF-16 these days), and I 
> believe there is an optimization in place if you sort a field by lines 
> - it doesn't have to cut anything up, it just uses the backing string 
> from each paragraph.
>
> I have a feeling I know precisely where the problem lies, so if you 
> file a bug (for once) we should be able to fix it quite rapidly.
>
> Warmest Regards,
>
> Mark.
>
> P.S. Another way to get the correct result is to do this (which is 
> essentially what the engine does internally if caseSensitive is true):
>   set the caseSensitive to true
>   sort <original text> ascending text by toLower(each)
>
>
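
For reference, the workaround in Mark's P.S. could be wrapped in a small
function along these lines. This is only a sketch: the function name
sortLinesFolded and the parameter name pText are made up for illustration,
and it assumes the lines should sort case-insensitively, with toLower(each)
standing in for the folding the engine would otherwise do itself.

```livecode
-- Hypothetical helper (the name and parameter are assumptions, not from
-- the thread). Returns pText with its lines sorted ascending as text,
-- using the caseSensitive + toLower(each) workaround from Mark's P.S.
function sortLinesFolded pText
   -- caseSensitive is a local property, so it reverts to false when
   -- this handler exits and does not affect the caller.
   set the caseSensitive to true
   -- Lower-casing each sort key gives a case-insensitive ordering
   -- without relying on the engine's own case-insensitive sort keys.
   sort lines of pText ascending text by toLower(each)
   return pText
end sortLinesFolded
```

It would be called as: put sortLinesFolded(tLanguages) into tLanguages,
where tLanguages is whatever variable holds the mixed native/Unicode lines.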




