Sorting strangeness
Paul Dupuis
paul at researchware.com
Mon Sep 16 16:54:42 EDT 2019
Mark,
Thank you, as always, for your incredible depth of knowledge of the engine!
Bug filed: https://quality.livecode.com/show_bug.cgi?id=22378 with
sample stack and your comments.
If I can impose on you, I have one more question related to this topic:
If "sort lines of <var> ascending text" were working correctly for lines of
mixed ASCII and Unicode, then for someone sorting lines of text that can be
native text, Unicode text (both RTL and LTR), or mixtures of both, is it
better to use SORT ... TEXT or SORT ... INTERNATIONAL? I don't know
enough about how "international" (which uses the system locale
settings) and Unicode relate to one another.
Thank you again,
Paul Dupuis
Researchware
On 9/16/2019 2:45 PM, Mark Waddingham via use-livecode wrote:
> On 2019-09-16 19:01, Paul Dupuis via use-livecode wrote:
>> Thanks Bob for being one of the folks on the list who always tries to
>> offer solutions for people.
>>
>> That said, I have solutions aplenty. My real question is for
>> LIVECODE, LTD or perhaps someone like Mark Waddingham who could
>> actually tell whether this is the expected behavior (not a BUG, but
>> probably should be documented) or an aberrant behavior (a BUG that
>> should be reported).
>
> It's definitely a bug - sorting a field with that content works
> correctly, but sorting a variable doesn't.
>
> After staring at the string for a while it occurred to me that the
> line which is sorting incorrectly is all ASCII - "Norwegian Norsk" -
> indeed the following causes the string to sort correctly again:
>
> sort <original text> ascending text by (each & (numToCodepoint(0xFFEF)))
>
> When sort is done, it first splits the input string into separate
> strings - one for each line. In this case the "Norwegian Norsk" line
> becomes a native string, whereas all the others are unicode. The
> expression above forces every sort key to unicode, so the bug doesn't
> manifest.
>
> Poking around some more, this also seems to work correctly:
>
> set the caseSensitive to true
> sort <original text> ascending text
>
> So there appears to be a difference between the sort keys being
> generated for unicode and native strings - at least when caseSensitive
> is false.
>
> The field case works because the field coerces all content to unicode
> (as the text APIs on all platforms take UTF-16 these days), and I
> believe there is an optimization in place if you sort a field by lines
> - it doesn't have to cut anything up, it just uses the backing string
> from each paragraph.
>
> I have a feeling I know precisely where the problem lies, so if you
> file a bug (for once) we should be able to fix it quite rapidly.
>
> Warmest Regards,
>
> Mark.
>
> P.S. Another way to get the correct result is to do this (which is
> essentially what the engine does internally if caseSensitive is false):
>
> set the caseSensitive to true
> sort <original text> ascending text by toLower(each)
>
>
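
For reference, the two workarounds quoted above could be wrapped up as
handlers along these lines. This is only a sketch: the handler names and
the sample data are made up for illustration, and the calls are limited to
the ones Mark used (sort, numToCodepoint, toLower, caseSensitive).

```livecode
-- Hypothetical helper (the name is an assumption, not an engine function).
-- Appends a non-native character to every sort key so that all keys
-- are unicode strings, as in the first workaround quoted above.
function sortLinesForcedUnicode pText
   sort lines of pText ascending text by (each & numToCodepoint(0xFFEF))
   return pText
end sortLinesForcedUnicode

-- Hypothetical helper (the name is an assumption, not an engine function).
-- Folds case explicitly and compares case-sensitively, as in the P.S.
-- caseSensitive is a local property, so it reverts when the handler exits.
function sortLinesFoldedCase pText
   set the caseSensitive to true
   sort lines of pText ascending text by toLower(each)
   return pText
end sortLinesFoldedCase

on mouseUp
   local tLines
   -- Made-up sample data: one all-ASCII line and two lines that end in a
   -- non-native character (a Cyrillic and a Greek capital letter).
   put "Russian" && numToCodepoint(0x0420) into line 1 of tLines
   put "Norwegian Norsk" into line 2 of tLines
   put "Greek" && numToCodepoint(0x0395) into line 3 of tLines
   -- Show both results in the message box, separated by a blank line.
   put sortLinesForcedUnicode(tLines) & return & return & \
         sortLinesFoldedCase(tLines)
end mouseUp
```

Both functions sort a copy of the text passed in, so the caller's variable
is left untouched and the sorted result comes back as the return value.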