Sorting strangeness
Mark Waddingham
mark at livecode.com
Mon Sep 16 14:45:30 EDT 2019
On 2019-09-16 19:01, Paul Dupuis via use-livecode wrote:
> Thanks Bob for being one of the folks on the list who always tries to
> offer a solutions for people.
>
> That said, I have solutions a plenty. My real question is for
> LIVECODE, LTD or perhaps someone like Mark Waddingham who could
> actually tell whether this is the expected behavior (not a BUG, but
> probably should be documented) or an aberrant behavior (a BUG and
> should be reported)
It's definitely a bug - sorting a field with that content works
correctly, but sorting a variable doesn't.
After staring at the string for a while it occurred to me that the line
which is sorting incorrectly is all ASCII - "Norwegian Norsk" - indeed
the following causes the string to sort correctly again:
sort <original text> ascending text by (each & (numToCodepoint(0xFFEF)))
When a sort runs, it first splits the input string into separate
strings - one for each line. In this case the "Norwegian Norsk" line
becomes a native string, whereas all the others are unicode. Appending
a non-ASCII codepoint to each key forces every sort key to unicode, so
the bug doesn't manifest.
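For anyone wanting to drop that workaround into a script, here is a
minimal sketch, assuming the text lives in a variable. The function name
sortForcingUnicode and the parameter pText are made up for illustration;
they are not part of the engine.

```livecode
-- Hypothetical helper: returns pText with its lines sorted, with every
-- sort key forced to unicode so no key stays a native string.
function sortForcingUnicode pText
   -- Appending a non-ASCII codepoint to each key is the workaround
   -- described above; it only affects the keys, not the lines themselves.
   sort lines of pText ascending text by (each & numToCodepoint(0xFFEF))
   return pText
end sortForcingUnicode
```

It could then be called as "put sortForcingUnicode(tLanguages) into
tLanguages", where tLanguages is whatever variable holds the list.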
Poking around some more, this also seems to work correctly:
set the caseSensitive to true
sort <original text> ascending text
So there appears to be a difference between the sort keys being
generated for unicode and native strings - at least when caseSensitive
is false.
The field case works because the field coerces all content to unicode
(as the text APIs on all platforms take UTF-16 these days), and I
believe there is an optimization in place if you sort a field by lines -
it doesn't have to cut anything up, it just uses the backing string from
each paragraph.
I have a feeling I know precisely where the problem lies, so if you file
a bug (for once) we should be able to fix it quite rapidly.
Warmest Regards,
Mark.
P.S. Another way to get the correct result is to do this (which is
essentially what the engine does internally if caseSensitive is false):
set the caseSensitive to true
sort <original text> ascending text by toLower(each)
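Wrapped in a handler, that might look like the sketch below. The
function name sortWithFoldedKeys and the parameter pText are
hypothetical. Since caseSensitive is a local property, setting it
inside the function should not affect the calling handler.

```livecode
-- Hypothetical helper: sorts lines without regard to case by folding
-- each key explicitly while caseSensitive is true.
function sortWithFoldedKeys pText
   -- caseSensitive is local to this handler and resets when it exits
   set the caseSensitive to true
   -- Lower-casing each key by hand keeps the ordering case-insensitive
   sort lines of pText ascending text by toLower(each)
   return pText
end sortWithFoldedKeys
```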
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps