How to find offsets in Unicode Text fast
brian at milby7.com
Mon Nov 12 13:13:06 EST 2018
I noticed something similar but did not have a chance to dig into it. If I
copied the complex character that Geoff inserted (is that Kanji?) into the
string to search, I also got no results for UTF32. But if I also copied it
into the string-to-find field, the results worked partially: case folding
was broken. Searching for "åå𠜎" in "aaåå𠜎ÅÅ𠜎åå𠜎aa" returned 3,9 for
UTF32 whether or not the search was case sensitive (a case-insensitive
search should also have matched "ÅÅ𠜎" at 6).
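A minimal Python sketch of the encode-then-search idea follows. It assumes the test stack's UTF32 variant amounts to a search over the encoded bytes; the stack's script is not shown here, and `utf32_offsets` and `utf32_offsets_nocase` are hypothetical helpers, not LiveCode functions. It shows two things: a byte comparison cannot fold case, so a case-insensitive search has to fold both strings before encoding, and a byte hit only counts if it falls on a 4-byte boundary.

```python
def utf32_offsets(needle: str, haystack: str) -> list[int]:
    """Return 1-based character offsets of non-overlapping matches of needle
    in haystack, found by searching the UTF-32 encoded bytes (hypothetical helper)."""
    if not needle:
        return []
    # "utf-32-le" rather than "utf-32": the plain codec prepends a byte-order
    # mark, which would become part of the encoded needle and break the search.
    n = needle.encode("utf-32-le")
    h = haystack.encode("utf-32-le")
    offsets = []
    pos = h.find(n)
    while pos != -1:
        if pos % 4 == 0:
            # Every code point is exactly 4 bytes, so an aligned byte offset
            # converts directly to a character offset.
            offsets.append(pos // 4 + 1)
            pos = h.find(n, pos + len(n))
        else:
            # A misaligned hit straddles two code points and is not a real match.
            pos = h.find(n, pos + 1)
    return offsets


def utf32_offsets_nocase(needle: str, haystack: str) -> list[int]:
    # Bytes are compared exactly, so fold case before encoding.
    # Caveat: casefold() can change length (e.g. "ß" -> "ss"), which would
    # shift the reported offsets relative to the original haystack.
    return utf32_offsets(needle.casefold(), haystack.casefold())


needle = "åå\U0002070E"  # U+2070E is 𠜎
haystack = "aaåå\U0002070EÅÅ\U0002070Eåå\U0002070Eaa"
print(utf32_offsets(needle, haystack))         # [3, 9]
print(utf32_offsets_nocase(needle, haystack))  # [3, 6, 9]
```

Without the folding step, the byte search gives 3,9 no matter what case setting is intended, which is the same pattern as the UTF32 result reported above.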
On Mon, Nov 12, 2018 at 11:57 AM Niggemann, Bernd <Bernd.Niggemann at uni-wh.de> wrote:
> Thank you Brian for putting the test stack up. It makes it easier to test
> various non-ASCII texts.
> As your testing shows, the UTF16 variant can be misleading.
> Unfortunately, I also found a case where UTF32 does not work.
> I copied some text from the Icelandic Wikipedia entry about the capital,
> Reykjavik, as the source (haystack) and put the Icelandic spelling of
> Reykjavik (Reykjavík) into the delimiter (needle).
> Using UTF16 works, but alas, UTF32 does not find anything.
> So now it seems that my attempt to fool the offset function into greater
> speed by using either UTF16 or UTF32 textEncoded versions of "needle" and
> "haystack" is not reliable.
> There is probably an explanation for this that eludes me.
> Sorry to have to retract my proposal as unreliable. I would have
> loved to use the speed gain for "offset", which is horribly slow for
> non-ASCII text.
> Kind regards
> On 12.11.2018 at 12:00, use-livecode-request at lists.runrev.com wrote:
> From: Brian Milby
> To: How to use LiveCode <use-livecode at lists.runrev.com>
> Subject: Re: How to find offsets in Unicode Text fast
> I just tried one additional test: search for "åå" within "aaååÅÅååaa".
> (On a Mac keyboard, the characters are typed with A, Option-A, and
> Shift-Option-A.) The Offset UTF16 version does not return the correct
> result when case sensitive is false; it returns the same value as when it
> is true (3,7). Every other version correctly performs the case folding.
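Under the same assumption (that the encoded variants reduce to a byte search), UTF-16 needs one extra step. Characters outside the Basic Multilingual Plane, such as 𠜎 (U+2070E), take two 16-bit code units (a surrogate pair), so a byte offset divided by 2 is a code-unit index, not a character index. The sketch below builds a code-unit-to-character map to convert between the two; `utf16_offsets` is again a hypothetical helper, not a LiveCode function.

```python
def utf16_offsets(needle: str, haystack: str) -> list[int]:
    """Return 1-based character offsets of non-overlapping matches of needle
    in haystack, found by searching UTF-16LE bytes (hypothetical helper)."""
    if not needle:
        return []
    n = needle.encode("utf-16-le")  # "-le": no byte-order mark
    h = haystack.encode("utf-16-le")
    # Map the code-unit index where each character starts to its 1-based
    # character offset; non-BMP characters occupy two code units.
    char_at_unit = {}
    unit = 0
    for i, ch in enumerate(haystack):
        char_at_unit[unit] = i + 1
        unit += 2 if ord(ch) > 0xFFFF else 1
    offsets = []
    pos = h.find(n)
    while pos != -1:
        # Accept only hits that start on a character boundary.
        if pos % 2 == 0 and pos // 2 in char_at_unit:
            offsets.append(char_at_unit[pos // 2])
            pos = h.find(n, pos + len(n))
        else:
            pos = h.find(n, pos + 1)
    return offsets


needle = "åå\U0002070E"  # U+2070E is 𠜎
haystack = "aaåå\U0002070EÅÅ\U0002070Eåå\U0002070Eaa"
print(utf16_offsets(needle, haystack))  # [3, 9]; naive pos // 2 + 1 would give 3 and 11

# As with UTF-32, the byte search itself is always case sensitive:
print(utf16_offsets("åå", "aaååÅÅååaa"))                        # [3, 7]
print(utf16_offsets("åå".casefold(), "aaååÅÅååaa".casefold()))  # [3, 5, 7]
```

Folding before encoding costs an extra pass over the haystack, which may eat into the speed gain this approach was meant to deliver.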