How to find offsets in Unicode Text fast

Niggemann, Bernd Bernd.Niggemann at uni-wh.de
Mon Nov 12 12:56:45 EST 2018


Thank you Brian for putting the test stack up. It makes it easier to test various non-ASCII texts.

As your testing shows, the UTF16 variant can be misleading.

Unfortunately I also found a case of UTF32 not working.

As the source (haystack) I copied some text from the Icelandic Wikipedia entry about the capital, Reykjavik, and put the Icelandic spelling of the name (Reykjavík) into the delimiter (needle).

Using UTF16 works, but alas, UTF32 does not find anything.

So now it seems that my attempt to fool the offset function into greater speed by using either UTF16 or UTF32 textEncoded versions of "needle" and "haystack" is not reliable.
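For readers who want to experiment with the idea outside LiveCode, here is a minimal sketch in Python of searching fixed-width encoded bytes instead of characters. It is not the LiveCode script from the test stack: the function name `utf32_offsets` and the sample haystack are made up for illustration, and nothing here is claimed to explain the Reykjavík failure. It only shows two things a byte-level search has to get right: hits must be aligned to code-unit boundaries, and needle and haystack must be encoded identically (Python's plain "utf-32" codec prepends a byte order mark, "utf-32-le" does not).

```python
def utf32_offsets(needle, haystack):
    """Return 1-based character offsets of needle in haystack,
    found by searching the UTF-32 encoded bytes (hypothetical helper)."""
    # "utf-32-le" is used on purpose: it adds no byte order mark.
    n = needle.encode("utf-32-le")
    h = haystack.encode("utf-32-le")
    hits = []
    pos = h.find(n)
    while pos != -1:
        # Only accept matches that start on a 4-byte boundary;
        # anything else is a byte coincidence, not a character match.
        if pos % 4 == 0:
            hits.append(pos // 4 + 1)
        pos = h.find(n, pos + 1)
    return hits


# Made-up sample text, not the Wikipedia excerpt from the post.
sample = "abc Reykjav\u00edk def Reykjav\u00edk"
found = utf32_offsets("Reykjav\u00edk", sample)

# Encoding the needle with a byte order mark makes the byte search fail.
bom_needle = "Reykjav\u00edk".encode("utf-32")
bom_result = sample.encode("utf-32-le").find(bom_needle)
```

In this sketch `found` lists the character positions of both occurrences, while `bom_result` is -1 because the encoded needle starts with a byte order mark that never occurs inside the haystack. Whether LiveCode's textEncode behaves in a comparable way for UTF32 is an open question here, not something this thread establishes.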

There is probably an explanation for this, but it eludes me.

Sorry to have to retract my proposal as unreliable. I would have loved to use the speed gain for "offset", which is horribly slow for non-ASCII text.

Kind regards
Bernd



On 12.11.2018 at 12:00, use-livecode-request at lists.runrev.com wrote:

From: Brian Milby
To: How to use LiveCode <use-livecode at lists.runrev.com>
Subject: Re: How to find offsets in Unicode Text fast


I just tried one additional test.  Search for "åå" within "aaååÅÅååaa".
(On a Mac keyboard, the characters are made with A, Option-A, and
Shift-Option-A.)  The Offset UTF16 version does not return the correct
result if case sensitive is false (returns the same value as if it were
true: 3,7).  Every other version correctly performs the case folding
(3,4,5,6,7).
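
The expected results for this test can be reproduced with a short Python sketch. It is an illustration only, not the LiveCode code from the test stack; the helper names `all_offsets` and `utf16_byte_offsets` are hypothetical. The first helper folds case before searching; the second searches UTF-16 encoded bytes, which cannot fold case because it only compares raw bytes.

```python
def all_offsets(needle, haystack, case_sensitive=True):
    """Return 1-based offsets of every match, overlapping ones included."""
    if not case_sensitive:
        # casefold() can change string length for some characters
        # (for example the German sharp s), which would shift offsets;
        # it does not for the characters used in this test.
        needle = needle.casefold()
        haystack = haystack.casefold()
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start + 1)
        start = haystack.find(needle, start + 1)
    return hits


def utf16_byte_offsets(needle, haystack):
    """Search UTF-16 bytes; assumes no characters outside the BMP."""
    n = needle.encode("utf-16-le")
    h = haystack.encode("utf-16-le")
    hits = []
    pos = h.find(n)
    while pos != -1:
        if pos % 2 == 0:  # keep only code-unit-aligned matches
            hits.append(pos // 2 + 1)
        pos = h.find(n, pos + 1)
    return hits


needle = "\u00e5\u00e5"  # two lowercase a-with-ring characters
haystack = "aa\u00e5\u00e5\u00c5\u00c5\u00e5\u00e5aa"

sensitive = all_offsets(needle, haystack, case_sensitive=True)
insensitive = all_offsets(needle, haystack, case_sensitive=False)
bytewise = utf16_byte_offsets(needle, haystack)
```

Here `sensitive` and `bytewise` both come out as 3 and 7, while `insensitive` is 3 through 7, which matches the behaviour described above: a search over encoded bytes gives the case-sensitive answer unless both strings are case-folded before encoding.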


