How to find offsets in Unicode Text fast

Niggemann, Bernd Bernd.Niggemann at
Mon Nov 12 15:08:42 EST 2018


Please see my remarks about UTF-32 failing with some Icelandic characters. Currently I would not recommend offset(UTF-32 text) unless one knows which character set will be used and is in control of that character set. The same goes for UTF-16.

I also thought that byteOffset would be faster for case-sensitive search in UTF-32 text. It turned out to be slower than offset(UTF-32 text).
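
For readers who want to see what a byte-level search in UTF-32 data involves, here is a minimal sketch. It is written in Python rather than LiveCode, it is not the code that was timed above, and the function name utf32_char_offset and its skip parameter are made up for illustration. It assumes little-endian UTF-32 without a byte order mark. The point it shows is that a raw byte match must start on a 4-byte boundary before it can be turned into a character offset.

```python
def utf32_char_offset(needle: str, haystack: str, skip: int = 0) -> int:
    """Case-sensitive search on UTF-32 bytes.

    Returns the 1-based character offset of needle in haystack,
    or 0 if it is not found. 'skip' is the number of leading
    characters to ignore (hypothetical parameter, for illustration).
    """
    if not needle:
        return 0
    # Fixed width: every code point becomes exactly 4 bytes, no BOM.
    n = needle.encode("utf-32-le")
    h = haystack.encode("utf-32-le")
    pos = skip * 4
    while True:
        pos = h.find(n, pos)
        if pos < 0:
            return 0
        if pos % 4 == 0:
            # Aligned match: convert byte position to character offset.
            return pos // 4 + 1
        # A match that straddles two code points is a false positive.
        pos += 1


print(utf32_char_offset("ð", "Það er"))        # 3
print(utf32_char_offset("a", "\u6100\u0000"))  # 0, the only byte match is misaligned
```

The second call shows why the alignment check matters: the bytes of "a" do occur inside the encoded haystack, but only across the boundary between two code points. The comparison is also exact at the byte level, so a precomposed character and its decomposed form (base letter plus combining accent) will not match each other unless both strings are normalized first.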

>Ben Rubinstein via use-livecode, Mon, 12 Nov 2018 11:38:26 -0800:

>Coming late to this discussion. Very excited by this approach of converting everything to UTF-32 in order to do fast offsets.

>In the meantime I'd be suspicious about doing a case-insensitive search in this way; but my guess would be that, if your use-case will accept case-sensitivity, it would be safer (and faster?) to use byteOffset on the UTF-32 data rather than offset.

Kind regards
