How to find offsets in Unicode Text fast
Monte Goulding
monte at appisle.net
Mon Nov 12 19:21:10 EST 2018
Hi Folks
I was a bit perplexed by this, so I had a quick look around the engine and I see the issue. The problem is that you are using `offset`, which works on characters. Characters in LiveCode are neither Unicode code points nor bytes: they are graphemes. This means that when you pass a number of chars to skip, the entire string needs to be parsed to find the grapheme boundaries, so that the skip count can be translated into a position in the underlying string. Note that if the strings you were dealing with weren't Unicode, each char is exactly one grapheme, so the translation is 1 -> 1 and there's no big cost. That is why things are much faster when you textEncode and offset that.
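To illustrate why a char index is expensive, here is a rough Python sketch. It is not LiveCode and not the engine's actual code. It assumes a simplified definition of a character: one base code point plus any following combining marks, detected with `unicodedata.combining`. Real grapheme segmentation (Unicode UAX #29) has more rules. The helper name `char_to_codepoint_index` is hypothetical.

```python
import unicodedata


def char_to_codepoint_index(text, char_index):
    """Translate a 0-based character index into a 0-based code point index.

    Simplified assumption: a character is one base code point followed by
    any combining marks. Because character boundaries are not known in
    advance, the text must be walked from the start until the requested
    character is reached, so the cost grows with the index.
    """
    count = -1
    for i, ch in enumerate(text):
        # A code point with combining class 0 (or the very first one)
        # starts a new character; combining marks attach to the previous one.
        if i == 0 or not unicodedata.combining(ch):
            count += 1
            if count == char_index:
                return i
    raise IndexError("character index out of range")
```

For plain ASCII text every code point starts a new character, so the index comes back unchanged. With a decomposed "é" (`e` followed by U+0301), every later index shifts by one, and the only way to find that out is to walk the string from the beginning.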
So! Switch to using `codepointOffset` and hopefully it will be much speedier!
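In the same hedged spirit, here is a Python sketch of the usual "find every offset" loop written against code point positions. `codepoint_offset` is a hypothetical stand-in for LiveCode's `codepointOffset`. It is assumed to follow the same convention as `offset`: the result is 1-based, counted from just after the skipped code points, and 0 means no match. Python strings are already indexed by code point, so `str.find` with a start position needs no grapheme scan.

```python
def codepoint_offset(needle, haystack, skip=0):
    """Hypothetical stand-in for LiveCode's codepointOffset().

    Returns the 1-based position of needle relative to the first `skip`
    code points (assumed to mirror offset's skip convention), or 0 if
    there is no match.
    """
    pos = haystack.find(needle, skip)
    return 0 if pos == -1 else pos - skip + 1


def all_codepoint_offsets(needle, haystack):
    """Collect the absolute 1-based code point offsets of every match."""
    if not needle:
        return []  # an empty needle would match everywhere and never advance
    offsets = []
    skip = 0
    while True:
        found = codepoint_offset(needle, haystack, skip)
        if found == 0:
            break
        absolute = skip + found  # convert the relative result to an absolute one
        offsets.append(absolute)
        skip = absolute  # resume searching one code point after this match starts
    return offsets
```

Each search resumes from an absolute code point position, so no call has to re-walk the text looking for grapheme boundaries. Keep in mind that the offsets you get back are code point positions, not char positions.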
Cheers
Monte
More information about the use-livecode mailing list