OMG text processing performance 6.7 - 9.5 - correction

Thu Feb 6 04:29:27 EST 2020

Belay my claim that an offset search on raw text and on the UTF-8 version of that text gives exactly the same offset numbers for corresponding hits. They don’t, of course. The offsets reported in the raw text are 8-bit (byte) offsets; the offsets reported in the UTF-8 encoded text are Unicode character offsets, as they must be.
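
For anyone who wants to see the difference concretely, here is a minimal sketch in Python. It is not the tool or the data from my tests; the sample string and the variable names are made up for illustration. It searches for the same hit in the raw bytes and in the decoded text, and the two offsets drift apart by one for every extra byte used by the multi-byte characters before the hit.

```python
# Hypothetical sample text: two non-ASCII characters precede the search hit.
text = "naïve café offsets"

# The "raw" view: the same text as a sequence of 8-bit bytes in UTF-8.
raw = text.encode("utf-8")

# Offset of the hit counted in Unicode characters.
char_offset = text.find("offsets")

# Offset of the same hit counted in bytes.
byte_offset = raw.find(b"offsets")

# A byte offset can be mapped back to a character offset by decoding
# the bytes that come before the hit and counting the characters.
recovered_char_offset = len(raw[:byte_offset].decode("utf-8"))

print(char_offset, byte_offset, recovered_char_offset)
```

The two offsets agree only when everything before the hit is plain ASCII, which is probably why the mismatch is easy to miss on mostly-ASCII data.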

Apologies, I was reading my data incorrectly.

Neville