What is LC's internal text format?

Monte Goulding monte at appisle.net
Tue Nov 13 19:49:27 EST 2018



> On 14 Nov 2018, at 11:39 am, Monte Goulding via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
>> In 7+ you generally want to use codepoint where you previously used char, unless you know you are dealing with a binary string, in which case you use byte.
> 
> Sorry! I wrote codepoints here when I meant codeunits! Use codeunits rather than codepoints, as they are a fixed number of bytes (2). Codepoints may be 2 or 4 bytes, so there is a cost in figuring out the number of codepoints or the exact byte that codepoint x refers to. So for chunk expressions on Unicode strings use `codeunit x to y`.

Argh… sorry again… codeunits are a fixed number of bytes but that fixed number depends on whether the string is native encoded (1 byte) or UTF-16 (2 bytes)!
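
To make the byte widths concrete, here is a minimal sketch in Python rather than LiveCode script. It only illustrates the encodings and says nothing about how the engine is implemented. Latin-1 is an assumption standing in for the native encoding, and the helper name `utf16_code_units` is made up for the example.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16."""
    # "utf-16-le" writes no byte-order mark, so every code unit is exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


plain = "abc"
# Assumption: Latin-1 stands in for a native, single-byte encoding.
native_bytes = len(plain.encode("latin-1"))   # 3 bytes, 1 per code unit
utf16_bytes = len(plain.encode("utf-16-le"))  # 6 bytes, 2 per code unit

# U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it
# as a surrogate pair: one code point, two code units, four bytes.
mixed = "a\U0001F600b"
code_point_count = len(mixed)               # 3 code points
code_unit_count = utf16_code_units(mixed)   # 4 code units
```

Finding code unit x is simple arithmetic on the byte offset, while finding code point x means scanning for surrogate pairs, which is the cost mentioned above.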

And for completeness, codeunit/codepoint is not equivalent to char: a char is a grapheme, which may be made up of several codepoints. If you really need to count graphemes then you will need to use char.
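
A rough illustration of that difference, again in Python and not LiveCode script. The helper `approx_grapheme_count` is hypothetical and only folds combining marks into the preceding character. It is not full Unicode grapheme segmentation (it ignores emoji joiner sequences, for instance), so treat it as a sketch of the idea only.

```python
import unicodedata


def approx_grapheme_count(text: str) -> int:
    """Approximate grapheme count: combining marks attach to the previous character."""
    # unicodedata.combining() returns 0 for characters that are not combining marks.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# "café" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
word = "cafe\u0301"

code_points = len(word)                            # 5
code_units = len(word.encode("utf-16-le")) // 2    # 5
graphemes = approx_grapheme_count(word)            # 4
```

Here the codepoint and codeunit counts agree with each other, but both differ from what a reader would call the number of characters.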

Cheers

Monte

