Finding invisible/non printable characters in a string
curry at pair.com
Mon May 10 13:23:48 EDT 2021
> Would I be right in thinking if codepoint count > the number of chars
> in a text string, then it probably contains invisible characters?
Not necessarily; there are other possibilities, including...
> There are characters that consist of more than one codepoint -
> composite versions of characters for accents. See
Yes! An example would be Hindi: नमस्ते
In LC that string has 6 codepoints and 4 chars.
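
To see where those two numbers come from outside LC, here is a rough Python sketch. It is only an approximation, not how LC counts chars internally: it treats every codepoint whose Unicode general category is a combining mark (Mn, Mc, Me) as attached to the preceding character. The function names are made up for illustration.

```python
import unicodedata

def count_codepoints(text: str) -> int:
    # A Python str is a sequence of Unicode codepoints, so len() counts them.
    return len(text)

def count_base_chars(text: str) -> int:
    # Approximation (an assumption, not LC's algorithm): skip every codepoint
    # whose general category starts with "M" (combining marks) and count the rest.
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))

# The Hindi word from above, spelled out codepoint by codepoint:
# NA, MA, SA, VIRAMA (mark), TA, VOWEL SIGN E (mark)
namaste = "\u0928\u092e\u0938\u094d\u0924\u0947"

print(count_codepoints(namaste))  # 6
print(count_base_chars(namaste))  # 4
```

This simple rule happens to agree with LC for this word, but it will disagree with proper grapheme segmentation in other cases (emoji joined with zero-width joiners, for instance), so treat it as a quick check rather than a replacement for the char chunk.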
Another fun accent example is using a "Zalgo" generator:
68 codepoints vs 8 chars! Zalgo heaps random accents onto characters.
But as you can see, many languages and notation systems have modifiers.
Typically such accents combine with another character, but thanks to
the Magic of Bugs you can also see them breaking free and "doing their
own thing" sometimes, as if they were separate characters, by pasting
some Thai or Myanmar text into an LC field and resizing that field:
On the possibility of invisible characters, there is also an LC bug
which inserts one or more nulls after pasted text on Windows:
But the nulls count as characters too, so the codepoint count still matches
the char count and comparing the two won't reveal them.
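
Since comparing counts can't catch that case, a more direct approach is to look at each codepoint's Unicode category. The Python sketch below is one way to do that, under the assumption that "invisible" means control characters (Cc) and format characters (Cf); the function name and the whitelist of allowed whitespace are illustrative choices, not anything LC provides.

```python
import unicodedata

# Assumption: treat control (Cc) and format (Cf) characters as "invisible".
INVISIBLE_CATEGORIES = {"Cc", "Cf"}

def find_invisible(text: str, allowed: str = "\n\r\t"):
    """Return (index, 'U+XXXX') for each invisible codepoint in text."""
    hits = []
    for index, ch in enumerate(text):
        if ch in allowed:
            # Ordinary line breaks and tabs are control characters too,
            # but usually not what you are hunting for.
            continue
        if unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            hits.append((index, f"U+{ord(ch):04X}"))
    return hits

# A trailing null, like the one the paste bug leaves behind
print(find_invisible("abc\x00"))   # [(3, 'U+0000')]
# A zero-width space hiding between two letters
print(find_invisible("a\u200bb"))  # [(1, 'U+200B')]
```

The same idea should carry over to LC by looping over the codepoints of the string and testing each value against a list of ranges you consider invisible.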
I thought there was another category of Unicode values affected besides
the combining modifiers, but if so, it's eluding me at the moment. :)
Custom Software Development
"Better Methods, Better Results"
LiveCode Training and Consulting