Unicode mysteries

Neville Smythe neville.smythe at optusnet.com.au
Thu Mar 26 14:22:16 EDT 2020


> 
>> Which should correspond to codepoints
>>       1F3F4 E0067 E0062 E0073 E0063 E0074 E007F
>> And indeed if I manually build a UTF-16 string with these code points
>> it does display as the flag of Scotland. So the lesson is that the
>> reported chunks are not to be naively trusted  --- tho not exactly a
>> bug given the documentation warning.
> 
> Well this would be a bug! If you try codepoint 1..14 - then you will see 
> that they alternate between a codepoint and zero - the codepoints appear 
> to correspond to the relevant surrogate pair codeunits. i.e. codepoint 
> is misinterpreting the index as a codeunit index, rather than a 
> codepoint index :|
> 
> If you file a bug then I suspect this can be fixed quite quickly (famous 
> last words of course!).
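
For anyone following along, here is a minimal sketch of the codepoint vs. codeunit mismatch described above. It is written in Python rather than LiveCode, purely as an illustration; the helper name utf16_units is hypothetical and nothing here reflects how the LC engine is actually implemented. It builds the string from the seven codepoints quoted above and shows that UTF-16 needs fourteen codeunits for it, since every one of these codepoints lies outside the BMP and becomes a surrogate pair.

```python
# The seven codepoints quoted above (flag of Scotland tag sequence).
codepoints = [0x1F3F4, 0xE0067, 0xE0062, 0xE0073, 0xE0063, 0xE0074, 0xE007F]
flag = "".join(chr(cp) for cp in codepoints)


def utf16_units(text):
    """Return the UTF-16 codeunits of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


units = utf16_units(flag)

print(len(flag))   # number of codepoints
print(len(units))  # number of UTF-16 codeunits
print(" ".join(f"{u:04X}" for u in units))
```

Indexing by codeunit rather than by codepoint would walk over each half of every surrogate pair separately, which would fit the fourteen alternating results Mark describes.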


Thanks Mark, I will file a bug report.

I don’t *really* need the actual font the system uses to display unsupported codepoints. I was thinking of using it as a lazy way to find out which single codepoints are supported, rather than having to parse the cmap tables in the font file. As a way of learning about Unicode I was trying to write an LC version of the character map/PopChar utilities. The project is doomed to failure because it’s just too hard to find out which multi-codepoint glyphs are supported by a font. This question is frequently asked on forums, but it seems there is no answer other than reverse engineering the morx table in the font file, which is way too complex to be worth the effort. There is a published list for Emoji fonts, but presumably that would not be possible for general ligatures or glyph variations.

Any comment on the LC behaviour of treating the Rainbow flag (which is a multi-codepoint glyph composed of three characters) as 3 separate text characters, requiring 3 backspace operations to delete it in a field, rather than a single backspace as works in TextEdit?  [The first backspace eliminates the rainbow flag glyph but leaves the white flag showing; the second backspace eliminates the invisible joiner codepoint, so to the user it seems to do nothing; the third backspace finally eliminates the last glyph.]  Is this a design choice or a bug?
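
To make the question concrete, here is a rough sketch (Python again, not LiveCode) of a backspace that treats an emoji ZWJ sequence as a single unit. The rainbow flag is the sequence U+1F3F3 U+FE0F U+200D U+1F308: the variation selector attaches to the white flag, so grouping it with the flag gives the three steps I see. The function name backspace_cluster is hypothetical, and the rule it applies is a deliberate simplification, not the full Unicode grapheme cluster algorithm and not a claim about how TextEdit or the LC field does it.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (requests emoji presentation)

# White flag, variation selector, joiner, rainbow.
RAINBOW_FLAG = "\U0001F3F3" + VS16 + ZWJ + "\U0001F308"


def backspace_cluster(text):
    """Delete the last codepoint plus anything joined to it by ZWJ.

    Simplified assumption: only VS16 and ZWJ are handled; combining
    marks, tag sequences and the like are ignored.
    """
    i = len(text)
    if i == 0:
        return text
    i -= 1
    while True:
        # A variation selector belongs to the codepoint before it.
        while i > 0 and text[i] == VS16:
            i -= 1
        # If a joiner precedes this element, the element before the
        # joiner is part of the same visible glyph.
        if i >= 2 and text[i - 1] == ZWJ:
            i -= 2
        else:
            break
    return text[:i]


print(len(RAINBOW_FLAG))                        # codepoints in the sequence
print(repr(backspace_cluster("ab" + RAINBOW_FLAG)))
```

With this rule a single call removes the whole flag, whereas deleting one codepoint (or one LC character) at a time leaves the intermediate states described above.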

Bob: I am looking at the Digest, where nonstandard characters (even, annoyingly, quotes) are replaced by question marks, which makes code snippets very hard to read. Is there a setting I should change to fix this?

Neville


More information about the use-livecode mailing list