Unicode mysteries

Mark Waddingham mark at livecode.com
Thu Mar 26 05:52:44 EDT 2020


On 2020-03-26 06:53, Neville Smythe via use-livecode wrote:
> Which should correspond to codepoints
>        1F3F4 E0067 E0062 E0073 E0063 E0074 E007F
> And indeed if I manually build a UTF-16 string with these code points
> it does display as the flag of Scotland. So the lesson is that the
> reported chunks are not to be naively trusted  --- tho not exactly a
> bug given the documentation warning.

Well, this would be a bug! If you try codepoint 1 to 14 of the string,
you will see that the results alternate between a codepoint and zero,
and the codepoints appear to correspond to the relevant surrogate pair
codeunits. In other words, the codepoint chunk is misinterpreting the
index as a codeunit index rather than a codepoint index :|
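
To make the codepoint / codeunit distinction concrete, here is a rough
sketch in Python (not LiveCode, and not a model of the engine's
internals) that encodes the seven codepoints quoted above into UTF-16.
The names FLAG and utf16_units are made up for this illustration. Each
of the seven codepoints lies above U+FFFF, so each becomes a surrogate
pair, which gives the 14 codeunits that an index of 1 to 14 would walk
over.

```python
# Codepoints of the flag of Scotland tag sequence, as quoted above.
FLAG = [0x1F3F4, 0xE0067, 0xE0062, 0xE0073, 0xE0063, 0xE0074, 0xE007F]


def utf16_units(codepoints):
    """Encode a list of codepoints as a list of UTF-16 codeunits."""
    units = []
    for cp in codepoints:
        if cp >= 0x10000:
            # Supplementary-plane codepoint: split into a surrogate pair.
            v = cp - 0x10000
            units.append(0xD800 + (v >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (v & 0x3FF))  # low (trail) surrogate
        else:
            # BMP codepoint: a single codeunit.
            units.append(cp)
    return units


units = utf16_units(FLAG)
text = "".join(chr(cp) for cp in FLAG)

print(len(text))   # number of codepoints in the string
print(len(units))  # number of UTF-16 codeunits
print(" ".join(f"{u:04X}" for u in units))
```

An index that counts codeunits therefore runs over twice as many
positions as one that counts codepoints for this string, which is
consistent with the behaviour described above.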

If you file a bug then I suspect this can be fixed quite quickly (famous 
last words of course!).

> Another question (which I think has been raised before but I don’t
> think there was an answer?). When a character (codepoint) in a string
> is displayed, if the requested font does not have that codepoint the
> OS substitutes a glyph from another font (or the missing character
> glyph if no font supports the codepoint). So for example if you change
> the font of the above flag of Scotland to Arial, it still displays as
> the flag of Scotland, even though this glyph is not in Arial. LC will
> still report that the font of this character is Arial: from what I can
> gather this is not the fault of LC, the OS is doing this substitution
> behind its back (TextEdit does the same). But is there any way to find
> out (programmatically) the actual font being used?

Unfortunately not easily: fallback mechanisms of this sort occur quite
low down in the text layout / rendering code. Why do you need to know
which font is actually being used?

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



