Unicode mysteries
Mark Waddingham
mark at livecode.com
Thu Mar 26 05:52:44 EDT 2020
On 2020-03-26 06:53, Neville Smythe via use-livecode wrote:
> Which should correspond to codepoints
> 1F3F4 E0067 E0062 E0073 E0063 E0074 E007F
> And indeed if I manually build a UTF-16 string with these code points
> it does display as the flag of Scotland. So the lesson is that the
> reported chunks are not to be naively trusted --- tho not exactly a
> bug given the documentation warning.
Well, this would be a bug! If you try codepoint 1 to 14 of the string,
you will see that the results alternate between a codepoint and zero,
and the codepoints appear to correspond to the relevant surrogate pair
codeunits. In other words, the codepoint chunk is misinterpreting the
index as a codeunit index rather than a codepoint index :|
If you file a bug then I suspect this can be fixed quite quickly (famous
last words of course!).
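To see the codepoint / codeunit mismatch outside LiveCode, here is a
rough Python sketch. It only illustrates the counting, it is not how the
engine is implemented, and the helper name utf16_code_units is made up
for this example. It builds the string from the seven codepoints quoted
above and lists the UTF-16 codeunits they encode to.

```python
# Codepoints for the flag of Scotland, as listed in the quoted message.
CODEPOINTS = [0x1F3F4, 0xE0067, 0xE0062, 0xE0073, 0xE0063, 0xE0074, 0xE007F]


def utf16_code_units(text):
    """Return the UTF-16 codeunits of text as a list of integers.

    Hypothetical helper for this illustration only.
    """
    raw = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


flag = "".join(chr(cp) for cp in CODEPOINTS)
units = utf16_code_units(flag)

print("codepoints:", len(flag))
print("codeunits: ", len(units))
print(" ".join(f"{u:04X}" for u in units))
```

Every codepoint here is above FFFF, so each one takes a surrogate pair:
7 codepoints become 14 codeunits, starting D83C DFF4 DB40 DC67. An index
that runs 1..14 is therefore walking codeunits, not codepoints.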
> Another question (which I think has been raised before but I don’t
> think there was an answer?). When a character (codepoint) in a string
> is displayed, if the requested font does not have that codepoint the
> OS substitutes a glyph from another font (or the missing character
> glyph if no font supports the codepoint). So for example if you change
> the font of the above flag of Scotland to Arial, it still displays as
> the flag of Scotland, even though this glyph is not in Arial. LC will
> still report that the font of this character is Arial: from what I can
> gather this is not the fault of LC, the OS is doing this substitution
> behind its back (TextEdit does the same). But is there any way to find
> out (programmatically) the actual font being used?
Unfortunately not easily - fallback mechanisms of this sort occur quite
low down in the text layout / rendering code. Why do you need to know
which font is actually being used?
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps