text copied from LC generated PDF, WTF?

Mark Waddingham mark at livecode.com
Thu Feb 20 08:55:00 EST 2020


On 2020-02-18 18:40, Klaus major-k via use-livecode wrote:
> Hi friends,
> 
> I know that copying text from a PDF file may give unexpected 
> results,
> but this is really ridiculous!?
> 
> I created a PDF from LC (selected "Save as PDF" in the macOS Print 
> dialog)
> and when I copy some text and paste it into TextEdit, this is what I 
> get:
> <https://major-k.de/staxx/text_from_lc_pdf.jpg>
> Where on earth are my numbers and where is my text?
> 
> Any insights much appreciated!

As requested by Klaus on the forum thread 
(http://forums.livecode.com/viewtopic.php?f=9&t=33683&start=15), here is 
the answer for the list: this isn't a bug.

TL;DR version - extracting text from PDFs is hard, and viewers all do it 
differently with different levels of 'correctness'.

The fonts used and the layout of the document both affect what text a 
viewer can recover.

In this case, the stack in question was being printed with the default 
system theme fonts (on macOS this is .SFNSText it would seem) - and for 
whatever reason that font generates glyphs for numbers in the PDF which 
PDF viewers don't seem to be able to map back to actual digits.

Upshot - make sure the controls you are printing have an explicit font 
setting to a 'normal' font if you want to be able to copy text from any 
PDF you might generate as a result :)
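
As a rough illustration of the above, something along these lines should 
give every control on the current card an explicit font before printing. 
This is only a sketch: the handler name and the choice of "Helvetica" are 
assumptions for the example, so substitute whatever 'normal' font suits 
your stack.

```livecode
-- Hypothetical helper: the handler name and the font are assumptions
-- for illustration, not taken from the stack discussed in this thread.
on setExplicitFonts
   -- Give each control on the card an explicit font so it no longer
   -- inherits the system theme font when the card is printed.
   repeat with i = 1 to the number of controls of this card
      set the textFont of control i of this card to "Helvetica"
   end repeat
end setExplicitFonts
```

Call setExplicitFonts before printing, then check the resulting PDF by 
copying some digits out of it in the viewer you care about.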

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



