Text copied from LC-generated PDF, WTF?

Klaus major-k klaus at major-k.de
Thu Feb 20 09:31:30 EST 2020


Hi Mark,

> On 20.02.2020 at 14:55, Mark Waddingham via use-livecode <use-livecode at lists.runrev.com> wrote:
> 
> On 2020-02-18 18:40, Klaus major-k via use-livecode wrote:
>> Hi friends,
>> I know that copying text from a PDF file may give unexpected results,
>> but this is really ridiculous!?
>> I created a PDF from LC (selected "Save as PDF" in the macOS Print dialog)
>> and when I copy some text and paste it into TextEdit, this is what I get:
>> <https://major-k.de/staxx/text_from_lc_pdf.jpg>
>> Where on earth are my numbers and where is my text?
>> Any insights much appreciated!
> 
> As requested by Klaus on the forum thread (http://forums.livecode.com/viewtopic.php?f=9&t=33683&start=15), posting this here too: this isn't a bug.
> TL;DR version - extracting text from PDFs is hard, and viewers all do it differently with different levels of 'correctness'.
> The fonts used and the layout can affect what they can produce.
> In this case, the stack in question was being printed with the default system theme fonts (on macOS this would seem to be .SFNSText). For whatever reason, that font generates glyphs for numbers in the PDF which PDF viewers don't seem to be able to map back to actual digits.
> Upshot - make sure the controls you are printing have their font explicitly set to a 'normal' font if you want to be able to copy text from any PDF you might generate as a result :)

Thank you very much for this valuable hint! :-D
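
For anyone finding this thread later, here is a rough toy model in Python of the "can't map glyphs back to digits" problem described above. It is only a sketch: the glyph IDs, the two font tables and the replacement character are all made up for illustration, and it does not claim to show what .SFNSText or any particular PDF viewer actually does. The general idea is that a PDF page records which glyphs to draw, and copying text only works if the viewer has a mapping from each glyph back to a character.

```python
# Toy model only: real PDFs keep this kind of mapping in the font's encoding
# or ToUnicode data. The glyph IDs and tables below are invented.

REPLACEMENT = "\ufffd"  # what this toy "viewer" emits for an unmappable glyph


def extract_text(glyph_ids, to_unicode):
    """Map each drawn glyph ID back to a character, like a viewer's copy function."""
    return "".join(to_unicode.get(gid, REPLACEMENT) for gid in glyph_ids)


# Hypothetical font whose mapping covers letters, punctuation and digits.
normal_font = {1: "N", 2: "o", 3: ".", 4: " ", 5: "4", 6: "2"}

# Hypothetical font whose digit glyphs have no usable mapping back to text.
theme_font = {gid: ch for gid, ch in normal_font.items() if not ch.isdigit()}

page = [1, 2, 3, 4, 5, 6]  # the glyphs drawn for "No. 42"

print(extract_text(page, normal_font))  # digits survive the round trip
print(extract_text(page, theme_font))   # digits come back as replacement chars
```

Both "pages" would look identical on screen, since the same glyphs are drawn; only the copied text differs, which matches what the screenshot shows.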

> Warmest Regards,
> 
> Mark.

Best

Klaus

--
Klaus Major
https://www.major-k.de
klaus at major-k.de




