text copied from LC generated PDF, WTF?
Mark Waddingham
mark at livecode.com
Thu Feb 20 08:55:00 EST 2020
On 2020-02-18 18:40, Klaus major-k via use-livecode wrote:
> Hi friends,
>
> I know that copying text from a PDF file may give unexpected
> results,
> but this is really ridiculous!?
>
> I created a PDF from LC (selected "Save as PDF" in the macOS Print
> dialog)
> and when I copy some text and paste it into TextEdit, this is what I
> get:
> <https://major-k.de/staxx/text_from_lc_pdf.jpg>
> Where on earth are my numbers and where is my text?
>
> Any insights much appreciated!
Klaus asked on the forum thread
(http://forums.livecode.com/viewtopic.php?f=9&t=33683&start=15) for the
explanation to be posted here as well: this isn't a bug.
TL;DR version - extracting text from PDFs is hard, and viewers all do it
differently with different levels of 'correctness'.
The fonts used and the layout can affect what they can produce.
In this case, the stack in question was being printed with the default
system theme fonts (on macOS this appears to be .SFNSText). For
whatever reason, that font generates glyphs for numbers in the PDF which
PDF viewers don't seem to be able to map back to actual digits.
Upshot - make sure the controls you are printing have their font
explicitly set to a 'normal' font if you want to be able to copy text
from any PDF you might generate as a result :)
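
As a rough illustration of the advice above, here is a minimal sketch
of a button script that gives every control on the current card an
explicit font before printing. "Helvetica" is only an assumed example
of a 'normal' font and is not named anywhere in this thread; use
whatever ordinary font suits your layout.

```livecode
-- Sketch only: "Helvetica" is an assumed example of a 'normal' font.
on mouseUp
   local tIndex

   -- Give every control on this card an explicit font so that printing
   -- does not fall back to the system theme font.
   repeat with tIndex = 1 to the number of controls of this card
      set the textFont of control tIndex of this card to "Helvetica"
   end repeat

   -- Show the system print dialog (where "Save as PDF" lives on macOS),
   -- then print the card unless the user cancelled.
   answer printer
   if the result is "Cancel" then exit mouseUp
   print this card
end mouseUp
```

Setting the textFont of the stack itself should have a similar effect
for any control that doesn't already have its own font set, since
controls inherit text properties from their owners.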
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps