Parsing a PDF file

Mike Bonner bonnmike at gmail.com
Fri Jul 8 12:48:34 EDT 2016


It's ugly, but could you use pdf.js to extract the text in a browser widget
showing the PDF?  http://git.macropus.org/2011/11/pdftotext/example/

Not sure what else is in pdf.js, but it looks interesting.
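pdf.js exposes positioned text through `page.getTextContent()`, whose `items` each carry a `str` and a 6-element `transform` matrix where entries 4 and 5 are the x and y position of the run. Below is a minimal sketch, in Python for brevity, assuming those items have already been copied out of the browser into plain dicts with just those two fields. It regroups the runs into rows by y position and orders each row left to right, which is one way to attack the column-order problem Colin describes. The function name `rebuild_lines`, the `y_tolerance` and `sep` parameters, and the sample items are hypothetical; rotated text, multi-column pages and headers or footers will need more than this.

```python
from collections.abc import Iterable


def rebuild_lines(items: Iterable[dict], y_tolerance: float = 2.0, sep: str = "\t") -> list[str]:
    """Group positioned text runs into visual lines, top to bottom, left to right.

    Each item is assumed to look like a pdf.js text item: {"str": ..., "transform": [a, b, c, d, x, y]}.
    """
    # Keep (y, x, text) for every non-blank run.
    runs = [(it["transform"][5], it["transform"][4], it["str"]) for it in items if it["str"].strip()]
    # PDF y coordinates grow upward, so sort by descending y to read top-down.
    runs.sort(key=lambda r: (-r[0], r[1]))

    lines: list[list[tuple[float, float, str]]] = []
    for run in runs:
        # Runs whose y is within the tolerance of the line's first run share that line.
        if lines and abs(lines[-1][0][0] - run[0]) <= y_tolerance:
            lines[-1].append(run)
        else:
            lines.append([run])

    # Within each line, order runs by x and join them so separate columns stay separate.
    return [sep.join(text for _, _, text in sorted(line, key=lambda r: r[1])) for line in lines]


# Hypothetical sample: a two-column table whose header cells sit at slightly different y values.
items = [
    {"str": "Name", "transform": [10, 0, 0, 10, 72, 700]},
    {"str": "Qty", "transform": [10, 0, 0, 10, 300, 700.5]},
    {"str": "Widget", "transform": [10, 0, 0, 10, 72, 686]},
    {"str": "4", "transform": [10, 0, 0, 10, 300, 686]},
]
print(rebuild_lines(items))  # ['Name\tQty', 'Widget\t4']
```

Joining with a tab rather than a space is a deliberate choice here: it keeps table cells from collapsing into a single string, at the cost of also splitting any phrase that pdf.js happened to emit as several runs.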

On Fri, Jul 8, 2016 at 10:30 AM, Paul Dupuis <paul at researchware.com> wrote:

> On 7/8/2016 11:55 AM, Colin Holgate wrote:
> > I was trying an export as spreadsheet from Acrobat Pro, but that didn’t
> work. Doing a Save as Text from Acrobat Reader was more successful, but the
> columns come out in a different order, and some columns get combined into a
> single string.
>
> Over the past few years, I have spent a ridiculous amount of time exploring
> PDF access via LiveCode in every way possible. Ultimately, for our needs
> we created the XPDF external and transferred it to LiveCode, but we also
> explored JavaScript extraction from a browser, interapplication
> communication, shell command-line tools, etc.
>
> The reality is that the PDF format is great for visually representing a
> printed page and totally sucks for text content, that is, actually
> getting the characters of the document rather than an image of the
> characters.
>
> There is really NO mapping of characters to their appearance in the PDF
> other than geometric position on the page. You get no font information,
> no size, no styles, zip. You get line breaks at the end of every visible
> line, and you can get line breaks in what appears to be the middle of
> content, depending upon how the original source document was rendered
> into a PDF. Headers and footers end up in the middle of paragraphs. You
> have no real way to tell a line break from a paragraph break, and more.
>
> In truth, a NEW portable document format needs to be invented that
> ties content to its appearance and preserves both, but I suspect that
> people who want to keep both intact and portable are just using HTML5
> and CSS3.