Reading PDF - a cry for help

Vokey, John vokey at uleth.ca
Fri Sep 30 03:12:43 EDT 2011


The problem (success?) with pdf is that it is, uniquely, pdf: it is text, stylized text, bitmaps, vector graphics, and everything else.  Convert an eps file to pdf, for example, and all is fine (you can easily extract the eps from the pdf).  Add a second pdf (or another eps) to that same file, however, and the eps becomes encoded pdf; the same is true for stylized text or anything else.  All of which is to say that you cannot easily extract a precise image from a more complex pdf file, especially one that is vector graphics in form.  You can, as Preview does, cut a region out of a pdf as a pdf image, but that image typically won't be the original graphic.  You can also extract the text in a simple ASCII form that, typically on Windows, loses all the ligatures (on the Mac it is usually more successful).  TeXShop, unlike Preview, does try to extract eps from pdf, but it fails if the encoding was anything other than eps-->pdf.  Preview just doesn't bother, as it is not usually possible.  OTOH, pdf-->pdf always works, which is one of the principal reasons pdf dominates everywhere (except in that dark world of Windows).

I do most of my work in LaTeX, and most of my figures are vector graphics.  That means an entire manuscript compiled to pdf, including all the stylized text, tables, and figures, is *at most* a few hundred K.  I have written books in LaTeX that, despite hundreds of pages and figures, are still at most a few megabytes when compiled to pdf.  Even a single figure from one of those documents, if converted from pdf to, say, png, tiff, or jpeg, would be larger than the entire document in pdf.  My point is simple: if you are in pdf, stay there; it is already the best format.


On 2011-09-29, at 6:56 PM, use-livecode-request at lists.runrev.com wrote:

> I find all of this somewhat tantalizing, but the only way I've found to make a PDF document useful in what I'm doing is to take a screen shot of it and then paste or import it as an image into the other application. Though I do this mostly in MacDraft, I should imagine that the same technique can be used in LC, since I often use MD as a method of transitioning different kinds of images into LC. Of course I'm interested in what you "see" in a PDF; not what else there might be there, of which I know nothing. I don't understand all of this "parsing" of data from or in a PDF.
> 
> Joe Wilkins

--
Please avoid sending me Word or PowerPoint attachments.
See <http://www.gnu.org/philosophy/no-word-attachments.html>

-Dr. John R. Vokey