Reading PDF - a cry for help

Joe Lewis Wilkins pepetoo at cox.net
Thu Sep 29 13:02:12 EDT 2011


I find all of this somewhat tantalizing, but the only way I've found to make a PDF document useful in what I'm doing is to take a screenshot of it and then paste or import it as an image into the other application. I do this mostly in MacDraft, but I imagine the same technique can be used in LC, since I often use MD to transition different kinds of images into LC. Of course, I'm only interested in what you "see" in a PDF, not in whatever else might be in there, of which I know nothing. I don't understand all of this "parsing" of data from or in a PDF.

Joe Wilkins


On Sep 29, 2011, at 9:50 AM, Dar Scott wrote:

> 
> On Sep 29, 2011, at 9:24 AM, Ken Ray wrote:
>> Are you looking at just extracting the images? Or other relevant parts of the PDF? The reason I ask is that it looks like binary data is always contained between two lines, "stream" and "endstream", so extracting just the stream data should be pretty quick to do. The next step, though, would be to read the extracted bytes and determine whether they are an image or some other thing that had to be represented with a "stream" in the PDF...
> 
> 
> There are a couple of issues that complicate this in general.
> 
> The parameters needed to process the stream must be parsed, and they can be located far from the stream itself.
> 
> There are many stream filters (some involving complicated compression) and they can be nested. I looked at a corpus of PDF files and, yeah, several are used in practice.
> 
> However, if one needs to parse the output of a specific program or a specific model of a scanner, then the work to do parsing in LiveCode is a lot less.
> 
> I hope that makes sense; I'm a little under the weather today.
> 
> Dar
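
For anyone following along outside LiveCode, here is a minimal Python sketch of the approach Ken and Dar are discussing: pull out whatever sits between "stream" and "endstream" in each object, and read the /Filter entry from the dictionary just ahead of it. It is an illustration only, not anyone's actual method from this thread. The names extract_streams and filters_of are made up for the example, and it assumes the simple case: the dictionary sits directly before the stream, the binary data does not happen to contain "endstream" or "endobj", and only a lone FlateDecode filter (zlib) is decoded. Nested filters and parameters stored elsewhere in the file, the complications Dar mentions, are reported but not handled.

```python
import re
import zlib

# Naive patterns: they assume the binary data never contains "endobj"
# or "endstream", which a real parser cannot take for granted.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"(.*?)stream\r?\n(.*?)endstream", re.DOTALL)
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def filters_of(header):
    """Return the filter names listed after /Filter in a stream dictionary."""
    m = FILTER_RE.search(header)
    if m is None:
        return []
    return [name.decode("ascii") for name in NAME_RE.findall(m.group(1))]


def extract_streams(pdf_bytes):
    """Return (filters, data) pairs, one per stream object found.

    Data is decompressed only when the sole filter is FlateDecode;
    otherwise the raw bytes (including any trailing end-of-line) are
    returned untouched.
    """
    found = []
    for obj in OBJ_RE.finditer(pdf_bytes):
        m = STREAM_RE.match(obj.group(1))
        if m is None:
            continue  # object has no stream
        header, data = m.group(1), m.group(2)
        filters = filters_of(header)
        if filters == ["FlateDecode"]:
            # decompressobj ignores the end-of-line left before "endstream"
            data = zlib.decompressobj().decompress(data)
        found.append((filters, data))
    return found


if __name__ == "__main__":
    # Tiny hand-built sample, not a complete or valid PDF file.
    body = zlib.compress(b"Hello, PDF")
    sample = (b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
              + body + b"\nendstream\nendobj\n")
    for filters, data in extract_streams(sample):
        print(filters, len(data), "bytes")
```

Once the bytes are out, deciding whether they are an image or something else still means looking at the rest of the dictionary (for example its /Subtype entry), which is the "next step" Ken describes.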




