Reading PDF - a cry for help

stephen barncard stephenREVOLUTION2 at barncard.com
Thu Sep 29 12:27:05 CDT 2011


I just created a PDF with two lines of text, a 16x16 graphic, and some
following text: a very basic test pattern. Then I opened the document
in BBEdit.

Yikes! There's a huge amount of data in there, some binary and some plain
text, for such simple content. It's a minefield of fun and a big project to work with.
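Most of the binary runs are the bodies of stream objects, which sit between the `stream` and `endstream` keywords (the approach Ken Ray describes further down the thread). Below is a minimal Python sketch of that naive extraction. It assumes each body is bracketed by end-of-line markers and ignores the stream's /Length entry, so a body that happens to contain a newline followed by "endstream" will be cut short. The function name `find_streams` is hypothetical.

```python
import re
import sys

# Naive pattern: the keyword "stream" plus its end-of-line marker, the body,
# then an end-of-line marker and "endstream". The leading \b keeps the
# pattern from matching the "stream" inside "endstream".
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def find_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the raw, still-encoded body of every stream the pattern finds."""
    return [m.group(1) for m in STREAM_RE.finditer(pdf_bytes)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_streams.py somefile.pdf
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for i, body in enumerate(find_streams(data)):
        print(f"stream {i}: {len(body)} bytes, begins {body[:16]!r}")
```

The bodies it returns are usually still compressed, so they will look like noise until the stream's filters are undone.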

Here's the spec (a large PDF document), right from the source:
PDF Reference <http://partners.adobe.com/public/developer/en/pdf/PDFReference.pdf>
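Stream bodies are also usually encoded: the stream's dictionary names one or more filters in its /Filter entry, and when several are listed they are undone in the order given (the nesting Dar Scott mentions below). Below is a Python sketch that handles only two filters, ASCIIHexDecode and FlateDecode (the latter through the standard zlib module). It ignores /DecodeParms, so predictor-encoded image data would need more work. The names `decode_stream` and `ascii_hex_decode` are hypothetical.

```python
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: hex digits up to '>', whitespace ignored."""
    digits = b"".join(data.split(b">", 1)[0].split())
    if len(digits) % 2:  # an odd final digit is treated as if followed by 0
        digits += b"0"
    return bytes.fromhex(digits.decode("ascii"))


# Only two filters are handled here; real files also use LZWDecode,
# ASCII85Decode, RunLengthDecode, DCTDecode (JPEG), and others.
DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "FlateDecode": zlib.decompress,
}


def decode_stream(body: bytes, filters: list[str]) -> bytes:
    """Apply each filter in the order it appears in the /Filter entry."""
    for name in filters:
        try:
            decoder = DECODERS[name]
        except KeyError:
            raise NotImplementedError(f"filter {name} not supported") from None
        body = decoder(body)
    return body
```

For a stream whose dictionary says `/Filter [/ASCIIHexDecode /FlateDecode]`, the call would be `decode_stream(body, ["ASCIIHexDecode", "FlateDecode"])`.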

Good luck.

On 29 September 2011 10:11, Dar Scott <dsc at swcp.com> wrote:

> There are command-line utilities that will take a PDF page, render it
> as an image, and save the image as a standard file. Some work with
> multi-page documents. These can be called through the LiveCode shell() function.
>
> Dar
>
>
> On Sep 29, 2011, at 11:02 AM, Joe Lewis Wilkins wrote:
>
> > I find all of this somewhat tantalizing, but the only way I've found to
> > make a PDF document useful in what I'm doing is to take a screenshot of it
> > and then paste or import it as an image into the other application. Though I
> > do this mostly in MacDraft, I imagine the same technique can be
> > used in LC, since I often use MD as a way of transitioning different
> > kinds of images into LC. Of course, I'm interested in what you "see" in a
> > PDF, not whatever else might be in there, of which I know nothing. I don't
> > understand all of this "parsing" of data from or in a PDF.
> >
> > Joe Wilkins
> >
> >
> > On Sep 29, 2011, at 9:50 AM, Dar Scott wrote:
> >
> >>
> >> On Sep 29, 2011, at 9:24 AM, Ken Ray wrote:
> >>> Are you looking at just extracting the images? Or other relevant parts
> >>> of the PDF? The reason I ask is that it looks like binary data is always
> >>> contained between two lines, "stream" and "endstream", so extracting just
> >>> the stream data should be pretty quick to do. The next step, though,
> >>> would be reading the bytes of what was extracted and then determining
> >>> whether it's an image or something else that had to be represented
> >>> with a "stream" in the PDF...
> >>
> >>
> >> There are a couple of issues that complicate this in general.
> >>
> >> The parameters needed to process the stream have to be parsed, and they
> >> can be far away in the file.
> >>
> >> There are many stream filters (some of them complicated compression
> >> schemes), and they can be nested. I looked at a corpus of PDF files and,
> >> yeah, several are used in practice.
> >>
> >> However, if one needs to parse the output of a specific program or a
> >> specific model of scanner, then the parsing work in LiveCode is a
> >> lot less.
> >>
> >> I hope that makes sense; I'm a little under the weather today.
> >>
> >> Dar
> >
> >
> > _______________________________________________
> > use-livecode mailing list
> > use-livecode at lists.runrev.com
> > Please visit this url to subscribe, unsubscribe and manage your
> > subscription preferences:
> > http://lists.runrev.com/mailman/listinfo/use-livecode
>
> ---------------------------
> Dar Scott
> dba
> Dar Scott Consulting
> 8637 Horacio Place NE
> Albuquerque, NM 87111
>
> Lab, home, office phone: +1 505 299 9497
> For Skype and fax, please contact.
> dsc at swcp.com
>
> Computer Programming and tinkering,
> often making LiveCode libraries and
> externals, sometimes writing associated
> microcontroller firmware.
> ---------------------------
>
>
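Dar Scott's suggestion in the quoted thread, a command-line renderer called through shell(), can be sketched in Python with subprocess standing in for shell(). This assumes Poppler's pdftoppm utility is installed and on the PATH; -png, -r (resolution in DPI), -f and -l (first and last page) are pdftoppm options, while the function names are hypothetical. In LiveCode, the equivalent command line (with paths quoted) would be passed to shell() as a single string.

```python
import subprocess
from typing import Optional


def pdftoppm_command(pdf_path: str, out_prefix: str, dpi: int = 150,
                     first: Optional[int] = None,
                     last: Optional[int] = None) -> list[str]:
    """Build a pdftoppm argument list that writes one PNG per page."""
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]  # first page to render
    if last is not None:
        cmd += ["-l", str(last)]   # last page to render
    cmd += [pdf_path, out_prefix]  # output files are numbered after out_prefix
    return cmd


def render_pdf_pages(pdf_path: str, out_prefix: str, **options) -> None:
    """Run pdftoppm; raises CalledProcessError if it exits with an error."""
    subprocess.run(pdftoppm_command(pdf_path, out_prefix, **options), check=True)
```

Passing an argument list rather than one string avoids shell quoting problems with paths that contain spaces.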



-- 



Stephen Barncard
San Francisco Ca. USA

more about sqb  <http://www.google.com/profiles/sbarncar>

