PDF

Mike Bonner bonnmike at gmail.com
Sat May 12 16:16:47 EDT 2018


Thanks Richard.  This helps cut my search down considerably.  I had already
set up an Ubuntu VM on my unRAID server, so I should be able to get
something going. Much appreciated.
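
A minimal sketch of that plan follows. It assumes the Ubuntu VM has poppler-utils installed, which provides the `pdftotext` command-line tool, and that Python 3.7+ is available. The sketch extracts each PDF's text with `pdftotext`, then builds a simple in-memory inverted index that maps each lower-cased word to the files containing it. The function names `extract_text`, `build_index`, `search`, and `index_directory` are hypothetical illustrations, not part of any existing library.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path

# Words are runs of lower-case letters/digits (a deliberately simple tokenizer).
WORD = re.compile(r"[a-z0-9]+")


def extract_text(pdf_path):
    """Return the text of one PDF via poppler's pdftotext (assumed installed).

    Passing "-" as the output file makes pdftotext write to stdout.
    Image-only (scanned) PDFs will yield little or no text without OCR.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def build_index(texts):
    """Map each lower-cased word to the set of document names containing it.

    `texts` is a dict of {document name: extracted text}.
    """
    index = defaultdict(set)
    for name, text in texts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index


def search(index, *words):
    """Return the names of documents containing every given word (AND query)."""
    # .get() does not insert missing keys, even on a defaultdict.
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


def index_directory(folder):
    """Extract and index every *.pdf file directly inside `folder`."""
    texts = {p.name: extract_text(p) for p in sorted(Path(folder).glob("*.pdf"))}
    return build_index(texts)
```

For example, `search(index_directory("/path/to/pdfs"), "invoice", "2017")` would return the names of the PDFs containing both words (the path is a placeholder). Keeping the index in memory is the simplest choice. For a large collection you would probably persist it, or skip the index entirely and grep the extracted text.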

On Sat, May 12, 2018 at 2:08 PM, Richard Gaskin via use-livecode <
use-livecode at lists.runrev.com> wrote:

> Mike Bonner wrote:
>
> > I haven't needed to do this before, but is there a (relatively) easy
> > way to extract the text from a bunch of PDF files?  I'm hoping I can
> > build some indexes for the boatload of files I want to go through.
> > (Though, I guess I could bypass LC and just grep my heart out.)
> >
> > Any suggestions?
>
> Long term:
>
> Per Postel's Law, reduce the stockpile of PDFs littering humanity's
> infosphere by generating none except in the increasingly rare cases where
> no other format is a better choice.
>
> PDF is an archaic format held over from the days when nearly all display
> devices had screens at least as wide as a printed page.  Back in the '90s,
> when it was popularized, a fixed-size format emulating a printed piece of
> paper was not an unreasonable thing to do.
>
> But times have changed.  We rarely kill trees just to read anymore, so the
> bounds of a printed page are approaching meaninglessness.
>
> This becomes critically important for delivering an enjoyable reading
> experience when we consider that an ever-smaller minority of our time is
> spent on screens large enough to accommodate that size.
>
> Many of our screens are much smaller, and moreover they vary enough to
> make any single fixed size needlessly cumbersome.
>
> Attempting to read PDFs on a phone ranges from mildly annoying to
> prohibitively frustrating.
>
> That unnecessary pain is easily avoided these days with modern formats
> that reflow text to fit any of the many devices we might be using at any
> given moment.
>
> There's a good argument for using EPUB as that foundation.
>
> But that's a long-term solution, and while I believe it's an inevitability
> as mobile use continues to grow, it won't solve your need in the
> here-and-now, so:
>
>
> Short term:
>
> The Linux universe has many good command-line solutions available for
> extracting text from PDFs easily and efficiently, like this one:
> https://www.howtogeek.com/228531/how-to-convert-a-pdf-file-to-editable-text-using-the-command-line-in-linux/
>
> For those Win10 Pro users who can be convinced to tick a checkbox, the
> entire universe of the Ubuntu shell is now available.
>
> macOS also includes utilities for this, though I don't believe they're the
> same ones (at least not without installing an independent package manager
> like Homebrew).
>
> --
>  Richard Gaskin
>  Fourth World Systems
>
