bonnmike at gmail.com
Sat May 12 16:16:47 EDT 2018
Thanks Richard. This helps cut my search down considerably. I had already
set up an Ubuntu VM on my Unraid server, so I should be able to get
something going. Much appreciated.
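For the indexing step, here is a minimal sketch of what this could look like on the Ubuntu VM. It assumes the `pdftotext` command from the poppler-utils package is installed; that is one common option and not necessarily the tool Richard had in mind. The function names and the `pdfs` directory are hypothetical, chosen only for illustration.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path


def extract_text(pdf_path):
    """Run pdftotext on one file and return its text.

    Assumes poppler-utils is installed; the trailing "-" tells
    pdftotext to write to stdout instead of a .txt file.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def build_index(texts):
    """Map each lowercased word to the sorted list of file names containing it."""
    index = defaultdict(set)
    for name, text in texts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


def index_directory(pdf_dir):
    """Extract text from every *.pdf in pdf_dir and build a word index."""
    texts = {p.name: extract_text(p) for p in sorted(Path(pdf_dir).glob("*.pdf"))}
    return build_index(texts)


if __name__ == "__main__":
    index = index_directory("pdfs")  # hypothetical directory name
    print(index.get("livecode", []))
```

Keeping `build_index` separate from the extraction step means the extracted text could just as easily be written to plain files and searched with grep, or handed to LC for further processing.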
On Sat, May 12, 2018 at 2:08 PM, Richard Gaskin via use-livecode <
use-livecode at lists.runrev.com> wrote:
> Mike Bonner wrote:
> > I haven't needed to do this before, but is there a (relatively) easy
> > way to extract the text from a bunch of pdf files? I'm hoping I can
> > build some indexes for the boatload of files I want to go through
> > (Though, I guess I could bypass LC and just grep my heart out)
> > Any suggestions?
> Long term:
> Per Postel's Law, reduce the stockpile of PDFs littering humanity's
> infosphere by generating none except in the increasingly rare cases where
> no other format is a better choice.
> PDF is an archaic format held over from the days when nearly all display
> devices had screens at least as wide as a printed page. Back in the '90s,
> when it was popularized, a fixed-size format emulating a printed piece of
> paper was not an unreasonable thing to do.
> But times have changed. We rarely kill trees just to read anymore, so the
> bounds of a printed page are approaching meaninglessness.
> This becomes critically important for delivering an enjoyable reading
> experience when we consider that an ever-smaller minority of our time is
> spent on screens large enough to accommodate that size.
> Many of our screens are much smaller, and moreover they vary enough to
> make any single fixed size needlessly cumbersome.
> Attempting to read PDFs on a phone ranges from mildly annoying to
> prohibitively frustrating.
> That unnecessary pain is easily replaced these days with modern formats
> that reflow text to fit any of the many devices we might be using at any
> given moment.
> There's a good argument for using EPub as that foundation.
> But that's a long-term solution, and while I believe it's an inevitability
> as mobile use continues to grow, it won't solve your need in the
> here-and-now, so:
> Short term:
> The Linux universe has many good command-line solutions available for
> extracting text from PDFs easily and efficiently, like this one:
> For those Win10 Pro users who can be convinced to tick a checkbox, the
> entire universe of the Ubuntu shell is now available.
> macOS also includes utilities for this, but I don't believe the same ones
> (at least not without installing an independent package manager like
> Richard Gaskin
> Fourth World Systems