PDF
Richard Gaskin
ambassador at fourthworld.com
Sun May 13 12:20:22 EDT 2018
R.H. wrote:
> To extract text from a PDF document, I am using a command line tool on
> Windows which is available also for Linux based systems called Xpdf.
>
> It was working well, using shell() on LiveCode Community 8x, but
> tested only in the IDE on Windows.
A good tool. Thanks.
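
For anyone who wants to script the same extraction outside LiveCode, here is a minimal Python sketch of the idea. It assumes Xpdf's pdftotext is installed and on the PATH; the function names and file paths are hypothetical, chosen only for illustration.

```python
import subprocess


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for Xpdf's pdftotext (hypothetical helper)."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout,
        # which helps columns of numbers stay aligned.
        command.append("-layout")
    command.extend([pdf_path, txt_path])
    return command


def extract_text(pdf_path, txt_path, keep_layout=True):
    """Run pdftotext and return the extracted text.

    Assumes pdftotext is on the PATH; raises CalledProcessError
    if the tool exits with a non-zero status.
    """
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout),
        check=True,
    )
    with open(txt_path, encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

Something like extract_text("report.pdf", "report.txt") would then hand back the text for further parsing. Whether -layout helps depends on how the tables were set in the original PDF, so it is worth trying both ways.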
> I needed this since some people had sent huge lists of numerical data
> in PDF which were impossible to extract, and the manual method could
> have taken weeks.
Given PDF's role as a delivery vehicle, it's most commonly an extra step
added to the end of a publishing process.
Have you asked that data provider if they have the data available in the
format it was in before they went to that extra final step to convert it
to PDF?
> Nevertheless, I cannot see that PDF will lose ground as the standard
> for many years to come. There are possibly billions of documents in
> PDF around?
Postel's Law is worth quoting here:
"Be liberal in what you accept,
and conservative in what you send."
The need to *read* PDFs will remain for a very long time. Adobe's power
and influence have made the cumbersome, inflexible, and expensively
complex format almost ubiquitous during the advent of the PC era,
leaving a vast collection of legacy documents that will continue to
encumber consumers and developers alike for at least a decade to come,
likely longer.
A great many households still have a VHS player. Old formats take a
long time to die, and never completely go away.
But that's for reading.
Choosing what our apps output offers us an opportunity to consider
modern workflows in a world where the majority of time spent with
computing devices is on screens too small to read PDFs comfortably.
Computing has taken us to a place where device size is varied and
usability is often a far more significant product differentiator than
algorithms.
It seems useful to encourage developers looking to distinguish their
apps for modern audiences to consider output formats that integrate well
across the full mix of devices we use.
EPub won't likely be a de facto requirement for years. But market
differentiation isn't about waiting to play catch-up.
> What should replace it? And people are still printing.
EPub is printable for the ever-smaller number of documents requiring
tree death just to be read.
EPub uses HTML, tucked inside a common Zip container like so many other
formats (docx, xlsx, odt, GarageBand, APKs, etc.). The developer
expense in dealing with the format is a small fraction of what's
required for dealing with PDF.
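
To give a sense of how little is involved, here is a rough Python sketch that writes a single-chapter EPub using only the standard library. It is an illustration under assumptions, not a validated implementation: the file names inside the container, the placeholder identifier and date, and the build_epub function are all hypothetical, and the result has not been run through an EPub validator.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Placeholder identifier and modification date, for illustration only.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2018-05-13T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="chapter1.xhtml">{title}</a></li></ol>
    </nav>
  </body>
</html>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>{title}</title></head>
  <body>{body}</body>
</html>
"""


def build_epub(title, body_html):
    """Return the bytes of a one-chapter EPub (body_html must be valid XHTML)."""
    safe_title = escape(title)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        parts = [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF.format(title=safe_title)),
            ("OEBPS/nav.xhtml", NAV_XHTML.format(title=safe_title)),
            ("OEBPS/chapter1.xhtml",
             CHAPTER_XHTML.format(title=safe_title, body=body_html)),
        ]
        for name, text in parts:
            archive.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Writing the bytes returned by build_epub("Demo", "<p>Hello</p>") to a file ending in .epub should give something an EPub reader can open. Reading goes the other way: open the Zip, follow container.xml to the package file, and load the HTML it lists.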
Nearly everything you can do in a browser can be done in EPub. Indeed,
we're beginning to see EPub reader extensions for browsers, and I
suspect it won't be long until we see native EPub support directly in
most popular browsers.
Like PDF, EPub files are normally readable by all, and, like PDFs, EPubs
can be password-protected when DRM is needed.
But unlike PDF, EPub inherits HTML's ability to reflow content for
dynamic page rendering.
Many millennials don't have a laptop or desktop computer at all.
Mobile-exclusive workflows are common across all age groups throughout
much of the world. And the over-40 crowd everywhere appreciates the ease
with which text can be dynamically resized so they don't need to reach
for their reading glasses quite so often.
Given the breadth of HTML tools and experience available, along with the
common acceptance of Zip as a wrapper format for delivering multi-part
documents, EPub seems well placed to serve modern multi-device needs
with far less expense for developers and a much better experience for
end users.
So sure, we'll be *reading* PDFs even longer than VHS players will
continue taking up space in our living rooms.
But if you're looking to distinguish your app from lackluster
competitors, adding EPub support for both reading and writing is worth
considering.
--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web
____________________________________________________________________
Ambassador at FourthWorld.com http://www.FourthWorld.com