Richard Gaskin ambassador at fourthworld.com
Sun May 13 12:20:22 EDT 2018

R.H. wrote:

 > To extract text from a PDF document, I am using a command line tool on
 > Windows which is available also for Linux based systems called Xpdf.
 > It was working well, using shell() on LiveCode Community 8x, but
 > tested only in the IDE on Windows.

A good tool.  Thanks.
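
For anyone scripting the same extraction outside LiveCode, here is a minimal sketch in Python. It assumes Xpdf's pdftotext binary is on the PATH. The helper names and the sample file name are hypothetical; the options shown (-layout, -enc, and "-" to send the text to stdout) are the ones described in the Xpdf pdftotext manual.

```python
import subprocess


def pdftotext_command(pdf_path, keep_layout=True):
    """Build the argument list for Xpdf's pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout keeps the physical page layout, which usually helps
        # when the PDF holds columns of numbers.
        cmd.append("-layout")
    # Ask for UTF-8 output, then pass the input file; "-" sends text to stdout.
    cmd += ["-enc", "UTF-8", pdf_path, "-"]
    return cmd


def extract_text(pdf_path, keep_layout=True):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("report.pdf") would return the text of a (hypothetical) report.pdf, ready to be split into lines and parsed. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.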

 > I needed this since some people had sent huge lists of numerical data
 > in PDF which were impossible to extract, and the manual method could
 > have taken weeks.

Given PDF's role as a delivery vehicle, it's most commonly an extra step 
added to the end of a publishing process.

Have you asked that data provider if they have the data available in the 
format it was in before they went to that extra final step to convert it 
to PDF?

 > Nevertheless, I can not see that PDF will lose ground as the standard
 > for many years to come. There are possibly billions of documents in
 > PDF around?

Postel's Law is worth quoting here:

        "Be liberal in what you accept,
         and conservative in what you send."

The need to *read* PDFs will remain for a very long time.  Adobe's power 
and influence have made the cumbersome, inflexible, and expensively 
complex format almost ubiquitous during the advent of the PC era, 
leaving a vast collection of legacy documents that will continue to 
encumber consumers and developers alike for at least a decade to come, 
likely longer.

A great many households still have a VHS player.  Old formats take a 
long time to die, and never completely go away.

But that's for reading.

Choosing what our apps output offers us an opportunity to consider 
modern workflows in a world where the majority of time spent with 
computing devices is on screens too small to read PDFs comfortably.

Computing has taken us to a place where device size is varied and 
usability is often a far more significant product differentiator than 

It seems useful to encourage developers looking to distinguish their 
apps for modern audiences to consider output formats that integrate well 
across the full mix of devices we use.

EPub won't likely be a de facto requirement for years.  But market 
differentiation isn't about waiting to play catch-up.

 > What should replace it? And people are still printing.

EPub is printable for the ever-smaller number of documents requiring 
tree death just to be read.

EPub uses HTML, tucked inside a common Zip container like so many other 
formats (docx, xlsx, odt, GarageBand, APKs, etc.).  The developer 
expense in dealing with the format is a small fraction of what's 
required for dealing with PDF.
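
To give a sense of how little is involved, here is a rough sketch in Python that writes a one-chapter EPub using only the standard library. The file names inside the archive (OEBPS/content.opf, chapter1.xhtml, nav.xhtml), the function name, and the placeholder identifier and date are my own choices for illustration. I have not run the result through a validator such as epubcheck, so treat it as a starting point rather than a conforming implementation.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# The identifier and modified date below are placeholders for illustration.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2018-05-13T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="chapter1.xhtml">{title}</a></li></ol>
    </nav>
  </body>
</html>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>{title}</title></head>
  <body>
{body}
  </body>
</html>
"""


def write_epub(target, title, body_xhtml):
    """Write a one-chapter EPub to a path or a writable binary file object.

    body_xhtml must already be well-formed XHTML markup.
    """
    safe_title = escape(title)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry has to come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", CONTENT_OPF.format(title=safe_title))
        zf.writestr("OEBPS/nav.xhtml", NAV_XHTML.format(title=safe_title))
        zf.writestr("OEBPS/chapter1.xhtml",
                    CHAPTER_XHTML.format(title=safe_title, body=body_xhtml))
```

Something like write_epub("report.epub", "Quarterly Figures", "<h1>Quarterly Figures</h1><p>...</p>") produces a file most readers should open.  Everything apart from the Zip bookkeeping is ordinary XML and XHTML text.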

Nearly everything you can do in a browser can be done in EPub.  Indeed, 
we're beginning to see EPub reader extensions for browsers, and I 
suspect it won't be long until we see native EPub support directly in 
most popular browsers.

Like PDF, EPub files are normally readable by all, and like PDF EPubs 
can be password-protected when DRM is needed.

But unlike PDF, EPub inherits HTML's ability to reflow content for 
dynamic page rendering.

Many millennials don't have a laptop or desktop computer at all. 
Mobile-exclusive workflows are common across all age groups throughout 
much of the world. And the over-40 crowd everywhere appreciates the ease 
with which text can be dynamically resized so they don't need to reach 
for their reading glasses quite so often.

Given the breadth of HTML tools and experience available, along with the 
common acceptance of Zip as a wrapper format for delivering multi-part 
documents, EPub seems well placed to serve modern multi-device needs 
with far less expense for developers and a much better experience for 
end users.
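
Reading is similarly approachable.  The sketch below, again Python with only the standard library, pulls the plain text of each chapter out of an EPub in reading order.  It assumes an unencrypted file with UTF-8 XHTML content; the function and class names are hypothetical, and the text collector is deliberately crude (it keeps <title> text and ignores structure).

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


class TextCollector(HTMLParser):
    """Collect the non-blank text nodes of an (X)HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def epub_spine_text(source):
    """Return one string of plain text per spine item, in reading order."""
    with zipfile.ZipFile(source) as zf:
        # META-INF/container.xml points at the package (OPF) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))

        # The manifest maps item ids to hrefs relative to the OPF file.
        hrefs = {item.get("id"): item.get("href")
                 for item in package.findall("opf:manifest/opf:item", NS)}
        base = posixpath.dirname(opf_path)

        texts = []
        # The spine lists the item ids in reading order.
        for ref in package.findall("opf:spine/opf:itemref", NS):
            path = posixpath.join(base, hrefs[ref.get("idref")])
            parser = TextCollector()
            parser.feed(zf.read(path).decode("utf-8"))
            parser.close()
            texts.append(" ".join(parser.parts))
        return texts
```

The whole job is unzip, follow two XML pointers, and parse HTML, all of which have mature tools in nearly every language.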

So sure, we'll be *reading* PDFs even longer than VHS players will 
continue taking up space in our living rooms.

But if you're looking to distinguish your app from lackluster 
competitors, adding EPub support for both reading and writing is worth 

  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  Ambassador at FourthWorld.com                http://www.FourthWorld.com
