PDF

Mark Waddingham mark at livecode.com
Tue May 15 04:21:22 EDT 2018


On 2018-05-14 20:50, Richard Gaskin via use-livecode wrote:
> They are indeed for very different purposes, and we've been using PDF
> for so long that it's become the hammer that makes everything look
> like a nail, applied to so much while it's only truly best for a much
> smaller subset.

Of course the subtle detail here is the use of 'best' - 'best' relative 
to what requirements?

PDF is a very general format - it models the notion of printed matter 
which we still grow up with (although, admittedly, increasingly less as 
time goes by). This suggests the problem is not with PDF, nor with PDF 
being used - it is with the ties humans have to 'printed matter'.

> In the course of my work I often go through periods of research, which
> inevitably has me reading a lot of academic research papers and
> corporate white papers.  Nearly all of them are published as PDF, many
> exclusively in that format.

Two things to point out here:

Academic research papers are generally written using the format which 
the journal publishers require - typically LaTeX/TeX for anything beyond 
the normal written word and embedded figures.

Corporate white papers are usually written using Word Processors, in a 
format / layout defined by the company they are coming from. In many 
cases they will also go through some sort of 'design' phase afterwards, 
particularly if they are to be published widely - and often that will be 
using some page layout tool (such as InDesign).

In both these cases, the author/designer is designing at a fixed width 
(the joy of the rise of WYSIWYG in the 80's / 90's perhaps?)

> The circumstances in which I'm immersed in such focus vary, and the
> devices I have with me vary as well.  With reflowing content it
> doesn't matter which device I happen to be using at the time, the work
> continues unabated.
> 
> But when I encounter a PDF while using screen less than 8.5" wide, the
> need to constantly zoom in and out and scroll back and forth so slows
> progress that it kills the joy of research, bringing the work to a
> halt until I can get to a device that happens to emulate size
> characteristics of paper, even though I'll never print anything I'm
> reading.
> 
> Curious if I'm alone with the time I spend on smaller screens led me
> to research that as well.  And it turns out I'm far from alone; it's
> where people are spending most of their computing time these days.
> And since this trend is driven largely by people younger than me it
> seems unlikely to slow down, at least until the next displacing form
> factor comes along (but then we'll be doing something entirely
> different still).

Right, so the problem has nothing to do with PDF; it is to do with the 
fact that humans work better designing things at fixed width, and the 
general tools which people learn to use, and continue to use, support 
this frame of mind.

If a document is any more than 'just text' (as in something which can be 
rendered using a single font independent of page width) then requiring 
documents to work at any layout width means the author has to abstract 
the intent behind the layout and then instruct a tool to preserve it.

Certainly for many individual cases of 'document type' you can mechanize 
and assist; however, then the authors need to be aware of precisely what 
document type they are producing, and learn how to instruct a tool to 
encode content for that document type.

I'd like to be optimistic here, but I honestly don't think this is a 
problem with tooling - semantic representation of content has been 
around for as long as I have (probably longer), I was playing with 
systems which offered it when I was in my teens; and yet in my entire 
life since then I still see the majority of documents produced using 
word processors, or similar 'unconstrained' tools.

The problem I think is that humans don't like to be constrained when 
writing - any tool which appears to constrain what they can do in what 
they think (at the time) is an unreasonable way tends to be considered 
'bad'. However, to achieve the goal of representing content in a 
contextual manner (relative to some abstract pattern which can be 
processed in the ways necessary to free us of fixed width layout, in 
this case) constraints are absolutely necessary.

Admittedly the rise of the web, and particularly HTML/CSS means we have 
an ever increasing body of practitioners who do have to think about the 
patterns of content, rather than just the content, but the knowledge 
they have and are able to apply has been hard won and learned by them 
(just like any other domain specific endeavour).

> Different tools for different jobs indeed.  Not everything is a nail,
> but the combination of technological inertia combined with an an
> acceptance among the majority of people who are not inventors of
> making due with whatever tool is handed to them, we keep using hammers
> to drive screws.

Ideally all content would be represented at a semantic level requisite 
to its context.

e.g. Why use anything other than ASCII text, if your text can be 
entirely represented using ASCII?

>> ... in exactly the same way as the author intended.
> 
> This is the only part of what you wrote I disagree with, if we were to
> try it on as a general rule.
> 
> Writing is the flow of ideas from one mind to another, encoded in
> streams of text.
> 
> Line breaks are often a meaningful part that communication, and on
> occasion page breaks as well.
> 
> But for most writing, aside from perhaps code and poetry, column width
> is rarely a semantic consideration at all.  Even printed books come in
> different sizes.

By general do you mean either:

   - for a 'high' percentage of cases

   - for all cases

I'm guessing you meant the former - I was talking about the latter.

The point is that there is no general rule - I can guarantee for every 
constraint which you add to a system for representation of content, 
there will be numerous existing examples (entire families, in fact) 
which cannot fit into it. Similarly, what you will find is that if a 
system is required to be used, then people will find a way to 'work 
around' the constraints - leaving you back where you started - i.e. your 
system will work exceptionally well for things written precisely to work 
with it; but poorly for the rest, and over time the poor cases will 
start to become a noticeable percentage of content.

As people who write software, we have the ability to create abstract 
representations of content but the problem is mapping the concrete form 
to the abstract - particularly when we live in a world where concrete 
forms abound in their billions, and entire workflows are centered around 
them. Any system which can't deal with the concrete or interoperate with 
it is unlikely to ever gain a huge amount of traction.

From that point of view, I do think ePub is a bit of a 'red herring' 
here - it isn't really anything 'more' than a container format, with a 
reasonable way to encode indices/document structure. Internally it uses 
the web technologies, which are good for reflowing text, certainly, but 
you still need to generate the HTML/CSS etc. and it is the mapping from 
'what I want to say' to 'how do I encode it in a way which works in all 
the ways other people want it to' which is the hard part.

I'm sure things like ePub will help a bit - at least it is trying to 
instigate some bounds on communication of such things - however, I do 
strongly suspect it will become a technical detail which is largely 
irrelevant at some point.

After all, what the world perhaps needs (rather than another file 
format) is a way to take the existing forms of how we communicate and 
turn them into a form which is more amenable to modern usage patterns 
mechanically. (i.e. A system which turns a PDF into a re-flowable 
document).
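
As a rough illustration of the easiest case only (the 'just text' case 
described earlier), here is a minimal Python sketch. It assumes the text 
has already been extracted from the fixed-width source as hard-wrapped 
lines with blank lines between paragraphs; the function names 
unwrap_paragraphs and reflow are hypothetical, and nothing here reads an 
actual PDF. Anything beyond plain paragraphs (columns, figures, tables) 
would need structure that this approach does not recover.

```python
import textwrap


def unwrap_paragraphs(fixed_width_text):
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a paragraph boundary, and every other
    line break is purely an artefact of the original fixed width.
    """
    paragraphs = []
    current = []
    for line in fixed_width_text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def reflow(fixed_width_text, width):
    """Re-wrap the recovered paragraphs to an arbitrary target width."""
    return "\n\n".join(
        textwrap.fill(paragraph, width=width)
        for paragraph in unwrap_paragraphs(fixed_width_text)
    )


if __name__ == "__main__":
    sample = (
        "PDF is a very general format - it models\n"
        "the notion of printed matter.\n"
        "\n"
        "Second paragraph\n"
        "here."
    )
    # The same content, re-wrapped for a narrow screen.
    print(reflow(sample, 20))
```

The sketch works only because plain paragraphs carry no layout intent 
beyond 'these words belong together'; that is exactly the information 
which is hard to recover mechanically for richer documents.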

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



