mark at livecode.com
Tue May 15 04:21:22 EDT 2018
On 2018-05-14 20:50, Richard Gaskin via use-livecode wrote:
> They are indeed for very different purposes, and we've been using PDF
> for so long that it's become the hammer that makes everything look
> like a nail, applied to so much while it's only truly best for a much
> smaller subset.
Of course the subtle detail here is the use of 'best' - 'best' relative
to what requirements?
PDF is a very general format - it models the notion of printed matter
which we still grow up with (although, admittedly, increasingly less as
time goes by). This suggests the problem is not with PDF, nor with PDF
being used - it is with the ties humans have to 'printed matter'.
> In the course of my work I often go through periods of research, which
> inevitably has me reading a lot of academic research papers and
> corporate white papers. Nearly all of them are published as PDF, many
> exclusively in that format.
Two things to point out here:
Academic research papers are generally written using the format which
the journal publishers require - typically LaTeX/TeX for anything beyond
the normal written word and embedded figures.
Corporate white papers are usually written using word processors, in a
format / layout defined by the company they are coming from. In many
cases they will also go through some sort of 'design' phase afterwards,
particularly if they are to be published widely - and often that will be
using some page layout tool (such as InDesign).
In both these cases, the author/designer is designing at a fixed width
(the joy of the rise of WYSIWYG in the 80's / 90's perhaps?).
> The circumstances in which I'm immersed in such focus vary, and the
> devices I have with me vary as well. With reflowing content it
> doesn't matter which device I happen to be using at the time, the work
> continues unabated.
> But when I encounter a PDF while using screen less than 8.5" wide, the
> need to constantly zoom in and out and scroll back and forth so slows
> progress that it kills the joy of research, bringing the work to a
> halt until I can get to a device that happens to emulate size
> characteristics of paper, even though I'll never print anything I'm
> Curious if I'm alone with the time I spend on smaller screens led me
> to research that as well. And it turns out I'm far from alone; it's
> where people are spending most of their computing time these days.
> And since this trend is driven largely by people younger than me it
> seems unlikely to slow down, at least until the next displacing form
> factor comes along (but then we'll be doing something entirely
> different still).
Right, so the problem has nothing to do with PDF; it is to do with the
fact that humans work better designing things at fixed width, and the
general tools which people learn to use, and continue to use, support
this frame of mind.
If a document is any more than 'just text' (as in something which can be
rendered using a single font independent of page width) then requiring
documents to work at any layout width means the author has to abstract
and then instruct a tool to preserve that.
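To make the 'just text' case concrete, here is a minimal Python sketch. It assumes the standard library's textwrap module as a stand-in for a reflowing renderer; the sample paragraph and the widths are made up for illustration. Plain prose survives re-breaking at any width with nothing lost, which is exactly what stops being true once a document carries layout of its own.

```python
import textwrap

# Hypothetical sample paragraph: plain prose with no layout of its own.
PARAGRAPH = (
    "Plain prose carries no layout of its own, so a renderer is free "
    "to break lines wherever the available width requires."
)


def reflow(text: str, width: int) -> list[str]:
    """Re-break plain text so that no line exceeds the given width."""
    return textwrap.wrap(text, width=width)


if __name__ == "__main__":
    # The same content at two illustrative widths.
    for width in (30, 60):
        lines = reflow(PARAGRAPH, width)
        print(f"--- width {width}: {len(lines)} lines ---")
        print("\n".join(lines))
```

Joining the wrapped lines with single spaces gives back the original paragraph, so the width is purely a rendering decision here. A table or a figure with a caption has no such mechanical fallback.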
Certainly for many individual cases of 'document type' you can mechanize
and assist; however, then the authors need to be aware of precisely what
document type they are producing, and learn how to instruct a tool to
encode content for that document type.
I'd like to be optimistic here, but I honestly don't think this is a
problem with tooling - semantic representation of content has been
around for as long as I have (probably longer), I was playing with
systems which offered it when I was in my teens; and yet in my entire
life since then I still see the majority of documents produced using
word processors, or similar 'unconstrained' tools.
The problem, I think, is that humans don't like to be constrained when
writing - any tool which appears to constrain what they can do in what
they think (at the time) is an unreasonable way tends to be considered
'bad'. However, to achieve the goal of representing content in a
contextual manner (relative to some abstract pattern which can be
processed in the ways necessary to free us of fixed width layout, in
this case) constraints are absolutely necessary.
Admittedly the rise of the web, and particularly HTML/CSS, means we have
an ever increasing body of practitioners who do have to think about the
patterns of content, rather than just the content, but the knowledge
they have and are able to apply has been hard won and learned by them
(just like any other domain-specific endeavour).
> Different tools for different jobs indeed. Not everything is a nail,
> but the combination of technological inertia combined with an
> acceptance among the majority of people who are not inventors of
> making do with whatever tool is handed to them, we keep using hammers
> to drive screws.
Ideally all content would be represented at a semantic level requisite
to its context.
e.g. Why use anything other than ASCII text, if your text can be
entirely represented using ASCII?
>> ... in exactly the same way as the author intended.
> This is the only part of what you wrote I disagree with, if we were to
> try it on as a general rule.
> Writing is the flow of ideas from one mind to another, encoded in
> streams of text.
> Line breaks are often a meaningful part of that communication, and on
> occasion page breaks as well.
> But for most writing, aside from perhaps code and poetry, column width
> is rarely a semantic consideration at all. Even printed books come in
> different sizes.
By general do you mean either:
- for a 'high' percentage of cases
- for all cases
I'm guessing you meant the former - I was talking about the latter.
The point is that there is no general rule - I can guarantee that for
every constraint which you add to a system for representation of content,
there will be numerous existing examples (entire families, in fact)
which cannot fit into it. Similarly, what you will find is that if a
system is required to be used, then people will find a way to 'work
around' the constraints - leaving you back where you started - i.e. your
system will work exceptionally well for things written precisely to work
with it; but poorly for the rest, and over time the poor cases will
start to become a noticeable percentage of content.
As people who write software, we have the ability to create abstract
representations of content, but the problem is mapping the concrete form
to the abstract - particularly when we live in a world where concrete
forms abound in their billions, and entire workflows are centered around
them. Any system which can't deal with the concrete or interoperate with
it is unlikely to ever gain a huge amount of traction.
From that point of view, I do think ePub is a bit of a 'red herring'
here - it isn't really anything 'more' than a container format, with a
reasonable way to encode indices/document structure. Internally it uses
the web technologies, which are good for reflowing text, certainly, but
you still need to generate the HTML/CSS etc. and it is the mapping from
'what I want to say' to 'how do I encode it in a way which works in all
the ways other people want it to' which is the hard part.
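The 'container format' point can be sketched in a few lines of Python. This is a skeleton under stated assumptions, not a valid publication: the package document (the .opf file) is deliberately omitted, and the path "OEBPS/content.opf", the chapter file name and its contents are illustrative choices rather than anything ePub mandates. What it shows is that an ePub is a ZIP archive whose first entry is an uncompressed "mimetype" file, with a META-INF/container.xml pointing at the package document, and ordinary XHTML holding the text.

```python
import io
import zipfile

# container.xml tells a reader where the package document lives.
# "OEBPS/content.opf" is an illustrative path, not a required one.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Hypothetical chapter: the reflowable part is just web content.
CHAPTER_XHTML = """<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Reflowable text lives in ordinary XHTML.</p></body>
</html>
"""


def build_skeleton() -> bytes:
    """Build an in-memory ZIP laid out like an ePub (package document omitted)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


if __name__ == "__main__":
    with zipfile.ZipFile(io.BytesIO(build_skeleton())) as zf:
        for name in zf.namelist():
            print(name)
```

Nothing in the archive helps with writing CHAPTER_XHTML itself; producing that markup from 'what I want to say' is still left entirely to the author or their tools.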
I'm sure things like ePub will help a bit - at least it is trying to
instigate some bounds on communication of such things - however, I do
strongly suspect it will become a largely irrelevant technical detail
at some point.
After all, what the world perhaps needs (rather than another file
format) is a way to take the existing forms of how we communicate and
turn them into a form which is more amenable to modern usage patterns
mechanically. (i.e. A system which turns a PDF into a re-flowable
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps