PDF
Mark Waddingham
mark at livecode.com
Mon May 14 13:44:32 EDT 2018
On 2018-05-14 18:34, Richard Gaskin via use-livecode wrote:
> Ralph DiMola wrote:
>
>> Richard I agree. I have customers that we are nudging toward EPubs for
>> the reasons you've enumerated but when they see the amount of work
>> involved they fall back to the "Well our employees are using MSWord
>> and just PDF them."
>
> Yes, updating tooling is key for adoption.
I suspect the main issue here is not the tooling - semantics-based
document description tooling has existed for as long as PDF has...
Unfortunately, most people (when writing documents) don't think
semantically - they still write as they see / want to see - which is
quite a human thing.
Compared to the cost of programming new tools, re-programming humans is
an order of magnitude more expensive ;)
> As much as I like that vision, it seems odd to me that other tools
> like office suites can output HTML, but not in a form that works with
> EPub with less effort than we'd need in LC.
I think office suites can - if the people writing the documents use
appropriate semantic styling and document structure - rather than just
using 'indent', bold and font styles - see my above comment ;)
> Even just copying from a PDF can yield wildly unpredictable results
> when pasted into any other app. It's a rare day when I can copy
> content from a PDF into an email and not have to remove a mystifyingly
> large number of spaces and other characters not at all visible in the
> rendered PDF within its specialized viewer app.
Heh - yes that is a common gripe - I cannot deny...
However, that's not a problem with PDF per se - it is more a problem
with the tools which produce the PDFs, and the tools which extract the
text. Neither is an easy problem to get right (the latter is harder than
the former) - but it can be done.
After all, if a human can work out the ordering / flow of text on a page
by looking at it, then a computer should be able to - there is no
'magic' there, just a little bit of global page analysis which humans
are better at than computers (or rather, better at than the developers
who are encoding the required global page analysis into algorithms a
computer can use). OCR software has been doing this for years, and much
better than most PDF readers do today.
Remember - a PDF is nothing more than a description of marks on a page -
a bit like a list of instructions you would give a human to reproduce
it.
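To make the 'marks on a page' idea a bit more concrete, here is a rough
Python sketch of the simplest kind of page analysis: take positioned
text fragments, group them into lines by vertical position, order each
line left to right, and only emit a space where there is a real
horizontal gap. The (x, y, width, text) tuple, the function name
reading_order and the two tolerance parameters are all made up for
illustration - they are not any real PDF library's API, and a real
extractor would also have to cope with columns, rotation and font
metrics.

```python
from typing import List, Tuple

# Hypothetical representation of one mark: (x, y, width, text).
# Coordinates follow the PDF convention of y increasing up the page.
Mark = Tuple[float, float, float, str]


def reading_order(marks: List[Mark],
                  line_tol: float = 2.0,
                  gap: float = 1.0) -> str:
    """Rebuild single-column text from positioned fragments.

    line_tol: maximum vertical distance for two marks to share a line.
    gap: horizontal distance above which a space is inserted.
    Both are illustrative guesses, not values taken from any tool.
    """
    lines: List[List[Mark]] = []
    # Visit marks from the top of the page to the bottom.
    for mark in sorted(marks, key=lambda m: -m[1]):
        if lines and abs(lines[-1][0][1] - mark[1]) <= line_tol:
            lines[-1].append(mark)   # same baseline (within tolerance)
        else:
            lines.append([mark])     # start a new line

    out = []
    for line in lines:
        line.sort(key=lambda m: m[0])        # left to right
        text = line[0][3]
        for prev, cur in zip(line, line[1:]):
            prev_end = prev[0] + prev[2]
            if cur[0] - prev_end > gap:      # visible gap: a word break
                text += " "
            text += cur[3]
        out.append(text)
    return "\n".join(out)
```

The point of the gap test is that fragments which touch are joined
without a space, which is exactly where naive extractors tend to go
wrong and produce the stray spaces Richard describes.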
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps