PDF

Mark Waddingham mark at livecode.com
Mon May 14 13:44:32 EDT 2018


On 2018-05-14 18:34, Richard Gaskin via use-livecode wrote:
> Ralph DiMola wrote:
> 
>> Richard, I agree. I have customers that we are nudging toward EPubs for
>> the reasons you've enumerated, but when they see the amount of work
>> involved they fall back to "Well, our employees are using MSWord
>> and just PDF them."
> 
> Yes, updating tooling is key for adoption.

I suspect the main issue here is not the tooling - semantic-based
document description tooling has existed for as long as PDF has...
Unfortunately, most people (when writing documents) don't think
semantically - they still write what they see, or want to see, which is
quite a human thing.

Compared to the cost of programming new tools, re-programming humans is
an order of magnitude more expensive ;)

> As much as I like that vision, it seems odd to me that other tools
> like office suites can output HTML, but not in a form that works with
> EPub with less effort than we'd need in LC.

I think office suites can - provided the people writing the documents
use appropriate semantic styling and document structure, rather than just
indents, bold and font styles - see my comment above ;)

> Even just copying from a PDF can yield wildly unpredictable results
> when pasted into any other app.  It's a rare day when I can copy
> content from a PDF into an email and not have to remove a mystifyingly
> large number of spaces and other characters not at all visible in the
> rendered PDF within its specialized viewer app.

Heh - yes that is a common gripe - I cannot deny...

However, that's not a problem with PDF per se - it is more a problem
with the tools which produce the PDFs, and the tools which extract the
text. Neither is an easy problem to get right (extraction is harder than
production) - but it can be done.

After all, if a human can work out the ordering / flow of text on a page
by looking at it, then a computer should be able to - there is no
'magic' there, just a little bit of global page analysis, which humans
are better at than computers (or rather, better at than the developers
who have to encode that analysis into algorithms a computer can use).
OCR software has been doing this for years, much better than most PDF
readers do today.

Remember - a PDF is nothing more than a description of marks on a page - 
a bit like a list of instructions you would give a human to reproduce 
it.
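
To illustrate the "list of instructions" point, here is a toy Python interpreter for a handful of genuine PDF text operators (`BT`, `Tf`, `Td`, `Tj`, `ET`). It is only a sketch under heavy assumptions: string escapes, nested parentheses, full text matrices and most operators are ignored, and a made-up fixed glyph width (`char_width`) stands in for real font metrics. The example stream shows that nothing obliges a producer to paint text in reading order.

```python
import re

# Literal strings "(...)" (no escapes or nesting), names "/F1", numbers, operators.
TOKEN = re.compile(
    r"\((?P<str>[^()]*)\)|(?P<name>/\w+)|(?P<num>-?\d+(?:\.\d+)?)|(?P<op>[A-Za-z]+)"
)


def interpret(stream, char_width=0.5):
    """Turn a tiny subset of PDF text operators into (x, y, text) marks."""
    marks, operands = [], []
    line_x = line_y = x = y = 0.0
    font_size = 0.0
    for m in TOKEN.finditer(stream):
        if m.group("str") is not None:
            operands.append(m.group("str"))
        elif m.group("name"):
            operands.append(m.group("name"))
        elif m.group("num"):
            operands.append(float(m.group("num")))
        else:
            op = m.group("op")
            if op == "BT":    # begin text object: reset the text position
                line_x = line_y = x = y = 0.0
            elif op == "Tf":  # set font and size, e.g. "/F1 12 Tf"
                font_size = operands[-1]
            elif op == "Td":  # move relative to the start of the current line
                line_x += operands[-2]
                line_y += operands[-1]
                x, y = line_x, line_y
            elif op == "Tj":  # paint a string at the current point
                text = operands[-1]
                marks.append((x, y, text))
                # Assumption: fixed glyph width instead of real font metrics.
                x += len(text) * char_width * font_size
            # "ET" and anything else: nothing to track in this toy.
            operands = []
    return marks


stream = """
BT
/F1 12 Tf
72 712 Td (world) Tj
-40 0 Td (Hello ) Tj
ET
"""
print(interpret(stream))  # [(72.0, 712.0, 'world'), (32.0, 712.0, 'Hello ')]
```

Sorting those marks by position recovers "Hello world", even though the stream paints "world" first.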

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps