"ouch: the beginning of the end"

Mark Waddingham mark at livecode.com
Wed Mar 8 06:59:53 EST 2017


Hi Dr Hawkins,

I've been away on holiday for just over a week, and this thread has got
quite long, so I thought it easier to answer the original post rather
than some offshoot of it.

On 2017-03-03 00:13, Dr. Hawkins via use-livecode wrote:
> I just got off the phone with the court clerk in Reno, and received the
> beginning of the end . . .I figured it would come from some state or 
> anther
> in a year or two, but they are requiring me to use the *exact* pdf as
> propagated by the court.

Having read the entire thread, my understanding of your problem is as 
follows
(please correct if I am wrong):

----

You have PDF forms which are downloadable from a government department.
They are intended for printing and then filling in - i.e. they do not use
editable PDF forms (FPDF?).

The government department for whatever reason requires that the forms 
are used
exactly as is with the user filling in the relevant spaces within them 
and then
submitting.

There is some claim by said department that 'at some point' they will 
get
scanners which will be able to tell whether the original forms were used 
or not
thus you are not allowed to recreate the non-user parts of the form.

----

Reading between the lines the latter requirements of the department are 
not
unreasonable - I suspect they would like to automate their processes as 
much
as possible and as such would like to be able to have a computer via OCR 
or
whatever suck out the appropriate parts of forms at some point to remove 
a
human from the equation.

Given that there is an obvious 'printing' element involved in this at
present, pixel-perfection is not exactly what they are looking for (unless
they are imagining they live in a world where all printers are capable of
absolutely perfect registration - some skew / offset is always going to be
present). What they need is for whatever software they might use in the
future to automate to be able to locate the user-written parts to suck out.
It is therefore reasonable for them to require that the non-user sections
keep the same relative layout and look precisely the same as if you had
printed the original PDF.

I'll run on these above assumptions for now.

----

First of all let me just point out that EPS is definitely *not* what you 
want.

EPS is just a PostScript program with appropriate comments describing an
(optional) pre-rendered thumbnail, and other print related metadata so 
it
can be embedded in another document. Rendering EPS properly requires a 
full
PostScript interpreter - many programs which 'support EPS' actually only 
support
rendering the thumbnail and then only printing on a PostScript printer.

Indeed, there is a good reason why no non-GPL full open-source 
PostScript
interpreter exists (as far as I'm aware at least) - they are complex 
pieces
of software which have a high degree of commercial value.

Whilst Linux and Mac users might be used to transparent PostScript 
support this
is only because GhostScript is installed as an innate part of the 
printing tool
chain on those platforms - thus this is an innate part of the 'system' 
and as
such you can write non-GPL applications which use it as you don't need 
to distribute
it with your app. On all other platforms, however, you are looking at 
having to
distribute a PS interpreter with your app - and at that point you are 
hit by the
GPL (in particular, in your case, it would classify as an 'innate' 
requirement
of your application and non-optional and thus virality would kick in).

So, if you want a PostScript interpreter in your app you are going to 
have to
pay $$$$$ to license such a thing. (Including such a thing in LiveCode 
would
require license fees or development costs way above what most people 
would want
to pay for a feature they would probably rarely if ever use, and as such
it is unreasonable to expect LiveCode to support such things
cross-platform as part of the standard license fee - even at the Business
license level).

One of the main reasons that Adobe created PDF was to avoid needing a 
PostScript
interpreter to accurately create 'archival' type quality representations 
of printable
documents and to provide a much easier way to edit / amend and modify 
such documents.
As PDF is just a data structure, the latter can be done by processing a
generated PDF. As EPS/PS files are actually programs, all bets are off for
editing - the program does what it is written to do, and you can write it
in any way you want. If you want to 'edit' it, you need to edit the program.

However....

PDF is also a large complicated format whose reading, writing and 
rasterisation
has huge commercial value.

Up until Google bought and open-sourced *part* of FoxIT so they could 
include a
full and complete cross-platform PDF renderer in Chrome (in the form of 
PDFium)
there was no non-GPL open-source full and complete PDF renderer 
available in
the open-source world that I know of.

As far as I'm aware, all such open-source libraries for PDF rasterisation
and manipulation which existed up until that point were GPL, and all of
them offer commercial licensing terms, the costs of which are substantial -
again, well outside the cost of what you could reasonably expect to get
'built in' to the LiveCode license at any level.

Of course, when you look into what Google did you find out that whilst 
PDFium
is FoxIT - it is only a *subset* of FoxIT. Google only licensed the 
rasterisation
part - PDFium does not contain any of the public APIs which allow 
editing, merging,
modification and re-export of PDFs.

Again, you can understand why - the latter part of PDF manipulation has 
perhaps
the greatest part of the commercial value and since Google only wanted 
rasterisation
that was all they were going to pay for.

----

So, just to reiterate, the expectation that LiveCode should contain a 
full PS/EPS/PDF
rendering, manipulation and 'do whatever I want' type thing in it on all 
platforms is
somewhat beyond the current price of the license fee. Or should I say, far
beyond what any one person/organisation who does not need such
functionality (which is most people) would be willing to pay.

(I should point out here that I know what is involved in writing both a 
PostScript
interpreter, and PDF renderer as I have written a partial implementation 
of both in the
dim and distant past - for RiscOS in the early 1990's... Back when PS 
was still mostly
Level 2, and the PDF spec weighed in at around 150 pages... PostScript 
is now universally
at Level 3, and the PDF spec weighs in at 700+ pages - thus I do not 
begrudge
the commercialization of such libraries at all as they are large hefty 
pieces of work which
have to deal with inputs which may or may not completely conform to 
specification).

Anyway, bemoaning the costs of developing and supporting such things
aside, back to your actual problem...

First of all on some platforms what you want to do is actually not all 
that hard at all.

Mac and iOS both include full built-in PDF rendering and emission 
support. CoreGraphics
can both load and render PDF directly *and* also render and save PDF 
directly which means
that it is relatively straightforward (with a bit of LiveCode Builder or 
C++) to do what
you want - i.e. render an original page of a PDF then render some text 
on top. However,
it is important to point out that this approach will not necessarily result
in the PDF being original PDF + extra bits, since you are re-rendering the
PDF (although I don't think this is a problem in your case, as it sounds
like the forms will implicitly go through an actual scanner in the
government department's process).

Similarly, Linux always includes a PostScript interpreter in its default
install if you install printing support. PDF can be rendered in PostScript
by using an appropriate
header PostScript program (which converts the PDF data structure into a 
PostScript
program - in fact the main rendering bits in PDF are actually PostScript 
programs
just with a very fixed set of well defined operators which you can 
define in a PS
environment). Thus on this platform you could emit the necessary header, 
the PDF
and then the additions you require as PostScript programs.
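
To make the "fixed set of well defined operators" point concrete, here is a
rough Python sketch (not anything LiveCode ships) that groups a very simple
page content stream into operator / operand pairs. The name parse_ops, the
sample stream and the small operator set are all made up for illustration,
and splitting on whitespace is an assumption that breaks as soon as a string
operand contains a space:

```python
# Hypothetical helper: groups a *simple* PDF content stream into
# (operator, operands) pairs. PDF content streams are postfix, like
# PostScript: operands come first, then the operator that consumes them.

# A small subset of the real operator set, enough for rules and text.
OPERATORS = {"q", "Q", "cm", "w", "m", "l", "re", "S", "f",
             "BT", "ET", "Tf", "Td", "Tj"}

def parse_ops(stream):
    """Return a list of (operator, [operands]) for a simple stream."""
    ops, operands = [], []
    # Assumption: tokens are whitespace separated and no string operand
    # contains a space; a real tokenizer has to handle much more.
    for token in stream.split():
        if token in OPERATORS:
            ops.append((token, operands))
            operands = []
        else:
            operands.append(token)
    return ops

# Draw a 1pt horizontal rule, then some text in font resource /F1.
sample = "q 1 w 72 720 m 540 720 l S Q BT /F1 10 Tf 72 700 Td (Name) Tj ET"

for operator, operands in parse_ops(sample):
    print(operator, operands)
```

Each of those operators could in principle be defined as a procedure in a
PostScript prolog, which is the trick the header program relies on.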

Where you run into difficulty is on Windows and Android. Neither of these
platforms includes publicly accessible PDF or PS support (although it
appears Windows 10 might have a built in PDF Printer at least...).

----

So what options are there?

- Option 1 - bi-level background images

Here I'm assuming that your original PDFs do not change that often and 
(given the
requirements you have found out from the government department involved) 
the forms
must be used as is. Thus, I presume any 'recurring sections' would need 
to be
rendered on repeated images of the appropriate page rather than cutting 
up the
original forms into pieces and just replicating those parts.

In this case, then pre-rendering all the pages as high-resolution 
black-and-white
1bpp bitmaps and then rendering those underneath the LiveCode fields is 
probably not
that bad an option. Given that the average printer people will be using 
will probably
only have a true black-and-white resolution of 300-600dpi and most 
printed forms are
only about 5% black pixels you will get immensely high compression 
ratios. The only
slight snafu here right now is that PDF printing support in LC does not 
yet exist
for Android, and would need a small patch to pass PNG data straight 
through to the
PDF (at present it only does this for JPEG). [ The reason PDF printing 
is not currently
supported on Android is due to text rendering which is not a 
straightforward thing in
PDF nor PostScript; the reason only JPEG image data is currently 
supported is that
when the pass-through was implemented the library we use to do PDF 
printing - cairo -
only supported it for JPEG, I *think* it does support certain PNG 
formats now though
since we updated the library for other reasons a while back ].
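
As a rough sanity check on the compression claim, the Python sketch below
builds a synthetic 1bpp US Letter page at 300dpi containing nothing but
horizontal rules (about 3% black) and deflates it with zlib, which is the
same compression family PNG uses. The page contents, the function name
blank_form_page and its parameters are invented for illustration; a real
form with text on it will compress less well than this idealised page:

```python
import zlib

def blank_form_page(width_px=2550, height_px=3300,
                    rule_every=100, rule_thickness=3):
    """Synthetic 1bpp page: white with a horizontal rule every 100 rows.

    Assumes 8 pixels per byte with a set bit meaning black; the real
    convention depends on the image format used.
    """
    row_bytes = (width_px + 7) // 8          # round up to whole bytes
    white = bytes(row_bytes)                 # all bits clear
    black = b"\xff" * row_bytes              # all bits set
    rows = [black if y % rule_every < rule_thickness else white
            for y in range(height_px)]
    return b"".join(rows)

page = blank_form_page()
packed = zlib.compress(page, 9)
ratio = len(page) / len(packed)
print(f"{len(page)} raw bytes, {len(packed)} compressed, about {ratio:.0f}:1")
```

Swapping in a 1bpp rendering of one of the actual court forms would give a
more honest figure than this synthetic page does.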

- Option 2 - augment the original PDF

PDF documents can be augmented after creation - the data structure is 
designed to
allow revisions which overlay the original document. Thus it should be 
possible to
generate modifications to the original PDF and append them to it.

The difficulty here is that it would require some intimate knowledge of 
the PDF
document structure (although far less than what would be required to 
generate one
from scratch). Basically, you provide modified page objects for each 
page and a
modified 'page tree' which first contains all the original things on the 
page
and then adds text objects (which is not too bad to generate if you just 
want ASCII
characters in one of the built in fonts such as Helvetica) in the places 
you need.

Such a process could be implemented in LiveCode Script and would be 
completely
independent of platform. Also, it would preserve the original PDF 
entirely (no
round-tripping through a PDF rasterizer) as you would only be adding to 
what
was already there.

How much work would be involved in writing said script, however, is 
another matter.
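
To give a feel for the shape of it, here is a minimal Python sketch of an
incremental update (the same logic could be written in LiveCode Script). It
is a sketch under several assumptions, not a tested tool: the original PDF
uses a classic 'xref' table rather than cross-reference streams, is not
encrypted, and its trailer's /Root and /Size can be found with a simple
regex. The names text_stream and append_update are made up, and /Info and
/ID are not carried over into the new trailer:

```python
import re

def text_stream(x, y, text, size=10):
    """Content stream object body drawing ASCII text at (x, y) in /F1."""
    escaped = (text.replace("\\", r"\\")
                   .replace("(", r"\(").replace(")", r"\)"))
    ops = f"BT /F1 {size} Tf {x} {y} Td ({escaped}) Tj ET".encode("ascii")
    return b"<< /Length %d >>\nstream\n" % len(ops) + ops + b"\nendstream"

def append_update(pdf, objects):
    """Append {object_number: body} after the original bytes.

    The original file is left untouched; a new xref section and a trailer
    with /Prev pointing at the old xref are added at the end.
    """
    # Assumption: the last startxref / Root / Size in the file are the
    # live ones and are not hidden inside compressed stream data.
    prev = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    root = re.findall(rb"/Root\s+(\d+\s+\d+\s+R)", pdf)[-1]
    old_size = int(re.findall(rb"/Size\s+(\d+)", pdf)[-1])
    size = max(old_size, max(objects) + 1)

    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)              # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n"
    for num in sorted(objects):
        # One subsection per object; each entry is exactly 20 bytes.
        out += b"%d 1\n%010d 00000 n \n" % (num, offsets[num])
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root, prev)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

# Hypothetical usage: object 3 is the page and 4 its original content
# stream; 5 and 6 are unused numbers for the new stream and the font.
# updated = append_update(original, {
#     3: b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
#        b" /Contents [4 0 R 5 0 R]"
#        b" /Resources << /Font << /F1 6 0 R >> >> >>",
#     5: text_stream(72, 700, "Jane Doe"),
#     6: b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
# })
```

In this sketch the replacement page object reuses the original object
number, so it lists the original content stream first and the new text
stream second. A real script would have to read the existing page
dictionary and merge its /Resources rather than replace them as the
commented usage does.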

- Option 3 - wait until LiveCode can render PDFs directly as an object 
on a card

This is obviously what you had hoped you could do and, whilst that is not
entirely unreasonable, I hope you can appreciate from the above why you
currently cannot - particularly on all platforms.

PDFium does at least give us a starting point - however it isn't the
easiest of libraries to build, or to keep building, and there's still a
fair bit of work we need to do to allow it to function cross-platform (not
least the building of it for all platforms!).

Also, lamentably, that is only one side of the story - you also need to 
generate PDFs,
which means some library to output PDF is needed which is happy to bind 
to PDFium's
rasterisation implementation. This is certainly not something which is 
exposed in the
public APIs of PDFium, and would probably require bespoke customisation 
of PDFium to
achieve.

- Option 4 - focus on Mac/iOS and do other platforms later

As mentioned above, both Mac and iOS include PDF rendering and emission 
as part of
CoreGraphics - they also include relatively straightforward APIs for 
drawing typeset
text. The process here would be:

   1) Create a CG PDF output context
   2) Load your original PDF as a CG PDF object
   3) For each page:
      i) Render the original page into the PDF output context
      ii) Render the text into the appropriate places on the page
   4) Finalize the output context to generate a PDF

I recently did some work for a business services request which needed to 
render
portions of a PDF to a new PDF on Mac - and it turned out to be around 
50 lines of
C to do that. Rendering the text you would need through CoreText would 
be a little
more than that, but nothing too onerous.

----

So anyway, sorry to be the bearer of perhaps not entirely great news, 
however what
you want to do is certainly possible - but like most things will require 
some leg-work
and a little bit of patience and/or some financial investment.

I do strongly suggest you contact business services 
(https://livecode.com/services/)
about what you need here. It is important to understand that whilst we 
would like to
do everything, we do need a way to prioritise what we focus on. Whilst 
PDF rendering
and output features are (obviously) quite widely useful for lots of 
things they are also
substantial and large features to develop and maintain (if they weren't 
we would be
surrounded by lots of open-source non-GPL implementations to choose from 
and base them
on) thus progress on them generally in terms of additions to the core 
product are likely
to be slow. However, you do have a very specific use-case with well-defined
inputs and outputs, so we may be able to help you for far less than it
would cost you to commercially license the relevant cross-platform
libraries you need and/or a platform which provides the functionality out
of the box. (My gut tells me that starting with Mac/iOS, due to their
built-in API support for what you want to do, is probably the best first
step to take - at least then you get a product which works as it needs to -
and like any venture, the sooner you ship, the sooner you can generate
revenue to reinvest and expand!).

Warmest Regards,

Mark.

-- 
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps



