"ouch: the beginning of the end"
Mark Waddingham
mark at livecode.com
Wed Mar 8 06:59:53 EST 2017
Hi Dr Hawkins,
I've been away on holiday for just over a week, and this thread has got quite long, so I thought it easier to answer the original post rather than some offshoot of it.
On 2017-03-03 00:13, Dr. Hawkins via use-livecode wrote:
> I just got off the phone with the court clerk in Reno, and received the beginning of the end . . . I figured it would come from some state or another in a year or two, but they are requiring me to use the *exact* pdf as propagated by the court.
Having read the entire thread, my understanding of your problem is as follows (please correct me if I am wrong):
----
You have PDF forms which are downloadable from a government department. They are intended for printing and then filling in, i.e. they are not editable (fillable AcroForm) PDF forms.
The government department, for whatever reason, requires that the forms are used exactly as-is, with the user filling in the relevant spaces within them and then submitting.
There is some claim by said department that 'at some point' they will get scanners which will be able to tell whether the original forms were used or not; thus you are not allowed to recreate the non-user parts of the form.
----
Reading between the lines, these latter requirements of the department are not unreasonable - I suspect they would like to automate their processes as much as possible, and as such would like a computer (via OCR or whatever) to suck out the appropriate parts of the forms at some point, removing a human from the equation.
Given that there is an obvious 'printing' element involved at present, pixel-perfection is not exactly what they are looking for (unless they imagine they live in a world where all printers are capable of absolutely perfect registration; some skew / offset is always going to be present). What they need is for whatever software they might use in the future to be able to locate the user-written parts to suck out. Therefore it is reasonable for them to require that the non-user sections are laid out in the same relative positions, and look precisely the same, as if you had printed the original PDF.
I'll run with the above assumptions for now.
----
First of all, let me just point out that EPS is definitely *not* what you want.
EPS is just a PostScript program with appropriate comments describing an (optional) pre-rendered thumbnail and other print-related metadata, so that it can be embedded in another document. Rendering EPS properly requires a full PostScript interpreter - many programs which 'support EPS' actually only support displaying the thumbnail, and can only print the real thing on a PostScript printer.
Indeed, there is a good reason why no full, non-GPL, open-source PostScript interpreter exists (as far as I'm aware, at least) - they are complex pieces of software which have a high degree of commercial value.
Whilst Linux and Mac users might be used to transparent PostScript support, this is only because a PostScript interpreter (Ghostscript, in the Linux case) is installed as an innate part of the printing toolchain on those platforms. It is thus part of the 'system', and as such you can write non-GPL applications which use it, as you don't need to distribute it with your app. On all other platforms, however, you are looking at having to distribute a PS interpreter with your app - and at that point you are hit by the GPL (in particular, in your case it would count as an 'innate', non-optional requirement of your application, and thus virality would kick in).
So, if you want a PostScript interpreter in your app you are going to have to pay $$$$$ to license one. (Including such a thing in LiveCode would require license fees or development costs way above what most people would want to pay for a feature they would probably rarely, if ever, use; as such it is unreasonable to expect LiveCode to support such things cross-platform as part of the standard license fee, even at the Business license level.)
One of the main reasons that Adobe created PDF was to avoid needing a PostScript interpreter to accurately create archival-quality representations of printable documents, and to provide a much easier way to edit, amend and modify such documents.
As PDF is just a data structure, the latter can be done by processing a generated PDF. As EPS/PS files are actually programs, all bets are off for editing: the program does what it is written to do, and it can be written in any way you want. If you want to 'edit' it, you need to edit the program.
However...
PDF is also a large, complicated format whose reading, writing and rasterisation have huge commercial value.
Until Google licensed and open-sourced *part* of Foxit so they could include a full and complete cross-platform PDF renderer in Chrome (in the form of PDFium), there was no full and complete, non-GPL, open-source PDF renderer that I know of.
As far as I'm aware, all the open-source libraries for PDF rasterisation and manipulation which existed up until that point were GPL, and all of them offer commercial licensing terms, the costs of which are substantial - again, well outside the cost of what you could reasonably expect to get 'built in' to the LiveCode license at any level.
Of course, when you look into what Google did, you find out that whilst PDFium is Foxit, it is only a *subset* of Foxit. Google only licensed the rasterisation part - PDFium does not contain any of the public APIs which allow editing, merging, modification and re-export of PDFs.
Again, you can understand why - the manipulation side of PDF has perhaps the greatest share of the commercial value, and since Google only wanted rasterisation, that was all they were going to pay for.
----
So, just to reiterate, the expectation that LiveCode should contain a full PS/EPS/PDF rendering, manipulation and 'do whatever I want' type engine on all platforms is somewhat beyond what the current license fee covers. Or, should I say, far beyond what any one person/organisation who does not need such functionality (which is most people) would be willing to pay.
(I should point out here that I know what is involved in writing both a PostScript interpreter and a PDF renderer, as I wrote a partial implementation of each in the dim and distant past - for RISC OS in the early 1990s... back when PS was still mostly Level 2, and the PDF spec weighed in at around 150 pages. PostScript is now universally at Level 3, and the PDF spec weighs in at 700+ pages - thus I do not begrudge the commercialisation of such libraries at all, as they are large, hefty pieces of work which have to deal with inputs which may or may not completely conform to the specification.)
Anyway, bemoaning the costs of developing and supporting such things aside, back to your actual problem...
First of all, on some platforms what you want to do is actually not that hard at all.
Mac and iOS both include full built-in PDF rendering and emission support. CoreGraphics can load and render PDF directly *and* render to and save PDF directly, which means that it is relatively straightforward (with a bit of LiveCode Builder or C++) to do what you want - i.e. render an original page of a PDF, then render some text on top. However, it is important to point out that this approach will not necessarily result in a PDF which is the original PDF + extra bits, since you are re-rendering the PDF (although I don't think this is a problem in your case, as it sounds like there is an implicit assumption that the forms may go through an actual scanner in the government department's process).
Similarly, a Linux install with printing support always includes a PostScript interpreter. PDF can be rendered in PostScript by using an appropriate header PostScript program (which converts the PDF data structure into a PostScript program - in fact, the main rendering bits of a PDF, its page content streams, are essentially PostScript programs restricted to a fixed set of well-defined operators, which you can define in a PS environment). Thus on this platform you could emit the necessary header, then the PDF, and then the additions you require as PostScript programs.
Where you run into difficulty is on Windows and Android. Neither of these platforms includes publicly accessible PDF or PS support (although it appears Windows 10 might at least have a built-in PDF printer...).
----
So what options are there?
- Option 1 - bi-level background images
Here I'm assuming that your original PDFs do not change that often and (given the requirements you have found out from the government department involved) that the forms must be used as-is. Thus, I presume any 'recurring sections' would need to be rendered on repeated images of the appropriate page, rather than by cutting the original forms up into pieces and just replicating those parts.
In this case, pre-rendering all the pages as high-resolution black-and-white (1bpp) bitmaps and then rendering those underneath the LiveCode fields is probably not that bad an option. Given that the average printer people will be using probably only has a true black-and-white resolution of 300-600dpi, and most printed forms are only about 5% black pixels, you will get immensely high compression ratios. The only slight snafu right now is that PDF printing support in LC does not yet exist for Android, and it would need a small patch to pass PNG data straight through to the PDF (at present it only does this for JPEG). [ PDF printing is not currently supported on Android because of text rendering, which is not a straightforward thing in either PDF or PostScript. Only JPEG image data is currently passed through because, when the pass-through was implemented, the library we use to do PDF printing (cairo) only supported it for JPEG; I *think* it does support certain PNG formats now, though, since we updated the library for other reasons a while back. ]
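As a rough, hedged illustration of the compression point above (Python is used purely for illustration; nothing here is specific to LiveCode), the sketch below computes the raw size of a 1bpp page and DEFLATE-compresses a synthetic, form-like page. DEFLATE is the compression PNG uses internally, although real PNG encoders also apply row filters first. The 8.5x11 inch page size, the rule spacing and the synthetic_form() helper are all assumptions standing in for a real rendered form, which will compress less well than this idealised page.

```python
import zlib

def raw_1bpp_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size of a 1-bit-per-pixel page, each row padded to whole bytes."""
    width_px, height_px = round(width_in * dpi), round(height_in * dpi)
    return ((width_px + 7) // 8) * height_px

def synthetic_form(width_in: float = 8.5, height_in: float = 11, dpi: int = 300,
                   rule_every_in: float = 0.5) -> bytes:
    """Hypothetical stand-in for a rendered form, packed 8 pixels per byte (1 = black):
    2-pixel-high horizontal rules every `rule_every_in` inches, plus one
    8-pixel-wide vertical margin line roughly an inch from the left edge."""
    width_px, height_px = round(width_in * dpi), round(height_in * dpi)
    row_bytes = (width_px + 7) // 8
    margin = bytearray(row_bytes)
    margin[dpi // 8] = 0xFF                  # the vertical margin line
    margin_row = bytes(margin)
    rule_row = b"\xff" * row_bytes           # a full-width horizontal rule
    step = round(rule_every_in * dpi)
    return b"".join(rule_row if y % step < 2 else margin_row for y in range(height_px))

for dpi in (300, 600):
    page = synthetic_form(dpi=dpi)
    packed = zlib.compress(page, 9)
    # Every byte here is 0x00 or 0xFF, so counting 0xFF bytes gives the black fraction.
    black = page.count(b"\xff") / len(page)
    print(f"{dpi}dpi: raw {len(page):,} bytes, deflated {len(packed):,} bytes, "
          f"ratio {len(page) / len(packed):.0f}:1, black {black:.1%}")
```

For scale, an uncompressed 600dpi Letter page is 638 bytes per row times 6,600 rows, roughly 4.2 MB, so a lossless bi-level encoding is what keeps embedding one full page image per sheet practical.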
- Option 2 - augment the original PDF
PDF documents can be augmented after creation - the data structure is designed to allow revisions (incremental updates) which overlay the original document. Thus it should be possible to generate modifications to the original PDF and append them to it.
The difficulty here is that it would require some intimate knowledge of the PDF document structure (although far less than would be required to generate one from scratch). Basically, you provide modified page objects for each page (and a modified 'page tree' if pages are added) whose contents first draw all the original things on the page and then add text objects in the places you need (which are not too bad to generate if you just want ASCII characters in one of the built-in fonts such as Helvetica).
Such a process could be implemented in LiveCode Script and would be completely independent of platform. Also, it would preserve the original PDF entirely (no round-tripping through a PDF rasterizer), as you would only be adding to what was already there.
How much work would be involved in writing said script, however, is another matter.
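To give some feel for the size of that script, here is a minimal sketch of the mechanics in Python rather than LiveCode Script (the structure should translate fairly directly). It builds a Helvetica text content stream and appends an incremental update: the new and replacement objects, a small cross-reference section, and a trailer whose /Prev points at the previous cross-reference section. It rests on several assumptions: the original file uses a classic cross-reference table (not a compressed cross-reference stream), the text is ASCII, and the caller already knows the original page's dictionary entries and /Contents reference. minimal_form_pdf() is a hypothetical stand-in for the real court form, and the overlay text is invented for illustration.

```python
import re

def pdf_string(text: str) -> bytes:
    """Encode text as a PDF literal string (ASCII only, as assumed above)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("ascii") + b")"

def stream_obj(data: bytes) -> bytes:
    """Body of an uncompressed stream object holding `data`."""
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"

def text_ops(items, font: bytes = b"F1", size: int = 10) -> bytes:
    """Content-stream operators drawing each (x, y, text) item.
    x and y are integer points, origin at the bottom-left of the page."""
    return b"\n".join(b"BT /%s %d Tf %d %d Td %s Tj ET" % (font, size, x, y, pdf_string(t))
                      for x, y, t in items)

def write_objects(out: bytearray, objects: dict) -> dict:
    """Append 'N 0 obj ... endobj' for each object; return {number: byte offset}."""
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)
        out += b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n"
    return offsets

def last_startxref(pdf: bytes) -> int:
    """Offset of the newest xref section, taken from the last 'startxref'."""
    return int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])

def append_incremental_update(pdf: bytes, objects: dict, size: int, root: bytes) -> bytes:
    """Return `pdf` followed by an incremental update containing `objects`.

    Reusing an existing object number supersedes that object; new numbers add
    objects. `size` is the new trailer /Size (highest object number + 1) and
    `root` the catalog reference, e.g. b"1 0 R". The original bytes are untouched.
    """
    out = bytearray(pdf if pdf.endswith(b"\n") else pdf + b"\n")
    offsets = write_objects(out, objects)
    xref_pos = len(out)
    out += b"xref\n"
    for num in sorted(offsets):
        # One single-entry subsection per object; each entry is exactly 20 bytes.
        out += b"%d 1\n%010d 00000 n \n" % (num, offsets[num])
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root, last_startxref(pdf))
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

def minimal_form_pdf() -> bytes:
    """Hypothetical stand-in for a court form: one Letter page with one ruled line."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = write_objects(out, {
        1: b"<< /Type /Catalog /Pages 2 0 R >>",
        2: b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        3: b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
           b"/Resources << >> /Contents 4 0 R >>",
        4: stream_obj(b"72 700 m 540 700 l S"),
    })
    xref_pos = len(out)
    out += b"xref\n0 5\n0000000000 65535 f \n"
    for num in sorted(offsets):
        out += b"%010d 00000 n \n" % offsets[num]
    out += b"trailer\n<< /Size 5 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

original = minimal_form_pdf()
overlay = {
    5: b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    6: stream_obj(text_ops([(72, 704, "Jane Doe (Debtor)")])),
    # Replacement for page object 3: same entries, but /Contents now draws the
    # original stream first and the new text second, and /F1 is in /Resources.
    3: b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
       b"/Resources << /Font << /F1 5 0 R >> >> /Contents [4 0 R 6 0 R] >>",
}
filled = append_incremental_update(original, overlay, size=7, root=b"1 0 R")
assert filled.startswith(original)  # the original form bytes survive unchanged
```

What is deliberately left out, reading the real page dictionary, /Contents and /Resources out of the original file (which may use object streams) and merging the new font into its existing resources, is where most of the 'intimate knowledge of the PDF document structure' comes in. If the original content leaves the graphics state changed, it would also need bracketing with q ... Q operators before the text is drawn.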
- Option 3 - wait until LiveCode can render PDFs directly as an object on a card
This is obviously what you had hoped you could do, and whilst that is not entirely unreasonable, I hope you can appreciate from the above why you currently cannot - particularly on all platforms.
PDFium does at least give us a starting point; however, it isn't the easiest of libraries to build (or to keep building), and there's still a fair bit of work we need to do to allow it to function cross-platform (not least building it for all platforms!).
Also, lamentably, that is only one side of the story - you also need to generate PDFs, which means you need a PDF output library which is happy to bind to PDFium's rasterisation implementation. This is certainly not something which is exposed in PDFium's public APIs, and it would probably require bespoke customisation of PDFium to achieve.
- Option 4 - focus on Mac/iOS and do other platforms later
As mentioned above, both Mac and iOS include PDF rendering and emission as part of CoreGraphics - they also include relatively straightforward APIs (CoreText) for drawing typeset text. The process here would be:
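1) Create a CG PDF output context (CGPDFContextCreateWithURL)
2) Load your original PDF as a CG PDF document (CGPDFDocumentCreateWithURL)
3) For each page:
   i) Begin an output page and render the original page into it (CGPDFContextBeginPage, CGContextDrawPDFPage)
   ii) Render the text into the appropriate places on the page, then end the page (CGPDFContextEndPage)
4) Finalize the output context to generate a PDF (CGPDFContextClose)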
I recently did some work for a business services request which needed to render portions of a PDF to a new PDF on Mac - and it turned out to be around 50 lines of C. Rendering the text you would need through CoreText would be a little more than that, but nothing too onerous.
----
So anyway, sorry to be the bearer of perhaps not entirely great news; however, what you want to do is certainly possible - but, like most things, it will require some leg-work and a little bit of patience and/or some financial investment.
I do strongly suggest you contact business services (https://livecode.com/services/) about what you need here. It is important to understand that whilst we would like to do everything, we do need a way to prioritise what we focus on. Whilst PDF rendering and output features are (obviously) widely useful for lots of things, they are also large, substantial features to develop and maintain (if they weren't, we would be surrounded by lots of open-source non-GPL implementations to choose from and base them on), so progress on them in terms of additions to the core product is likely to be slow. However, you do have a very specific use-case with well-defined inputs and outputs, so we may be able to help you for far less than it would cost you to commercially license the relevant cross-platform libraries you need and/or a platform which provides the functionality out of the box.
(My gut tells me that starting with Mac/iOS, due to their built-in API support for what you want to do, is probably the best first step to take - at least then you get a product which works as it needs to - and, like any venture, the sooner you ship, the sooner you can generate revenue to reinvest and expand!)
Warmest Regards,
Mark.
--
Mark Waddingham ~ mark at livecode.com ~ http://www.livecode.com/
LiveCode: Everyone can create apps