Reading PDF - a cry for help

Graham Samuel livfoss at mac.com
Fri Sep 30 11:02:53 EDT 2011


Thanks to all who replied. As the one who started this thread, I'd like to say that I pretty much despair of finding a solution. The current position seems to me admirably summarised by Paul Dupuis (see below). Suggestions that I use a command-line utility seem to me to come down to using ImageMagick, since I have not found any other solution with the right functionality and licensing terms. But when I looked into it (and I freely admit I know almost nothing about the internal workings of Windows), it seemed to me that IM is a resource hog that would not lend itself to a simple installation process hidden from the user. Driving it to fill a LiveCode window with the relevant image would not be straightforward either, and would certainly mean clunky intermediate files. So it would be very, very different from an 'import paint' situation. Bear in mind that I am not interested in the text in a PDF, just the image content (just a bitmap, really), so things like IM are overkill for me anyway. But this 'modest' requirement hasn't got me any nearer a solution.

What really annoys me is that if I were writing my app in Visual Basic, I suspect there would be library components available with the right licensing terms, but the promise of a simple 'glue' or 'wrapper' capability to tie LiveCode to third-party externals whose APIs were not written with LiveCode in mind has not been fulfilled, even though it has been proposed in some versions of the LC documentation.

Before I completely give up, I will go down the ImageMagick route once more, since I suppose I may have misunderstood its resource requirements, and it does have the advantage of being able to read TIFFs, which is another problem I have (also not likely to be on LiveCode's radar despite QA requests).

As a last remark, I'd be interested to know the details of ANY implementation that adds functionality of any kind to LiveCode via a third-party application and 'shell'. I have never seen this in action and I can't remember anyone on this list demonstrating it, but maybe I just wasn't paying attention.

Graham


On 9/29/2011 10:01 AM, Graham Samuel wrote:
Short of RunRev itself extending input formats to include PDF (not impossible, but not likely in the short term), the solution would seem to be to license a third-party library component and integrate it into my app by the use of bridging ('glue') code. I got pretty near with this one, having identified a component with suitable licensing terms and functionality (Sorax DLL). RunRev suggested that I could do the gluing with the aid of a 'C' programmer. It turns out, after a lot of research by Thierry Douez, who has been helping me, that what I need is a person familiar with Visual Studio to accomplish this, but I despair of finding such a person who would also be familiar with the externals interface of the LiveCode engine. Maybe I will find such a person, but the trail does seem to have gone cold.

Has anyone any suggestion as to how I might proceed? My app works so nicely with JPG and PNG files, and I have (a little) belief that I could make it work with TIFF files, but without PDF input I am dead in the water.

As some folks may remember, I have posted to this list a number of times
on the need for being able to open and read PDF content (text and 
images) in LiveCode. We at Researchware have, I think, thoroughly 
explored this topic. It all boils down to the fact you need 3rd party 
technology that can read the PDF format and render it and/or extract the 
text from it.

For pages as images or unstylized text, the cheap and dirty way is to 
use a 3rd party command-line utility to make your conversions. From a 
script perspective, you perform an answer file command, get the PDF 
file, and then use shell to batch convert it and then read the resulting 
text file or image file(s) back in. There is NO other free way to do 
this. Yes, this is ugly and probably not for the novice scripter, and your
code pretty much has to be platform specific, but again, it is the ONLY
free way to do this. You are also never displaying a real PDF: you
are displaying either images of pages OR the unformatted, unstyled text.
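
A minimal sketch of that shell round trip for page images follows. It is written in Python only so that it is self-contained; in LiveCode the same steps would be answer file, shell() and then loading the resulting files. It assumes Poppler's `pdftoppm` utility is installed and on the PATH. That tool is chosen here purely for illustration and is not one named in this thread; ImageMagick could be substituted. The helper names `build_pdftoppm_command`, `page_index`, `sort_page_files` and `pdf_to_pngs` are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: Poppler's pdftoppm is installed and on PATH. It writes one
# image per page named <prefix>-N.png, zero-padded when the page count
# needs more digits (e.g. page-01.png ... page-12.png).


def build_pdftoppm_command(pdf_path, out_prefix, dpi=150, first_page=None, last_page=None):
    """Build the argument list for rendering PDF pages to PNG files."""
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]  # -r sets the resolution in DPI
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to render
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to render
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def page_index(png_path):
    """Page number from an output name such as 'page-7.png' or 'page-07.png'."""
    return int(Path(png_path).stem.rsplit("-", 1)[1])


def sort_page_files(paths):
    """Order files numerically by page, so page-10 sorts after page-9."""
    return sorted(paths, key=page_index)


def pdf_to_pngs(pdf_path, out_dir, dpi=150):
    """Render every page of pdf_path into out_dir and return the PNGs in page order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / "page"
    # check=True raises CalledProcessError if pdftoppm exits with an error.
    subprocess.run(build_pdftoppm_command(pdf_path, prefix, dpi), check=True)
    # Keep only files whose suffix after the last '-' is a page number.
    pages = [p for p in out_dir.glob("page-*.png") if p.stem.rsplit("-", 1)[1].isdigit()]
    return sort_page_files(pages)
```

For example, `pdf_to_pngs("scan.pdf", "pages")` would return a list of paths such as `pages/page-1.png`, which a LiveCode stack could then display by setting the filename of an image. Converting into a fresh temporary directory each time avoids picking up pages left over from an earlier run. Sorting numerically rather than alphabetically matters because the padding width depends on the page count.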

You can also display a PDF in a window in a limited form through
InterApplication Communication (IAC), although you can't get or copy any
selections/content in it and can only navigate page by page under
script control.

To open a PDF in LiveCode where you can actually control navigation
through script and get or set the user selections requires two
things: (a) a PDF library with APIs supporting these actions and (b)
a set of LiveCode externals that in turn use the PDF APIs to
provide these functions. The main problem with this approach is that all
(or all we could find) of the open source or free PDF libraries are
woefully immature and lack major functionality. Only commercial PDF
technology has the supported APIs for this, and whether it is Adobe, Foxit
or another commercial PDF technology provider, they all typically charge
on a per-unit-shipped royalty model. And some, like Adobe, are really
expensive.

I used the revPartner program to explore this with RunRev quite some 
time ago and asked whether they would consider support in the engine. At 
the time, they only said it was not practical due to licensing issues 
(or close to that). What I understand now is that this is because the open
source PDF libraries are crap and the commercial ones would have
imposed an entirely different licensing model for LiveCode, one with
runtime royalties, which I think none of us want (RunRev or Developers).

I am afraid, as of September 2011, that is the state of LiveCode and
PDFs. There is a promising free open source GNU effort out there 
(http://gnupdf.org/Library), but most of the libraries are only 30 or 
40% complete. When this is complete, we can all benefit from free PDF 
support in LiveCode by wrapping the external API around the GNU effort. 
Until then, you have to choose between cheap, dirty, and limited OR 
costly and commercial.

-- 
Paul Dupuis
Cofounder
Researchware, Inc.
http://www.researchware.com/
