PDF text extraction?

RM richmondmathewson at gmail.com
Sat Apr 2 09:30:52 EDT 2016


Well: ever a sucker for repeating mistakes as many times as possible, I
opened my sample PDF in Inkscape, saved it as an EPS file, and then
tried to import it into Metacard 2.4, which promptly crashed.

Of course this, even were it to work, would be useless in terms of batch 
processing tons of PDF files.

As previously observed . . . I wonder if the OpenOffice.org Open Source
routine for converting PDF files into text could not be co-opted by
LiveCode?

Richmond.

On 1.04.2016 03:47, Richard Gaskin wrote:
> I may need to extract text from a fair number of PDFs (hundreds).  I 
> can find all sorts of third-party tools to do that, many of them free 
> and easy to use, but I'd prefer to integrate this step into some other 
> things I need to do with the files.
>
> The format isn't as simple as Word or docx, though.  I'm not even sure 
> if we have support in LC for the compression used in the text streams. 
> Lots of parts there.
>
> Anyone here have a library or external for extracting text from PDFs? 
> Ideally a good solution would be available for Win, Mac, and Linux.
>
> TIA -
>
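
On the compression question in the quoted message: the filter most often
used on PDF content streams is FlateDecode, which is ordinary zlib/deflate.
What follows is only a minimal sketch, in Python rather than LiveCode, of
inflating such a stream and pulling out the literal strings shown with the
Tj operator. The sample object is hand-built and hypothetical, the helper
names are made up for illustration, and the sketch assumes a single
FlateDecode filter with no encryption, no custom font encodings, no escaped
parentheses and no TJ arrays, so it is nowhere near a general extractor.

```python
import re
import zlib


def inflate_streams(pdf_bytes):
    """Yield the inflated bytes of every stream that zlib can decompress."""
    for match in re.finditer(rb"stream\r?\n(.*?)\nendstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj tolerates a stray end-of-line byte after the data
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (an image, another filter, ...)


def literal_strings(content):
    """Return the (...) operands of Tj operators; escapes and TJ arrays are ignored."""
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", content)]


# Hypothetical, hand-built stream object standing in for a real PDF file.
raw = b"BT /F1 12 Tf 72 720 Td (Hello from a PDF stream) Tj ET"
packed = zlib.compress(raw)
sample = (
    b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
    + packed
    + b"\nendstream\nendobj\n"
)

texts = [s for c in inflate_streams(sample) for s in literal_strings(c)]
print(texts)
```

Real files add plenty on top of this (object streams, fonts with their own
character maps, text split across many operators), which is presumably the
"lots of parts" mentioned above.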



