PDF text extraction?
RM
richmondmathewson at gmail.com
Sat Apr 2 09:30:52 EDT 2016
Well: ever a sucker for repeating mistakes as many times as possible, I
opened my sample PDF in Inkscape, saved it as an EPS file, and then tried
to import it into Metacard 2.4, which promptly crashed.
Of course this, even were it to work, would be useless in terms of batch
processing tons of PDF files.
As previously observed... I wonder if the OpenOffice.org *Open Source*
routine for converting PDF files into text could not be co-opted by
LiveCode?
Richmond.
On 1.04.2016 03:47, Richard Gaskin wrote:
> I may need to extract text from a fair number of PDFs (hundreds). I
> can find all sorts of third-party tools to do that, many of them free
> and easy to use, but I'd prefer to integrate this step into some other
> things I need to do with the files.
>
> The format isn't as simple as Word or docx, though. I'm not even sure
> if we have support in LC for the compression used in the text streams.
> Lots of parts there.
>
> Anyone here have a library or external for extracting text from PDFs?
> Ideally a good solution would be available for Win, Mac, and Linux.
>
> TIA -
>
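On the compression question in the quoted message: in most PDFs the page
content streams are stored with the /FlateDecode filter, which is ordinary
zlib/deflate data. Below is a rough Python sketch (not LiveCode, and not a
real PDF parser) of what inflating those streams and pulling out the
literal strings shown with the Tj operator might look like. The function
names and the regular expressions are my own assumptions for illustration;
the sketch ignores font encodings, TJ arrays, escaped parentheses, object
streams, encryption and every filter other than Flate, so treat it as a
demonstration of the idea rather than a usable extractor.

```python
import re
import zlib

# Assumption: stream bodies sit between "stream" and "endstream" keywords.
# A real parser would use the /Length entry of the stream dictionary instead.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# Assumption: text is shown with simple "(literal) Tj" operators only.
TEXT_RE = re.compile(rb"\((.*?)\)\s*Tj", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the bodies of all streams that zlib is able to inflate."""
    bodies = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the end-of-line bytes left after the data
            body = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (image, other filter, uncompressed)
        bodies.append(body)
    return bodies


def literal_strings(pdf_bytes):
    """Collect the literal strings passed to Tj in every inflated stream."""
    found = []
    for body in inflate_streams(pdf_bytes):
        for match in TEXT_RE.finditer(body):
            found.append(match.group(1).decode("latin-1"))
    return found


def make_sample():
    """Build a tiny hypothetical PDF fragment holding one Flate stream."""
    content = b"BT /F1 12 Tf (Hello) Tj ET"
    return (
        b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + zlib.compress(content)
        + b"\nendstream\nendobj\n"
    )


if __name__ == "__main__":
    print(literal_strings(make_sample()))
```

For a batch job one would loop over a folder, read each file as bytes and
pass it to literal_strings(). On real-world PDFs the results will often be
incomplete or garbled for the reasons listed above.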