PDF text extraction?
Richard Gaskin
ambassador at fourthworld.com
Fri Apr 1 13:49:23 EDT 2016
Alejandro Tejada wrote:
> Hi Richard,
>
> Could you use command line tools like pdftk or qpdf?
>
> Check this:
>
> http://stackoverflow.com/questions/15058207/pdftk-will-not-decompress-data-streams
>
> and this:
>
> https://books.google.com.do/books?id=ozWeSBkPQW4C&pg=PA205&lpg=PA205&dq=uncompress+and+save+pdf+streams&source=bl&ots=9LyTX9eHMe&sig=nmvt8iXLCF5NTNpBEQQJadGbR34&hl=en&sa=X&ved=0ahUKEwjQpPuFpezLAhUHlh4KHffsBSUQ6AEITzAI#v=onepage&q=uncompress%20and%20save%20pdf%20streams&f=false
Very helpful, Alejandro. Thanks.
I may have a lead on a long-term solution, and for the short term I was
delighted to discover that the command-line tool pdftotext is included
in Ubuntu, with super-simple syntax:
pdftotext <sourcepdffile> <outputtextfile>
So for now my mining operation is underway....
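For anyone mining a whole folder this way, here is a minimal sketch that runs that same two-argument pdftotext form over every .pdf in a directory. It assumes a POSIX shell with pdftotext on the PATH (on Ubuntu it comes from the poppler-utils package). The function names txt_name and convert_dir are hypothetical helpers, not part of pdftotext itself.

```shell
#!/bin/sh
# Hypothetical batch helper (not part of pdftotext): convert every *.pdf
# in a directory to a .txt file alongside it, using the basic
# "pdftotext <sourcepdffile> <outputtextfile>" form.

# Hypothetical helper: strip a trailing ".pdf" and append ".txt".
txt_name() {
  printf '%s.txt\n' "${1%.pdf}"
}

# Hypothetical helper: convert all PDFs in the given directory (default: .).
convert_dir() {
  dir=${1:-.}
  if ! command -v pdftotext >/dev/null 2>&1; then
    echo "pdftotext not found (on Ubuntu it is in poppler-utils)" >&2
    return 1
  fi
  for pdf in "$dir"/*.pdf; do
    # If nothing matches, the glob stays literal, so skip it.
    [ -e "$pdf" ] || continue
    pdftotext "$pdf" "$(txt_name "$pdf")" || echo "failed: $pdf" >&2
  done
}
```

Usage would be something like `convert_dir ~/pdfs`, which writes report.txt next to report.pdf and so on. Filenames are quoted throughout so paths with spaces survive.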
--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web
____________________________________________________________________
Ambassador at FourthWorld.com http://www.FourthWorld.com
More information about the use-livecode mailing list