PDF text extraction?

Richard Gaskin ambassador at fourthworld.com
Fri Apr 1 13:49:23 EDT 2016


Alejandro Tejada wrote:

 > Hi Richard,
 >
 > Could you use command line tools like pdftk or qpdf?
 >
 > Check this:
 > http://stackoverflow.com/questions/15058207/pdftk-will-not-decompress-data-streams
 >
 > and this:
 >
 > https://books.google.com.do/books?id=ozWeSBkPQW4C&pg=PA205&lpg=PA205&dq=uncompress+and+save+pdf+streams&source=bl&ots=9LyTX9eHMe&sig=nmvt8iXLCF5NTNpBEQQJadGbR34&hl=en&sa=X&ved=0ahUKEwjQpPuFpezLAhUHlh4KHffsBSUQ6AEITzAI#v=onepage&q=uncompress%20and%20save%20pdf%20streams&f=false

Very helpful, Alejandro.  Thanks.

I may have a lead on a long-term solution, and for the short term I was 
delighted to discover that the command-line tool pdftotext is included 
in Ubuntu, with super-simple syntax:

   pdftotext <sourcepdffile> <outputtextfile>

So for now my mining operation is underway....
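
For anyone wanting to run the same two-argument form over a whole folder, 
here is a rough sketch of a batch loop. The function names and the 
one-folder layout are my own assumptions for illustration, not part of 
pdftotext itself, and it only handles files ending in lowercase .pdf:

```sh
#!/bin/sh
# Sketch only: batch-convert every PDF in one directory with pdftotext.
# txt_name_for and convert_all are hypothetical helper names.

# Derive the output name by swapping a trailing .pdf for .txt.
txt_name_for() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert each *.pdf in directory $1, writing the .txt file next to it.
convert_all() {
    for pdf in "$1"/*.pdf; do
        # An unmatched glob stays a literal pattern, so skip it.
        [ -e "$pdf" ] || continue
        # Same syntax as above: pdftotext <sourcepdffile> <outputtextfile>
        pdftotext "$pdf" "$(txt_name_for "$pdf")" ||
            printf 'failed: %s\n' "$pdf" >&2
    done
}
```

After sourcing the file, something like `convert_all ./pdfs` (the folder 
name is just an example) should leave a .txt beside each PDF and report 
any file pdftotext could not process on stderr.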

-- 
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  Ambassador at FourthWorld.com                http://www.FourthWorld.com




