PDF text extraction?

Richard Gaskin ambassador at fourthworld.com
Sat Apr 2 11:38:57 EDT 2016


Richmond wrote:

> I see that SuperCard, in its 4.8 beta, has introduced these:
>
> rtfToText - extracts text from an RTF or RTFD file
> pdfToText - extracts text from a PDF file
> docToText - extracts text from a Microsoft Word file
> docxToText - extracts text from a Microsoft Word XML file
> htmlToText - extracts text from an HTML file
> webarchiveToText - extracts text from a WebArchive file
> http://forums.supercard.us/viewtopic.php?f=11&t=2115&p=10705&hilit=pdf#p10705
>
> However, as I have just joined the SuperCard forum [for the sole purpose
> of getting a copy of the beta], this will take a few days to come through.
>
> Richard Gaskin should not have to wait:
> http://solutionsetcetera.com/betarequest.html
>
> Of course, as SuperCard is Mac-only, I suspect
> these functions leverage Mac-only features, although
> their names do look very UNIX/Linux-like.

Thanks. Yes, Mark Lucas has been doing some outstanding work on 
SuperCard 4.8.

But because SuperCard is exclusively for OS X, as much as I've enjoyed
trying out those enhancements on my Mac, I can't use them on the platform
I spend most of my time on (Ubuntu) or the one most of my users spend
time on (Windows). Mr. Lucas is, to put it politely, not fond of the
Windows API and has no interest in Linux, so I don't see that changing
anytime soon.

I may have a lead on a long-term multi-platform solution. For now, I can
get through the first batch of a thousand or so PDFs I need to work with
using the pdftotext command-line tool included in Ubuntu.
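For anyone wanting to run a similar batch, here is a minimal Python sketch that drives pdftotext over a folder of PDFs. It assumes poppler's pdftotext is on the PATH (on Debian/Ubuntu it ships in the poppler-utils package); the helper names build_command and convert_folder are hypothetical and not part of any tool mentioned above. Only the standard `pdftotext [options] input.pdf output.txt` form and the `-enc`/`-layout` flags are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path: Path, txt_path: Path, layout: bool = False) -> list[str]:
    """Hypothetical helper: build a pdftotext command line for one file."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout asks pdftotext to keep the page's physical layout
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def convert_folder(src: Path, dest: Path, layout: bool = False) -> dict[Path, str]:
    """Hypothetical helper: convert every *.pdf in src to a .txt in dest.

    Returns a mapping of PDF path -> pdftotext's stderr for files that
    failed, so one damaged PDF does not stop the whole batch.
    """
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found (poppler-utils on Ubuntu)")
    dest.mkdir(parents=True, exist_ok=True)
    failures: dict[Path, str] = {}
    for pdf in sorted(src.glob("*.pdf")):
        txt = dest / (pdf.stem + ".txt")
        result = subprocess.run(
            build_command(pdf, txt, layout), capture_output=True, text=True
        )
        if result.returncode != 0:
            failures[pdf] = result.stderr.strip()
    return failures
```

Usage would be something like `failures = convert_folder(Path("pdfs"), Path("txt"))`, then reviewing whatever ended up in `failures`. Collecting errors rather than stopping on the first one seems the sensible choice when a batch runs to a thousand files.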

--
  Richard Gaskin
  Fourth World Systems
  Software Design and Development for the Desktop, Mobile, and the Web
  ____________________________________________________________________
  Ambassador at FourthWorld.com                http://www.FourthWorld.com
