PDF text extraction?
Richard Gaskin
ambassador at fourthworld.com
Sat Apr 2 11:38:57 EDT 2016
Richmond wrote:
> I see that SuperCard, in its 4.8 Beta, has introduced these:
>
> rtfToText - extracts text from an RTF or RTFD file
> pdfToText - extracts text from a PDF file
> docToText - extracts text from a Microsoft Word file
> docxToText - extracts text from a Microsoft Word XML file
> htmlToText - extracts text from an HTML file
> webarchiveToText - extracts text from a WebArchive file
> http://forums.supercard.us/viewtopic.php?f=11&t=2115&p=10705&hilit=pdf#p10705
>
> However, as I have just joined the SuperCard Forum [solely to get a
> copy of the Beta], this will take a few days to come through.
>
> Richard Gaskin should not have to wait:
> http://solutionsetcetera.com/betarequest.html
>
> Of course, as Supercard is Macintosh only I suspect
> these features are leveraging Mac-only features; although they
> do look very UNIX/Linux like in their naming method.
Thanks. Yes, Mark Lucas has been doing some outstanding work on
SuperCard 4.8.
But because it's exclusively for OS X, as much as I've enjoyed trying out
those enhancements on my Mac, I can't use them on the platform I spend
most of my time on (Ubuntu), or the one most of my users spend time on
(Windows). Mr. Lucas is, to put it politely, not fond of the Windows
API, and has no interest in Linux, so I don't see that changing anytime
soon.
I may have a lead on a long-term multi-platform solution, and for now I
can get through the first batch of a thousand or so PDFs I need to work
with using the pdftotext command-line tool included in Ubuntu.
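
For anyone wanting to try the same thing, here is a rough sketch of what
such a batch run might look like. It is written in Python rather than
LiveCode, and it is only an illustration under a few assumptions: that
pdftotext (from poppler-utils) is on the PATH, that all the PDFs sit in
one folder, and that one .txt file per PDF is the desired output. The
function names and folder arguments are hypothetical; the only pdftotext
options used are -enc and -layout.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path, keep_layout=False):
    """Return the argv list for one pdftotext invocation."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        cmd.append("-layout")  # keep the physical page layout
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def plan_batch(src_dir, out_dir):
    """Pair every PDF in src_dir with a .txt target in out_dir."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    pdfs = sorted(p for p in src_dir.iterdir() if p.suffix.lower() == ".pdf")
    return [(p, out_dir / (p.stem + ".txt")) for p in pdfs]


def run_batch(src_dir, out_dir):
    """Convert each PDF in turn; return the PDFs that failed."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf, txt in plan_batch(src_dir, out_dir):
        result = subprocess.run(build_command(pdf, txt), capture_output=True)
        if result.returncode != 0:
            failed.append(pdf)  # note the failure and keep going
    return failed
```

Calling run_batch("pdfs", "text") would then work through the folder and
hand back whichever files pdftotext could not convert, so one bad PDF in
a thousand doesn't stop the run.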
--
Richard Gaskin
Fourth World Systems
Software Design and Development for the Desktop, Mobile, and the Web
____________________________________________________________________
Ambassador at FourthWorld.com http://www.FourthWorld.com