PDF text extraction?

RM richmondmathewson at gmail.com
Sat Apr 2 11:55:37 EDT 2016



On 2.04.2016 18:38, Richard Gaskin wrote:
> Richmond wrote:
>
>> I see that Supercard, in their 4.8 Beta, have introduced these:
>>
>> rtfToText - extracts text from an RTF or RTFD file
>> pdfToText - extracts text from a PDF file
>> docToText - extracts text from a Microsoft Word file
>> docxToText - extracts text from a Microsoft Word XML file
>> htmlToText - extracts text from an HTML file
>> webarchiveToText - extracts text from a WebArchive file
>> http://forums.supercard.us/viewtopic.php?f=11&t=2115&p=10705&hilit=pdf#p10705 
>>
>>
>> However, as I have just joined the Supercard Forum [solely
>> to get a copy of the Beta], it will take a few days to come through.
>>
>> Richard Gaskin should not have to wait:
>> http://solutionsetcetera.com/betarequest.html
>>
>> Of course, as Supercard is Macintosh-only, I suspect
>> these commands rely on Mac-only features, although they
>> do look very UNIX/Linux-like in their naming.
>
> Thanks. Yes, Mark Lucas has been doing some outstanding work on 
> SuperCard 4.8.

Well, outstanding is as outstanding does. I really wonder how Supercard
keeps going in the face of competition from Livecode.

I know that Supercard has been around for donkey's ages (I recall playing
with it, and finding it rather awkward compared with Hypercard 2.4.1, about
20 years ago), but as the Macintosh, whichever way one looks at things, is
a niche market, a multiplatform alternative (such as Livecode) would seem
to make it redundant.

>
> But being exclusively for OS X, as much as I've enjoyed trying out 
> those enhancements on my Mac I can't use them on the platform I spend 
> most of my time on (Ubuntu), or the one most of my users spend time on 
> (Windows).  Mr. Lucas is, to put it politely, not fond of the Windows 
> API, and has no interest in Linux, so I don't see that changing 
> anytime soon.
>
> I may have a lead on a long-term multi-plat solution, and for now I 
> can get through the first batch of a thousand or so PDFs I need to 
> work with using the pdftotext command line tool included in Ubuntu.

Yes, batch conversion of PDF to text is easy enough on Linux (although
the quality of the results may vary), but it is not a built-in solution
in Livecode.
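
For illustration, a minimal batch-conversion sketch in Python, assuming the pdftotext tool (shipped on Ubuntu in the poppler-utils package) is installed and on the PATH. The function names pdftotext_command and batch_convert and the source/output directory layout are hypothetical; only the pdftotext options -enc and -layout come from that tool. From LiveCode the same command line could be issued through the shell() function.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf: Path, out_dir: Path, layout: bool = False) -> list[str]:
    """Build the argument list for one pdftotext run (hypothetical helper)."""
    txt = out_dir / (pdf.stem + ".txt")  # report.pdf -> out_dir/report.txt
    args = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # Ask pdftotext to keep the physical page layout (columns etc.)
        args.append("-layout")
    args += [str(pdf), str(txt)]
    return args


def batch_convert(src_dir: Path, out_dir: Path, layout: bool = False) -> dict[str, int]:
    """Run pdftotext on every *.pdf in src_dir; return exit codes keyed by file name."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found (on Ubuntu: poppler-utils)")
    out_dir.mkdir(parents=True, exist_ok=True)
    results: dict[str, int] = {}
    for pdf in sorted(src_dir.glob("*.pdf")):
        proc = subprocess.run(
            pdftotext_command(pdf, out_dir, layout),
            capture_output=True,
            text=True,
        )
        results[pdf.name] = proc.returncode  # nonzero means this file failed
    return results
```

Something like batch_convert(Path("pdfs"), Path("text")) would write one .txt file per PDF and report a nonzero exit code for any file pdftotext could not handle. Without -layout, pdftotext tries to undo the physical layout and emit text in reading order; which mode gives better results depends on the documents.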

In a perfect world (which is a silly turn of phrase), Livecode would be
able to parse just about any file one could chuck at it.

>
> -- 
>  Richard Gaskin
>  Fourth World Systems
>  Software Design and Development for the Desktop, Mobile, and the Web
>  ____________________________________________________________________
>  Ambassador at FourthWorld.com http://www.FourthWorld.com
>

Richmond.




More information about the Use-livecode mailing list