How to extract whole text from a PDF file with the PDF widget?

Paul Dupuis paul at researchware.com
Sat Dec 11 08:27:38 EST 2021


I suspect it is for backward compatibility.

When I turned over the XPDF external to LiveCode, I asked that they 
maintain it for a couple of years. I had expected we'd migrate our apps to 
the PDF widget by then, but business factors mean we're only now 
starting a migration.

That's why I jumped in on this thread: to migrate to the Widget, we HAVE 
to be able to extract text and images from it, as you can with the 
External.

I suspect many other commercial developers who used the External still 
have active code using it that they have not yet migrated; otherwise the 
issue of the undocumented (or, even worse, missing) properties of the 
Widget would most likely have been raised before now.

To migrate, all the commands and functions of the External need to be 
mapped to the properties of the Widget. We probably have a couple of 
hundred calls to the External in our code, all of which need to be 
mapped, updated, and tested, so this is no trivial task.


On 12/11/2021 6:50 AM, matthias rebbe via use-livecode wrote:
> Ah, I thought you were referring only to XPDF.
> By the way, do you have any idea why both the XPDF external and the PDF widget are maintained? Wouldn't it make sense to include only one PDF solution?
> Or am I missing something?
>
> Regards,
> Matthias
>
>
>> On 11.12.2021 at 02:01, Paul Dupuis via use-livecode <use-livecode at lists.runrev.com> wrote:
>>
>> Yes, I am familiar with the XPDF external (based on Google's PDFium library), having designed it and paid Monte to code it and then turned it over to LiveCode.
>>
>> I was referring to the PDF Widget (also based on Google's PDFium), which should have a comparable property for fetching the text of a page. The LC dictionary does not list any property for returning the page text, so I assume that is a Dictionary/Documentation error and that Monte can tell us the correct property of the PDF widget that will return the text of a page.
>>
>>
>> On 12/10/2021 7:05 PM, matthias rebbe via use-livecode wrote:
>>> Paul,
>>>
>>> Here on macOS, the dictionary of LC 10 DP1 definitely lists the function XPDFViewer_Text(viewerName, pageNumber).
>>> By the way, checking this showed me that this function seems to be deprecated and that the command
>>>       XPDFViewer_Unicode viewerName, pageNumber, variableName
>>> should be used instead.
>>>
>>>
>>>> On 10.12.2021 at 23:22, Paul Dupuis via use-livecode <use-livecode at lists.runrev.com> wrote:
>>>>
>>>> There must be an undocumented property for the text of a page. The External (XPDF) had a function to return the full text of a page; to get the full text of the PDF file, you just stepped through the pages (1..N), getting and concatenating the page text.
>>>>
>>>> Monte? LC 10.0.0 Dictionary does not list a property for the page text.
>>>>
>>>>
>>>> On 12/10/2021 4:46 PM, Torsten Holmer via use-livecode wrote:
>>>>> Hi,
>>>>>
>>>>> I have a PDF file with text and pictures, but I just want the text.
>>>>>
>>>>> I can do it manually by selecting all and copying (Cmd-A, Cmd-C) while viewing the file with Preview on macOS.
>>>>>
>>>>> I have a business licence and want to use the PDF widget but I cannot find a way to do it.
>>>>>
>>>>> Can someone help me out?
>>>>>
>>>>> Cheers,
>>>>> Torsten
>>>>> _______________________________________________
>>>>> use-livecode mailing list
>>>>> use-livecode at lists.runrev.com
>>>>> Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
>>>>> http://lists.runrev.com/mailman/listinfo/use-livecode
>>
>



