Parsing a PDF file
Jim Hurley
jhurley0305 at sbcglobal.net
Mon Jul 11 11:28:40 EDT 2016
In this particular case I found it much easier to open the PDF file in Adobe Acrobat and do a “Save As” with the “Text (Accessible)” format.

Jim

Kay Lan wrote:
>
> On Mon, Jul 11, 2016 at 9:36 AM, Roger Eller
> <roger.e.eller at sealedair.com> wrote:
>> Since this seems to be Mac only, why not "do as Applescript" the select
>> all, and Copy?
>>
> Because Preview isn't properly scriptable and you can't "Select All"
> or "Copy". As Richard said, the answer is with Automator.
>
> If you open Automator and select a new 'Application', then in the left
> hand column you'll see "PDFs" as an option. If you click on that and
> browse down the middle column you'll see 'Extract PDF Text', and if
> you click on that, in its description you'll see that it can extract
> Plain or Rich text.
>
> So how can we get this to work with LC?
>
> 1) In Automator, drag the 'Extract PDF Text' action into the right
> hand workspace window.
> a) Choose the output type - most likely Plain Text
> b) Select a folder to save to - for convenience we'll use "Desktop"
> c) For the Output File Name you probably want to use a Custom Name -
> pdf2text or whatever. You do not need to specify the suffix.
> d) tick the Replace Existing files box.
>
> 2) Back in the left hand column where you clicked on the PDFs icon,
> now click on the 'Files & Folders' icon (looks like the Finder icon).
> From the middle column drag 'Ask for Finder Items' into the right hand
> column and place it above 'Extract PDF Text'.
> a) Set 'Start at:' to a logical location, like Downloads, if that
> is where your PDFs are likely to be located.
> b) 'Type:' should be left at Files; do NOT tick the Allow Multiple
> Selection box, as these instructions are for a single file only.
>
>
> 3) From the middle column drag 'Open Finder Items' and place it
> 'between' the last two actions - so the order will be Ask for Finder
> Items, Open Finder Items, Extract PDF Text.
> a) Set Open with: to Preview.
>
> 4) Optionally, if you don't always have Preview open and you don't
> want to be left with the PDF file open, in the left hand column click
> Utility, and from the middle column drag 'Quit Application' to the end
> of your workflow.
> a) set it to "Preview.app"
>
> You can now test this by clicking the Run button in the top right
> corner. You should get a standard Open File dialog box; select a
> file, and shortly thereafter the Automator log window at the bottom
> should show all green ticks.
>
> You should then be able to navigate to the Desktop folder and the file
> 'pdf2text.txt' should be there.
>
> To complete the LC integration, save your Automator workflow and
> call it something like pdf2text. For this example we'll also save it
> to the Desktop.
>
> Then in your LC script:
>
> on mouseUp
>    set the defaultFolder to specialFolderPath("desktop")
>    launch "pdf2text.app"
>    -- if the file is large, consider a wait of 1 second or more here
>    put textDecode(URL "binfile:/Users/yourname/Desktop/pdf2text.txt", "utf8") into tNotPDF
>    -- do what you have to after this
>
>    -- your Automator app will quit automatically once it's done its thing,
>    -- so there is no need to balance the 'launch' command with a 'kill' command
> end mouseUp
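
As a variation on the script above, a fixed wait can be swapped for polling until the output file shows up. This is only a minimal sketch, not something tested against this workflow: waitForFile is a hypothetical helper name, the 60 second timeout is a guess (the user still has to pick a file in the Automator dialog), and it assumes pdf2text.app and pdf2text.txt both live on the Desktop as in the steps above.

```livecode
-- Hypothetical helper: poll until pFilePath exists or pTimeoutSecs elapse.
-- Returns true if the file appeared in time, false otherwise.
function waitForFile pFilePath, pTimeoutSecs
   local tDeadline
   put the milliseconds + (pTimeoutSecs * 1000) into tDeadline
   repeat while the milliseconds < tDeadline
      if there is a file pFilePath then return true
      wait 200 milliseconds with messages
   end repeat
   return (there is a file pFilePath)
end waitForFile

on mouseUp
   local tAppPath, tOutPath, tNotPDF
   put specialFolderPath("desktop") & "/pdf2text.app" into tAppPath
   put specialFolderPath("desktop") & "/pdf2text.txt" into tOutPath
   -- remove stale output so an old file is not mistaken for the new one
   if there is a file tOutPath then delete file tOutPath
   launch tAppPath
   -- assumption: 60 seconds covers choosing the PDF and the extraction
   if waitForFile(tOutPath, 60) then
      put textDecode(URL ("binfile:" & tOutPath), "UTF-8") into tNotPDF
      -- do what you have to after this
   else
      answer "Timed out waiting for pdf2text.txt"
   end if
end mouseUp
```

The file may exist a moment before Automator has finished writing it, so for very large PDFs a short extra wait after waitForFile returns may still be worth adding.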
>
> It should be noted that Automator's Extract PDF Text typically does a
> better job of text extraction than a manual Select All + Copy + Paste.
>
> Unfortunately I consider both these options about 30% less accurate
> than using my old PPC G5 running Leopard and Devon Technologies' old
> PDF2RTFService. I had not previously offered that as a solution to the
> OP because "get a PPC Mac, install Leopard and PDF2TEXTService" is only
> really an option if you are handling many large, complexly formatted
> PDFs day in, day out, as I am. Jim's problem sounds like a one-off.