Search Inside PDFs that have been OCR'd

Kay C Lan lan.kc.macmail at gmail.com
Wed Mar 24 03:56:25 EDT 2010


Sivakatirswami,

If all you're after is the text within the PDF, you can do it A LOT faster
by rendering the PDF as text. Note this may not work well for PDFs
that have vast amounts of images spread throughout.

I know you're on OS X, so as long as you have 10.4 or later, go here and get
PDF2RTFDService (it's free):

http://www.devon-technologies.com/products/freeware/services.html

Install it in your ~/Library/Services folder.

You may need to log out and log back in.

Then in Rev:

use the launch command to force the PDF file to open with TextEdit

then use 'do ... as AppleScript' to run this AppleScript, and pick up the
text it returns from 'the result':

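tell application "TextEdit"
   -- grab the plain text of the frontmost TextEdit document
   set tText to the text of document 1
   -- hand it back to Rev, where it shows up in 'the result'
   return tText
end tell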
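Put together as a single Rev handler, the steps above might look roughly like
this. It's a sketch, not a tested recipe: the handler name searchOCRdPDF, the
/Applications/TextEdit.app path and the 2-second wait are my own assumptions,
and it relies on PDF2RTFDService already being installed so TextEdit can open
the PDF. Once the text is in a variable, offset() does the actual searching.

```livecode
-- Sketch only. searchOCRdPDF is a hypothetical handler name; assumes
-- PDF2RTFDService is installed and TextEdit lives in /Applications.
on searchOCRdPDF pPdfPath, pSearchTerm
   local tScript, tText, tPos
   -- force the PDF to open in TextEdit, not the default PDF viewer
   launch pPdfPath with "/Applications/TextEdit.app"
   -- assumed delay to let TextEdit finish rendering the PDF
   wait 2 seconds
   -- one-line version of the AppleScript above; its return value
   -- comes back to Rev in 'the result'
   put "tell application" && quote & "TextEdit" & quote && \
         "to return the text of document 1" into tScript
   do tScript as "AppleScript"
   put the result into tText
   -- offset() returns 0 when the term isn't found
   put offset(pSearchTerm, tText) into tPos
   if tPos is 0 then
      answer quote & pSearchTerm & quote && "not found."
   else
      answer "Found" && quote & pSearchTerm & quote && "at character" && tPos & "."
   end if
end searchOCRdPDF
```

You'd call it with something like searchOCRdPDF "/Users/me/scan.pdf", "temple"
(hypothetical path and search term). Note that "document 1" is whatever
TextEdit considers frontmost, so close other TextEdit windows first if you
want to be sure you're reading the PDF you just launched.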

If you drag and drop the PDF onto TextEdit and it renders nicely, then the
above should do a good job of extracting the text, word perfect. On the other
hand, if TextEdit has trouble rendering a PDF with lots of images and gets all
the columns mixed up, then the above won't work for you.

HTH

On Wed, Mar 24, 2010 at 12:34 PM, Sivakatirswami <katir at hindu.org> wrote:

> Has anyone tried using Revolution to search inside PDFs that have been
> OCR'ed?  I can of course try it myself, but wondering if anyone has done it
> and what insights you may have.
>
>
> _______________________________________________
> use-revolution mailing list
> use-revolution at lists.runrev.com
>



More information about the use-livecode mailing list