PDF Files - Is there a way to read them in LiveCode?
Kay C Lan
lan.kc.macmail at gmail.com
Thu Sep 18 02:11:47 EDT 2014
OCR is the last thing you want to use. Far too many errors.
If you are on Mac you can use AppleScript to open, Select All and Copy
and this will result in 100% of the text being available, no OCR
errors. Unfortunately, depending on your document the output might not
exactly match the input. This will be particularly true of pages
containing multi-column data or tables of data. It's easy
enough to test, open the pdf in question, Select All, Copy and then
Paste into TextEdit. You'll be left with 3 possibilities:
1) You are extremely lucky and your pdfs are very basic and the text
output is a 100% match. LC solution very easy.
2) 90% of the document is fine but a couple of tables don't match.
Your pdfs are standardised and these tables (or multi columns) appear
in the same place. Will be possible to parse the data and use LC to
correct the formatting. Development time will be considerably longer.
3) Your pdfs are random and there are tables and multi columns all
over the place resulting in output that is anywhere between 1% and 10%
accurate. Forget it, it will be almost impossible to reconstruct the
jumble of text back to the original layout.
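For case 2, a rough sketch of the kind of parsing involved, written in
Python rather than LC just to show the idea. It assumes (and this is only
an assumption, check your own pasted output) that the table comes out of
the Copy as one cell per line, in reading order, always starting at the
same line of the text. The function name rebuild_table and its parameters
are made up for illustration.

```python
def rebuild_table(lines, start, cell_count, columns):
    """Regroup a flattened table back into tab-separated rows.

    lines      -- the pasted text, split into a list of lines
    start      -- index of the first table cell (assumed fixed per document type)
    cell_count -- how many lines belong to the table
    columns    -- number of columns in the original table
    """
    # Pull out the run of lines that are really table cells.
    cells = [line.strip() for line in lines[start:start + cell_count]]
    # Chop the run into rows of `columns` cells each.
    rows = [cells[i:i + columns] for i in range(0, len(cells), columns)]
    # Join each row with tabs so it pastes cleanly into a field or spreadsheet.
    fixed = ["\t".join(row) for row in rows]
    # Put the repaired rows back between the untouched text before and after.
    return lines[:start] + fixed + lines[start + cell_count:]


if __name__ == "__main__":
    pasted = ["Report", "Name", "Qty", "Apple", "3", "Pear", "5", "End"]
    for line in rebuild_table(pasted, start=1, cell_count=6, columns=2):
        print(line)
```

If the cells come out in some other order (column by column, or with
empty cells dropped) the regrouping has to change to match, which is
where the extra development time goes.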
HTH
On Wed, Sep 17, 2014 at 4:25 AM, Jonathan Scott <songe at agate.plala.or.jp> wrote:
> Hi,
> I have some PDF files that I'd like to create a search stack for. I looked on the net and found a way to display them in LiveCode, but is there actually a way to read the OCR text that's in each file? If not, there should be.
> Thanks in advance.