Parsing a PDF file
Dar Scott
dsc at swcp.com
Fri Jul 8 15:03:47 EDT 2016
> On Jul 8, 2016, at 9:44 AM, Richard Gaskin <ambassador at fourthworld.com> wrote:
>
> > My County is now publishing the election results to the web as a PDF
> > file:
> >
> > https://www.mynevadacounty.com/nc/elections/docs/2016%20Elections/June%207%2c%202016%2c%20Presidential%20Primary/Election%20Results/precinctreport.pdf
> >
> > Is there a way to parse these PDF files?
>
> It's unfortunate that so many orgs release data useful to analysis in complex formats that inhibit such use. PDF is great when the goal is to preserve page layout, but a uniquely poor choice for sharing data to be used for analytics. Alas, that hasn't slowed its unfortunate use in such contexts.
To make it worse, documents for human consumption are presented as being the same when there are big changes underneath. Tables are moved around or rotated, zeros are converted to blanks, commas are added, and so on.
You know that party bosses get files in useful forms. I'd contact the right people in the state government and get the right files.
One thing that has worked for me for one-time analysis is trying different file name extensions when downloading. The right file might be there.
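A rough sketch of the extension-swapping idea, in Python. The list of extensions and the function names are my own guesses for illustration, not anything the county is known to publish, and a server may well refuse HEAD requests or return nothing useful.

```python
import posixpath
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Hypothetical extensions to try; adjust to whatever seems plausible.
CANDIDATE_EXTENSIONS = ["csv", "txt", "xls", "xlsx", "xml", "json"]


def candidate_urls(pdf_url, extensions=CANDIDATE_EXTENSIONS):
    """Return URLs that differ from pdf_url only in the file name extension."""
    parts = urlsplit(pdf_url)
    # splitext only looks at the last path component, so dots in
    # directory names are left alone.
    stem, _old_ext = posixpath.splitext(parts.path)
    return [urlunsplit(parts._replace(path=f"{stem}.{ext}")) for ext in extensions]


def url_exists(url, timeout=10):
    """Return True if a HEAD request for url succeeds (assumes HEAD is allowed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # 404s, refused connections, and timeouts all count as "not there".
        return False
```

To use it, loop over candidate_urls(the_pdf_url) and download whichever ones url_exists reports as present.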
Dar