Parsing a PDF file

Jim Hurley jhurley0305 at sbcglobal.net
Sun Jul 10 10:13:11 EDT 2016


Richard Gaskin wrote:
> 
> Jim Hurley wrote:
> 
>> Thanks Richard.
>> 
>> You are so right about releasing data in complex formats.
>> I spoke to the elections office about posting election results in PDF
>> format.
>> I knew there was no use fighting them when they told me that it was
>> now County "policy" to post everything in PDF--not unlike those 10
>> policies of renown that were carved in stone--and a metaphor was born.
> 
> Unfortunate, as it renders the data nearly useless.  I agree you need to 
> pick your battles, but it's dismaying in an ostensible democracy when 
> the process of open data for civic-minded citizens is implemented in 
> ways that ultimately deliver the opposite of the intended goal.

Part of the problem in rural areas, such as the county I retired to, is budget.
The Board of Supervisors is ruled by budget considerations. They see it as the central issue in their reelection.
They have cut back the budget for the elections office. That is not a good place to economize.
There have been numerous screw-ups recently. I get the feeling that the staff lives in constant terror of messing up.
I served as the database manager for the current head of the department in his last election (it is an elected office—don’t ask) and I’m confident he had no idea what I did in that capacity. 
> 
> Across the US we're beginning to see a revolution in government data 
> sharing.  At the municipal level one of the shining examples has been 
> Raleigh, NC, in no small part due to the work of Jason Hibbets.  He 
> works as the Community Manager for Red Hat, and has devoted significant 
> volunteer time working with city officials to make data available so 
> local devs can deliver apps for the community.
> 
> Notes on his work and a link to his excellent book, "The Foundation for 
> an Open Source City" (I got a signed copy when I met him at the SoCal 
> Linux Expo a couple years ago) are here:
> http://theopensourcecity.com/
> 
> The slides from the SCaLE talk where I met him are linked to from this 
> page outlining his presentation:
> http://www.socallinuxexpo.org/scale12x/presentations/open-source-all-cities.html
> 
> 
>> In the County's old system, each of the 50 election precincts was
>> stored in its own web page as an HTML document.
>> That was perfect for LiveCode's "get url". It was a matter of seconds
>> to visit all 50 pages, parse the text, and store the data.
> 
> So much for progress. ;)
> 
> Too often we see Cargo Cult thinking in data management, where folks 
> start using a tool or a format only because they hear about it from others, 
> but since they don't actually use the system they're delivering they 
> never come to understand what's useful and what's an impediment.
> 
> 
>> (The other two text options in Adobe are "Rich Text Format" and "Text
>> (Plain)", neither of which works--only "Text (Accessible)"
> 
> What is "Text (Accessible)"?

I don’t know. It apparently is neither RTF nor “Text (Plain)”.

I tried to save the PDF file as “Text (Plain)” and got this response:

    Acrobat was unable to make this document accessible because of the following error:
    Bad PDF; could not read page structure. <Bad PDF; error in processing fonts: unsupported Type2 font> [7]
    Please note that some pages of this document may have been changed. Because of this failure, you are advised to not save these changes.

In the “Text (Accessible)” format there seems to be an implied criticism of the “Text (Plain)” format. Text (Accessible) really is accessible; the others, not so much.
Apparently you can save it as plain text; it’s just not accessible. I love tech jargon.

> 
>> I was unaware of Apple's Automator. I'll look into it--but it is
>> unnecessary for this project.
> 
> Warning:  Automator is a lot of fun, and may be addictive.  Be careful 
> playing with it, since you may find yourself experimenting with all 
> sorts of things and before you know it your Saturday is completely gone. :)

Fair warning. Thanks.
Jim Hurley


> 
> -- 
>  Richard Gaskin
>  Fourth World Systems
>  Software Design and Development for the Desktop, Mobile, and the Web
>  ____________________________________________________________________
>  Ambassador at FourthWorld.com                http://www.FourthWorld.com
> 
> 




