Content Mine: Sunlight in California – can AMI help make Spending data Open?

Marc Joffe has an ambition: to make vast amounts of spending data in California Open. The Sunlight Foundation has funded Marc, but the problem is that the data is in PDFs. So Marc mailed #ami2 a typical document and asked if she could understand it:

She can get the pictures out easily, but that’s not what Marc wants – he wants the data. Like this:

AMI thinks she can find some time to tackle this and help Marc. She’s not interested in money (she has the emotional age of a FORTRAN compiler) but she needs to hack tables for science, and this one should be possible. (Of course #animalgarden is working with @TabulaPDF as well.)

Marc has blogged about it:

Open government data is valuable only to the extent that it can be used cost-effectively. When governments provide “open data” in the form of voluminous PDFs they offer the appearance of openness without its benefits. In this situation, the open government movement has two options: demand machine readable data or hack the PDFs – using technology to liberate the interesting data from them. The two approaches are complementary; we can pursue both at the same time.


Whether your motive is to improve government, lower the cost of data journalism or free scientific data, you are welcome to join The PDF Liberation Hackathon on January 18-19, 2014 – sponsored by The Sunlight Foundation, Knight-Mozilla OpenNews and others. We’ll have hack sites at the NYU-Poly Incubator in New York, Chicago Community Trust, Sunlight’s Washington DC office and at RallyPad in San Francisco (one or two locations will have an opening social on the evening of the 17th). Developers can also join remotely because we will publish a number of clearly specified PDF extraction challenges before the hackathon.


PMR will be in Lithuania liberating crystallography but hopes to connect in.

And we hope you will too.