We meet in Berlin to prepare the #schoolofdata

I’m spending an exciting two days in Berlin helping the OKFN and P2PU prepare their School of Data (SoD) course(s). I’m sure this will turn out to be a seminal event both in Internet education and in the advancement of “data wrangling”. Here’s the initial announcement: http://blog.okfn.org/2012/02/08/announcing-the-school-of-data/ . It says: “The School will be a joint venture between the Open Knowledge Foundation and Peer 2 Peer University (P2PU).”

There’s a huge need for skilled and inventive data wrangling. This is a mixture of technical knowledge and knowhow and the “course” will cover both. We are working out the granularity of the “course” – almost certainly a collection of smaller units, generally self-paced but with some clear timelines. P2PU has had considerable experience in this – for example partnering with Mozilla on web skills.

Here’s Laura Newman – the course coordinator – getting our thoughts organized and photographed, and here’s Rufus Pollock and Stiivi Urbanek hard at work planning the details.

Stiivi has put together a great “architecture” for the technical side of the course which goes from acquiring data, to cleaning, filtering, repurposing and presentation. We have a strong sense of pipeline, where course participants take a problem from start to finish, using the appropriate skills at each stage. We are presenting this around “challenges” – we take a theme which everyone can relate to and go all the way from finding the data to drawing conclusions.

The course structure and participation is flexible and controlled – there is no hierarchical distinction between teachers and learners – we are all a bit of both. We expect information to flow from and to the course.

The overall components (stages) – which have largely crystallized in our planning – are

  • Data sources
  • Discovery and acquisition
  • Extraction
  • Cleansing, transformation, and integration
  • Analytical modelling
  • Data mining
  • Presentation, analysis, publishing and packaging

And an overarching subject of “data governance”

To analyse a particular subject a participant needs to go through the processes above, although not all will be needed for a given problem/challenge. We call this process a “journey”, where we visit the different stages on a planned itinerary. Many courses will be organized like this – and the first we have designed is “What is unique about my country?”

In this journey, participants (perhaps working in teams) will find and extract information about their country, clean, filter and integrate it, and finally present answers to this very general question (which requires comparison with other countries).
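To make the journey idea concrete, here is a minimal sketch in Python of what such a pipeline might look like. It is only an illustration, not the course material: the data is invented, the country names are placeholders, and the function names (`acquire`, `clean`, `keep_complete`, `add_density`, `present`, `journey`) are my own hypothetical choices, one per stage.

```python
import csv
import io

# Hypothetical raw data standing in for a downloaded file.
# All names and numbers are invented for illustration.
RAW_CSV = """country,population,area_km2
Country A, 1000000 ,50000
Country B,2500000,
country c,400000,8000
"""


def acquire(text):
    """Extraction: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleansing: trim whitespace, normalise names, convert numbers."""
    cleaned = []
    for row in rows:
        population = row["population"].strip()
        area = row["area_km2"].strip()
        cleaned.append({
            "country": row["country"].strip().title(),
            "population": int(population) if population else None,
            "area_km2": float(area) if area else None,
        })
    return cleaned


def keep_complete(rows):
    """Filtering: drop rows that have any missing value."""
    return [row for row in rows if None not in row.values()]


def add_density(rows):
    """Integration/analysis: derive population density from two columns."""
    return [dict(row, density=row["population"] / row["area_km2"])
            for row in rows]


def present(rows):
    """Presentation: rank countries by density, highest first."""
    ranked = sorted(rows, key=lambda row: row["density"], reverse=True)
    return ["{country}: {density:.1f} people per km2".format(**row)
            for row in ranked]


def journey(text):
    """Run every stage in order, from raw text to readable lines."""
    return present(add_density(keep_complete(clean(acquire(text)))))


if __name__ == "__main__":
    for line in journey(RAW_CSV):
        print(line)
```

Each stage is a small function that takes rows and returns rows, so a participant can swap in a different cleaning or filtering step without touching the rest of the journey.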

In an orthogonal fashion, participants will also study a particular stage in depth. In the journey metaphor, this is like spending your time in one place, finding the different ways of tackling it. So one early topic will be “Crawling and scraping” – there are several different tools, approaches and problems.
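As a taste of the scraping topic, here is one possible approach among the several mentioned: pulling the rows out of an HTML table using only Python’s standard library. This is a sketch under assumptions, not a recommendation of a particular tool. The `TableScraper` class, the `scrape_table` helper and the sample HTML are all hypothetical, and real pages are usually far messier than this.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td> or <th>
        self._text = []         # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._text = []

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._row.append("".join(self._text).strip())
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample page; in practice the HTML would come from a crawler.
SAMPLE_HTML = """<table>
<tr><th>Country</th><th>Capital</th></tr>
<tr><td>Country A</td><td> City X </td></tr>
</table>"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_HTML):
        print(row)
```

The output is a plain list of rows, which is the kind of raw material the cleansing stage would then pick up.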

There’s a real buzz! Over 300 people have signed up and we had an IRC meeting yesterday with 30 – who are very keen to be involved and contribute. Lots of great skills and ideas.

Much more later – on a regular basis – as this is an important part of my life.