The WellcomeTrust APC spreadsheet (ed Michelle Brook and community) adds massive crowdsourced value to Open Access. YOU can help

Last week The Wellcome Trust published its list of ca. 2000 articles for which it had paid Article Publishing Charges (APCs). It spent about 3 million GBP.

Those publications are a valuable investment. On Monday Mark Walport told us at the EuropePMC young scientist writers awards that publishing was as valuable as test tubes. Well-communicated science is of great value. Science behind paywalls loses hugely. My rough guess is that publishing is ca 1-2% of the cost of the grant, so I’d guess this represents about 200 million GBP overall investment. [See below how to avoid the guessing].
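The guess above is simple arithmetic, and it can be sketched in a few lines. This is only a rough illustration, assuming the APC spend of about 3 million GBP and a publishing share of 1-2% of grant cost as stated above. The 1.5% midpoint and the function name `estimate_total_investment` are my own choices for illustration, not figures from Wellcome.

```python
def estimate_total_investment(apc_spend_gbp, publishing_percent):
    """Scale the APC spend up to a whole-grant figure.

    Assumes publishing costs are `publishing_percent` percent of the
    total grant cost (a rough guess, not a measured value).
    """
    return apc_spend_gbp * 100 / publishing_percent


APC_SPEND_GBP = 3_000_000  # approximate APC spend quoted in the post

# If publishing is 2% of grant cost, the total is at the low end;
# if it is 1%, the total is at the high end.
low = estimate_total_investment(APC_SPEND_GBP, 2.0)
mid = estimate_total_investment(APC_SPEND_GBP, 1.5)  # assumed midpoint
high = estimate_total_investment(APC_SPEND_GBP, 1.0)

for label, value in (("low", low), ("mid", mid), ("high", high)):
    print(f"{label}: about {value / 1e6:.0f} million GBP")
```

On these assumptions the estimate ranges from about 150 to 300 million GBP, with the midpoint matching the 200 million guessed above.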

But what the Wellcome Trust list offers is just the beginning. Michelle Brook, who runs Science at the Open Knowledge Foundation, immediately saw the potential. With great energy (and loss of sleep) she coordinated volunteers to curate this list. The result is at

This isn’t the “version of record”. It’s a snapshot. Get used to the idea that in the Digital Century everything is snapshotted. There is often no “final version”. There may be intermediate versions used for specific purposes – for example checking that Elsevier has published what it got paid to publish. But everything is capable of revision and enhancement – in so many ways. I’ll give some below.

Michelle is using Google spreadsheets – which allows anyone to view the exact state of the spreadsheet. When she first prepared the spreadsheet it could be a bit confusing, because if anyone sorted a column it altered everyone’s view. But we solve that by social, not technical means. We know who is there – they are all friends (by definition you are part of the community) and we let each other know what we are doing.

The result is mind-blowing. It’s a human-machine synthesis of a section of scholarly publishing. So here’s a rough roll of honour:

  • Mark Walport, Robert Terry for making Wellcome the most dynamic force in Open Access and providing the funding
  • Robert Kiley and colleagues
  • Michelle Brook (and OKFN) for pulling this together; and, in no particular order (and maybe with omissions):
  • Stuart Lewis
  • Theo Andrew
  • Nic Weber
  • Jackie Proven
  • Fiona Wright
  • Stuart Lawson
  • Jenny Molloy
  • Yvonne Budden
  • SM
  • Rupert Gatti
  • Peter Murray-Rust
  • ck

That’s 13 contributors in less than a week. That’s how crowdsourcing works. About half the entries have names, so there’s a lot of opportunity for you. You don’t need to have any specialist knowledge – and it’s open to all. It would make a good high-school project. Open Access Button could be involved, for example.

I think this spreadsheet has added a million GBP to Wellcome’s output.

What???!!! That’s an absurd amount to claim for 1 week of crowd sourcing. OK, I’ll revise it below…

Yes. There is 200 million GBP of investment. If no-one knows about it, its value is small (we can count people trained, buildings kept up, materials, etc.). But the major outcome of research funding, apart from people and institutions, is KNOWLEDGE.

If the knowledge is 100 million, that’s a bad investment. If it’s 200 million, it’s marginal. To be useful the knowledge must be at least 300 million. [I’ll claim a multiplier of 5 for the mean of Open Knowledge and I’ll write a separate post…].

So what can this spreadsheet be used for?

  • we can download all the full text and search it. [“Some of this isn’t CC-BY, so you can’t do that…” Well, I’m going to mine it for Facts, and that’s legal. And anyway, if you want to take me to court and claim that copyright stops people doing research that stops people dying, I’ll see you there. It’s Open – Wellcome Trust has paid huge amounts of its own money and we have a moral right to that output.] So expect the Content Mine to take this as a wonderful resource.
  • we can teach with it. For most science the publishers forbid teaching without paying them an extra ransom. Well, there’s enough here that we can find masses of useful examples for teaching: cells, sequences, species, phylogenetic trees, metabolism, chemical synthesis, etc. When you are creating teaching resources, one of the first places you will look will be the WT-OKFN spreadsheet.
  • we make science better. There’s enough here to create books of recipes (how-tos), typical values, etc. We can develop FRAUD detection tools.
  • we can engage citizens. [“Hang on – you’re going too far. Ordinary people can’t be exposed to science”. Tell that to cyclists in Cambridge – there’s a paper on the “health benefits of cycling in Cambridge”. I think they’ll understand it. And I think they may be more knowledgeable than many paywall-only readers.]
  • we can detect papers behind paywalls, and the hints are that it’s not just Elsevier…
  • we can develop the next generation of tools. This spreadsheet is massive for developing content-mining. It’s exactly what I want. A collection of papers from all the biomedical publishers and I know I can’t be sued.
  • a teaching resource. If I were teaching Library and Information Science I would start a modern course with this spreadsheet. It’s a window onto everything that’s valuable in modern scientific information.
  • an advocacy and awareness aid.
  • a tool to fundamentally change how we communicate science. This is where the future is and it’s just the beginning. Information collected and managed by new types of organisation. The Open Knowledge Foundation. Democracy and bottom-up rather than top-down authoritarianism. If you are in conventional publishing and you don’t understand what I have just said then you are in trouble. (Unless of course you have good lawyers and rich lobbyists who can stop the world changing). We haven’t even put it into RDF yet and that will be a massive step forward.
  • a community-generator. We’ve already got 13 people in a week. That’s how OpenStreetMap started. It’s now got half a million. WT-Brook could expand to the whole of enlightened scientific communication. Think Wikipedia. Think Mozilla, Think Geograph, Think OpenStreetMap. Think My Society, Think Crowdcrafting, Think Zooniverse. These can take off within weeks or months.


So it was silly to suggest this spreadsheet liberates a million pounds of value. I’ll be conservative and settle for ten million.