The Content Mine: we meet Wikimedia

A massively valuable meeting at #solo13. I’ve told you how we are going to extract 100 million facts from the scientific literature. That’s an act of faith and we have to start building the reality. We’ve got to work out:

  • How are we going to get them?
  • How are we going to process them?
  • Where do we put them?
  • Who will help?

At #solo13 there was a session on Revolution run by Alok Jha, science correspondent of the Guardian. (Alok’s been very helpful in the past, highlighting our battle to have lawyer-free content-mining.) After the panel, Alok asked for people who were running revolutionary campaigns.

I volunteered The Content Mine (blog post and 5-minute video https://vimeo.com/78353557 ). I said this was revolutionary. I also said “I work within the law… up to now…” – that’s a factual statement, not a threat. Afterwards I met up with Toni Sant, who is education organizer at Wikimedia UK.

Before I continue, a clarification. Wikimedia is a foundation, Wikipedia is an encyclopedia (albeit the greatest the world has seen), Wikileaks is completely separate, and MediaWiki is a useful tool from Wikimedia. Wikimedia has created many Wiki-projects, see http://www.wikimedia.org/. There are 16 projects. That’s too much for me to hold in my head (Greg Wilson has told me this). So if I am confused, that’s an objective problem. So Wikimedians, please forgive and correct errors.

Anyway the really exciting thing is that Toni and colleagues would love to host the data coming out of the Content Mine.

Ross and I have been invited to visit the Wikicommunity down in Wikiland London (Leonard Street, off Silicon Roundabout). The most likely receptacles will be Wikidata and Wikispecies.

We’ll need to discuss details – does our output need curation? If so, how? And by whom? Or is the quality produced by machines sufficiently good? (I think it is for species, and maybe chemicals, and some identifier systems.) We are all communally very excited!

Why am I not doing this in my University or the University of Bath?

Because Universities don’t understand modern information as well as, or as enthusiastically as, Wikimedia does. Wikimedia understands:

  • Distributed systems
  • Version control
  • The semantic web
  • Communities and crowd-contributions
  • The world of citizens outside academia
  • Identifier systems
  • Domain-specific information
  • The law

Universities (with a very few exceptions) aren’t interested and aren’t competent. I have tried to use repositories for data and effectively failed. Anecdote: a senior retired academic asked his library to store his database. “Oh, we don’t support academic databases, only ones we buy in.” (He “solved” that by turning his database into LaTeX, then PDF, and then the library was happy to accept it as it was a “book”.) I might have thought that my own University would be interested in my experience of text-and-data mining, but they don’t answer my mail.

So I go where the energy, the vision, the community is. I think Wikimedia will largely replace university libraries for most people (and certainly the #scholarlypoor). Wikimedia states:

Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.

I share that vision. Universities, unfortunately, don’t.

The only thing that holds Wikimedia back – and it’s a horrible thing – is the stranglehold of copyright and contract law from megacorporations, abetted by the supineness of Universities. If your University really values citizens of the world it should work WITH Wikimedia, not against it.