The right to read is the right to mine

UPDATE. I got feedback suggesting that part of principle 2 was inappropriate at this stage and I agree. So I have struck through parts in this post. There is merit in changing emphasis at such an early stage in the process. This document is subject to revision – that’s part of the point of open discussion.

We – in the OKFN – have been spending some time on Etherpads and Skype putting together the principles of Open Content Mining. Yesterday we met on Skype and decided that we’d done enough to take this to the world and get feedback and enhancement. Naomi Lillie (OKFN) will post the full version later and Peter Suber will link to it. This blogpost is an introduction and I’ll quote the central points.

[Images: Kinder Scout (from Wikipedia); Fuaigh Mòr (Wikipedia)]

Let’s start with another historic area of rights – the right to roam. This is a 20th C movement in many countries to assert that everyone has access to land, whether or not it is privately owned. It’s a good analogy. The fundamental ownership of land is critical, political and often poorly defined. In the 18/19th Century Scotland suffered the Highland Clearances, where the residents of the land were thrown out – killed or forced to emigrate – the lands “improved” with sheep, and the lands now “belong” to landlords. But there is a traditional right of access to these lands regardless of actual “ownership”. Wikipedia says:

The freedom to roam, or everyman’s right is the general public’s right to access certain public or privately owned land for recreation and exercise. The right is sometimes called the right of public access to the wilderness or the right to roam.

Not everyone shares the same view as to what these rights are or even whether they exist. I have been thrown off Scottish land by a gamekeeper with a shotgun, even where there was a legal right. But just because not everyone agrees on the rights doesn’t mean they don’t exist.

So we believe that there is a right to mine the scientific literature and we have expressed this as:

The right to read is the right to mine.

That’s our assertion of the fundamental rights. In the 20th Century the people asserted their right to roam. We are asserting the people’s right to mine. This is a simple political statement – like “everyone has a right to a fair trial”. Because the publishers[*] – like the 19th C landowners – dispute this right, we have to fight for it. The UK has had a series of fights for rights including freedom of speech, trial by jury, freedom from slavery, etc. Sometimes people went to jail, sometimes they died for these.

But we must fight. An extremely relevant example is the mass trespass at Kinder Scout (Wikipedia):

The mass trespass of Kinder Scout was a notable act of willful trespass by ramblers. It was undertaken at Kinder Scout, in the Peak District of Derbyshire, England, on 24 April 1932, to highlight that walkers in England and Wales were denied access to areas of open country. Political and conservation activist Benny Rothman was one of the principal leaders.

The trespass proceeded via William Clough to the plateau of Kinder Scout, where there were violent scuffles with gamekeepers. The ramblers were able to reach their destination and meet with another group. On the return, five ramblers were arrested, with another detained earlier. Trespass was not, and still is not, a criminal offence in any part of Britain, but some would receive jail sentences of two to six months for offences relating to violence against the keepers.

The mass trespass marked the beginning of a media campaign by The Ramblers Association, culminating in the Countryside and Rights of Way Act 2000, which legislates rights to walk on mapped access land. The introduction of this Act was a key promise in the manifesto which brought New Labour to power in 1997.

So it’s a long struggle. Am I suggesting a Mass Trespass of publishers? That may depend on readers. But the same tensions are there as 80 years ago – an unjust control of access and the need to change the system by breaking the law. And we have a long tradition of noble lawbreaking – often it is the only way that we change minds and therefore laws. There is usually a debate as to whether change should come by legal means or – in today’s language – “occupying” and “pirate” action.

So I repeat:

The right to read is the right to mine.

This isn’t a negotiated position. It’s not a summary of current practice. It’s a statement of a fundamental right that we must fight for.

Yesterday we agreed that we could not at this stage list the “how” of Open Content Mining (OCM). That comes later. It will probably be filled with subjunctive clauses – this is a difficult and complex area. The right to roam has to yield to national security and rare species. It may or may not have to yield to personal privacy – a difficult area. So the right to mine will have to take account of the current law and decide what can be done within it or what needs changing (e.g. Hargreaves). It may require a definition of “fact”. It may require cases. It could take some time. But that does not mean we cannot NOW assert the right.

So here’s the core of the principles. We’d welcome others being involved. But I repeat, this is not a negotiation – it’s drafting something we expect to stand for decades or longer. Much of it needs commentary and redrafting – particularly IMO section 2. We don’t want to rush these principles, but we do wish to kickstart the process.



Principle 1: Right of Legitimate Accessors to Mine


We assert that there is no legal, ethical or moral reason to refuse to allow legitimate accessors of research content (OA or otherwise) to use machines to analyse the published output of the research community. Researchers expect to access and process the full content of the research literature with their computer programs and should be able to use their machines as they use their eyes.


  • The right to read is the right to mine.


Principle 2: Lightweight Processing Terms and Conditions    


Mining by legitimate subscribers should not be prohibited by contractual or other legal barriers. Publishers should add clarifying language in subscription agreements that content is available for information mining by download or by remote access. Where access is through researcher-provided tools, no further cost should be required. The right to crawl is not the right to use a publisher’s API for free; however, when access is through publisher-supplied programmatic interfaces, the fees should be transparent and per-API-call. Processing by subscribers should be conducted within community norms of responsible behaviour in the electronic age.


  • Users and providers should encourage machine processing.

Immediate feedback suggested deleting part of this section and I agree.


Principle 3: Use


Researchers can and will publish facts and excerpts which they discover by reading and processing documents. They expect to disseminate aggregate statistical results as facts and context text as fair use excerpts, openly and with no restrictions other than attribution. Publisher efforts to claim rights in the results of mining further retard the advancement of science by making those results less available to the research community. Such claims should be prohibited.


  • Facts don’t belong to anyone.

Guardian article on Content-mining (thanks Alok Jha) makes it mainstream

[I meant to blog this earlier but I have been spending time on writing content-mining software rather than the continued depressing struggle with reactionary commercial publishers of #scholpub.]

On 2012-05-24 the Guardian (a/the mainstream liberal daily newspaper in the UK) published an article in its main news pages on content-mining from scientific #scholpub articles. In the paper it ran as “It’s a useful research tool so why forbid it?” (p14); online it is:

Text mining: what do publishers have against this hi-tech research tool?

Researchers push for end to publishers’ default ban on computer scanning of tens of thousands of papers to find links between genes and diseases

Byline: Alok Jha, Science Correspondent

Alok was the person who promoted “Academic Spring” on the front page of the Guardian last month. He contacted me and others, especially Robert Kiley from Wellcome Trust. Robert and his Wellcome colleagues have made a massive contribution to free scientific information – without Wellcome we would have much poorer involvement. And as sponsors of UKPMC Robert is at the frontline of content-mining – he knows firsthand how hard it is to get any help from the publishing industry[*].

The coverage included stories from Casey Bergman + Max Haeussler, Heather Piwowar and myself – detailing carefully and accurately our major ongoing difficulties. Some snippets:

All of them [above] needed access to tens of thousands of research papers at once, so they could use computers to look for unseen patterns and associations across the millions of words in the articles. This technique, called text mining, is a vital 21st-century research method. It uses powerful computers to find links between drugs and side effects, or genes and diseases, that are hidden within the vast scientific literature. These are discoveries that a person scouring through papers one by one may never notice.

It is a technique with big potential. A report published by McKinsey Global Institute last year said that “big data” technologies such as text and data mining had the potential to create €250bn (£200bn) of annual value to Europe’s economy, if researchers were allowed to make full use of it.

Unfortunately, in most cases, text mining is forbidden. Bergman, Murray-Rust, Piwowar and countless other academics are prevented from using the most modern research techniques because the big publishing companies such as Macmillan, Wiley and Elsevier, which control the distribution of most of the world’s academic literature, by default do not allow text mining of the content that sits behind their expensive paywalls.

Absolutely correct.
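For readers who have not met the technique, the kind of pattern-finding the article describes can be reduced to counting which terms turn up together. The following is only a minimal sketch in Python, not the software that I or any of the researchers named above actually use. The term lists, the function name `cooccurrences` and the two sample sentences are all invented for illustration; a real project would run over tens of thousands of full-text articles with curated vocabularies.

```python
from collections import Counter
from itertools import product
import re

# Hypothetical vocabularies, invented for this sketch.
GENES = {"brca1", "tp53"}
DISEASES = {"breast cancer", "leukemia"}


def find_terms(terms, sentence):
    """Return the terms that occur as whole words in a lower-cased sentence."""
    return {t for t in terms if re.search(rf"\b{re.escape(t)}\b", sentence)}


def cooccurrences(documents):
    """Count gene/disease pairs mentioned together in the same sentence."""
    counts = Counter()
    for doc in documents:
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc.lower()):
            genes = find_terms(GENES, sentence)
            diseases = find_terms(DISEASES, sentence)
            counts.update(product(sorted(genes), sorted(diseases)))
    return counts


# Invented sample text standing in for article full text.
SAMPLE_DOCS = [
    "BRCA1 mutations raise the risk of breast cancer. TP53 was also studied.",
    "We observed TP53 loss in leukemia. BRCA1 is linked to breast cancer.",
]

if __name__ == "__main__":
    for pair, n in cooccurrences(SAMPLE_DOCS).most_common():
        print(pair, n)
```

Nothing here needs a special API or a ZIP file from a publisher. It only needs the text that a subscriber can already read.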

Any such project requires special dispensation from – and time-consuming individual negotiations with – the scores of publishers that may be involved.

“That’s the key fact which is halting progress in this field,” said Robert Kiley, head of digital services at the Wellcome Trust. “For a lot of people, though there is promise there, the activation effort is just too great.”

Exactly. My research has been set back 2-3 years by fruitless “discussions” with publishers.

Asking for permission from publishers is an option, though time-consuming. The University of British Columbia (UBC) researcher, Heather Piwowar, was trying to map the ways scientists use and share papers.

She was eventually contacted by Alicia Wise, Elsevier’s director of universal access, who convened a conference call with Piwowar, a UBC librarian and five Elsevier colleagues. That conversation led to permission for UBC researchers to text mine the Elsevier journals to which they already had access.

Piwowar said: “It takes a lot of time and a lot of energy and doesn’t scale at all. To me it’s a good result because now I have access to things I didn’t have access to before and also it will also hopefully drive change by people saying, ‘This is not an OK way to build on our scholarly literature.’”

The colossal waste of time is clear. Elsevier want me to negotiate with them and the Cambridge University Library. I have to tell Elsevier what research I want to do. The library has better things to do with its time. So do I.

And it’s technically completely unnecessary. I can access the articles I want by standard means. It’s a pinprick in the daily Elsevier downloads. It’s sheer FUD to suggest I will crash their servers. I don’t want ZIP files from them through a special API. I already have what I want. All I need is Elsevier to say they won’t sue me.

Wise said that, in principle, her company was happy to enable text mining for its content. “We want to help researchers deepen their insight and understanding, we want to help them to advance science and healthcare and we want to be able to do that in ways that help realise the maximum benefit from the content we publish. Text mining is clearly a part of this landscape and it will continue to be and we’re keen to support it.”

“In principle” means nothing. In the comments AW wrote:

Elsevier is leading the research information industry to enable text mining.

NO! BMC and PLoS are leading it. I can mine them – as much as I like and I can’t mine Elsevier at all.

We provide text mining solutions to an array of customers, and we also enable researchers to text mine our content for themselves. This is all done through licensing, which is highly efficient and easily scalable.

So efficient and scalable that I have got nowhere in ca. 3 years. So efficient that we need 5 Elsevier staff for one researcher.

We began partnering with the University of Southern California in 2007 to enable researchers in its Neuroscience Research Institute to content mine and we now have agreements with about 20 universities around the world.

Wow! 20/1500 universities in 5 years. Just over 1%.

We also serve researchers in a broad array of commercial organisations. Earlier this year we announced our acquisition of Ariadne Genomics and QUOSA, companies that both provide state-of-the-art text mining services to improve researcher productivity. We continue to invest to develop an array of text mining ourselves, and we offer other tools through collaboration with partners such as the UK’s National Centre for Text Mining. We are also working with other publishers to ensure that text mining is possible regardless of who has published it or where it is located.

These are all words. I am still not allowed to text-mine. And it is Elsevier who makes the rules – in most science it’s God who makes the rules, but here it’s Mammon. I will write a blog on Elsevier and Helpfulness. “Elsevier is a helpful publisher” is similar to a British bank which advertises “helpful banking”. Think of “helpful banking” whenever you think of Elsevier.

Back to the positive.

So what Alok has done is massive! To get national coverage at this level is a huge boost to the legitimacy of our effort. It means the issue is now clear to everyone and cannot be ignored as a minor fringe activity. The UCSF declaration for Open Access (still not mandatory and therefore of very limited practical effect) mentioned mining. Funders are starting to promote mining. UKPMC is fully aware of its huge potential – the dam is only maintained by publisher lawyers and publisher lobbyists on Capitol Hill (US).

So I have been aggressively tooling up for when I am allowed to mine the scientific content. The Guardian article acted as a trial in the court of public opinion and I think the publishers have very little support there.

But I am starting with BMC. Who knows, maybe there is enough hidden science in just 5% of the scholarly literature?

Today we continue developing our Manifesto on Content Mining



[*]Yes, I exempt PLoS, BMC, and lots of worthy society publishers

PLoS ONE Launches the Mice Drawer System Experiment Collection

In August 2009, the Italian Space Agency launched its Mice Drawer System (MDS) investigation on the Shuttle Discovery flight 17A/STS-128. Over the course of a 91-day mission at the International Space Station, the MDS experiment focused on the effects of microgravity on six mice. The purpose of the experiment was to investigate the structural and functional changes that occur in animals when there is an absence of normal gravity over an extended period of time.

The new PLoS ONE Collection brings together a number of articles drawn from this long-term project.

The research presented attempts to capture information on a range of mammalian physiological system changes during the space flight. Collectively the articles offer an integrative view of the mammal’s physiological response to a microgravitational climate.

The research was an international collaboration and involved scientists from several countries. With a better understanding of the effect of microgravitational conditions on mice, this research could be applied in ways to help extend the human presence in space beyond low Earth orbit.

Adapted from: Cancedda R, Liu Y, Ruggiu A, Tavella S, Biticchi R, et al. (2012) The Mice Drawer System (MDS) Experiment and the Space Endurance Record-Breaking Mice. PLoS ONE 7(5): e32243. doi:10.1371/journal.pone.0032243

Collection Citation: The Mice Drawer System Experiment and the Space Endurance Record-Breaking Mice (2012) PLoS Collections:

Latest Article Alert from BMC Public Health

The latest articles from BMC Public Health, published between 21-May-2012 and 28-May-2012

For articles which have only just been published, you will see a ‘provisional PDF’ corresponding to the accepted manuscript.
A fully formatted PDF and full text (HTML) version will be made available soon.

Study protocol
Triage of frail elderly with reduced exercise tolerance in primary care (TREE). A

Latest Article Alert from BMC Infectious Diseases

The latest articles from BMC Infectious Diseases, published between 28-Apr-2012 and 28-May-2012

Study protocol
A multicentre randomised controlled trial evaluating lactobacilli and

Latest Article Alert from BMC Medical Research Methodology

The latest articles from BMC Medical Research Methodology, published between 28-Apr-2012 and 28-May-2012

From theory to ‘measurement’ in complex interventions: Methodological

Latest Article Alert from Journal of Occupational Medicine and Toxicology

The latest articles from Journal of Occupational Medicine and Toxicology, published between 12-May-2012 and 26-May-2012

Bicycle helmet use and non-use recently published research

Latest Article Alert from Breast Cancer Research

The latest articles from Breast Cancer Research, published between 12-May-2012 and 26-May-2012

Back to the embryonic stage: Nodal as a biomarker for breast cancer progression

Latest Article Alert from Particle and Fibre Toxicology
