FRPAA Takes Center Stage at Congressional Hearing

Open access issues are clearly on the minds of U.S. lawmakers. Yesterday, the U.S. House of Representatives Subcommittee on Investigations and Oversight conducted a hearing on the topic “Examining Public Access and Scholarly Publication Interests.” The hearing was designed to generate information regarding open access in general, but quickly turned into a discussion of the recently reintroduced Federal Research Public Access Act (FRPAA).

Read the Cancer Medicine editorial by Dr Qingyi Wei

Cancer Medicine has now published the inaugural editorial by Dr Qingyi Wei, “Personalized medicine at a prime time for cancer medicine – Introducing Cancer Medicine”. Dr Wei writes: “We are entering an important era when cancer medicine is being transformed into personalized medicine. At this time, I am delighted to introduce Cancer Medicine, a new Wiley Open Access interdisciplinary journal, which is committed to rapidly disseminating cutting-edge research and will consider submissions from all oncologic specialties, including, but not limited to, the areas of cancer biology, clinical cancer research and cancer prevention to advance the personalized care of cancer patients.” We hope you enjoy reading the editorial and learning more about this exciting new journal.

Cancer Medicine is open for submissions. To submit your article please visit our online submission site.

Why "Public Access" vs. "Research Access" Matters

Practically speaking, public access (i.e., free online access to research, for everyone) includes researcher access (free online access to research for researchers).

Moreover, free online access to research, for everyone, includes both public access and researcher access.

So what difference does it make what you call it?

The answer is subtle, but important:

The goal of providing “public access to publicly funded research” has a great deal of appeal (rightly) to both tax-paying voters and to politicians.

So promoting open access as “public access” is a very powerful and effective way to motivate and promote the adoption of open access self-archiving mandates by public research funders such as NIH and the many other federal funders in the US that would be covered by the Federal Research Public Access Act (FRPAA).

That’s fine for publicly funded research.

But not all research — nor even most research — is publicly funded.

All research worldwide, however, whether funded or unfunded, originates from institutions: The universal providers of research are the world’s universities and research institutes.

Motivating institutions to adopt open access self-archiving mandates for all of their research output requires giving them and their researchers a credible, valid reason for doing so.

And for institutions and their researchers, “public access to publicly funded research” is not a credible, valid reason for providing open access to their research output:

Institutions and their researchers know full well that, apart from a few scientific and scholarly research areas (notably, health-related research), most of their research output is of no interest to the public (and is often technically incomprehensible to lay readers, even if accessible electronically).

Institutions and their researchers need a credible and valid reason for providing open access to their research output.

And that credible and valid reason is to provide access for all of the intended users of their research (researchers themselves), rather than just those at institutions that can afford to subscribe to the journal in which it was published.

Subtle, but important.

It has become obvious that the >75% of researchers who have not been providing open access to their research for over two decades now — despite the fact that the Web has made it both possible and easy for them to do so — will not do so until and unless it is mandated. That’s why mandates matter.

The rationale for the mandate, however, has to be credible and valid for all research and all researchers. “Public access to publicly funded research” is not.

But “maximize researcher access to maximize research uptake and impact” is.

And it has the added virtue of not only maximizing research usage, applications and progress, to the benefit of the public, but also bringing public access to publicly funded research along with it, as an added benefit.

So Mike Rossner (interviewed by Richard Poynder) is quite right that the two are functionally equivalent.

It is just that they are not strategically equivalent — if the objective is to convince institutions and their researchers that it is in their interest to mandate and provide open access.

Elsevier Cell Reports OA: confused or fake?

Update: a closer look suggests genuine confusion.

When I click on the most recent article in Cell Reports, this is what I see: on the left-hand side, it says “© The Authors 2012”. On the right-hand side, there is a “Permissions” link which goes to Rightslink. The only other copyright-related note I see on this page is “© Elsevier 2012. All rights reserved.” In the actual article, at the very bottom under “licensing information”, is the Creative Commons license. The author copyright notice is also there, at the bottom of the page, but not co-located with the licensing information. A Google search for a paragraph within the actual text immediately directs me to this article, which suggests that this Elsevier version of a no-derivatives article is available for text mining (details below).

Here is the paragraph within the text that a Google search immediately connected with the actual article:

Next, we expressed an AMPAR subunit (GluA1, GluA2, or GluA3) fused to Super Ecliptic pHluorin (SEP), a pH-sensitive variant of EGFP, in hippocampal neurons cultured on the NRX-coated glass in order to observe changes of the subunits during LTP. SEP fluorescence is quenched by low pH inside cytoplasmic vesicles such as endosomes

Elsevier’s Cell Reports has announced that they are offering authors two options for real open access in the form of two Creative Commons license options, CC-BY (Attribution only) or CC-BY-NC-ND (Attribution-NonCommercial-NoDerivs). In both cases, it is the authors who retain copyright, according to the announcement. This is important, as it signifies that, with this journal at least, Elsevier is recognizing its obligation to give up commercial sale rights when paid for article production services. However, when I look at articles in Cell Reports, I see “© The Authors” and a permissions link which goes to Rightslink, not Creative Commons. Is Elsevier confused, or is this more pseudo-OA like Elsevier’s sponsored articles?

Brain and Behavior Publishes Issue 2.2

Issue 2.2 of Brain and Behavior has now been published. This issue includes a policy paper by Anne Abbott and colleagues, “Why the United States Center for Medicare and Medicaid Services (CMS) should not extend reimbursement indications for carotid artery angioplasty/stenting”. The article suggests that there is overwhelming evidence that supporting this proposal would have serious negative health and economic repercussions for the USA. Also of note are the molecular neuroscience paper by Jennie Wilkerson and colleagues, “Immunofluorescent spectral analysis reveals the intrathecal cannabinoid agonist, AM1241, produces spinal anti-inflammatory cytokine responses in neuropathic rats exhibiting relief from allodynia”, and “Enhancement and suppression in a lexical interference fMRI-paradigm” by Stefanie Abel, Katharina Dressel, Cornelius Weiller and Walter Huber.

This issue’s cover has been selected from “Expression and immunolocalization of Gpnmb, a glioma-associated glycoprotein, in normal and inflamed central nervous systems of adult rats” by Jian-Jun Huang, Wen-Jie Ma and Shigeru Yokoyama. The results of the investigations discussed in this article suggest that Gpnmb plays an important role in the regulation of immune/inflammatory responses in non-tumorous neural tissues.

You can submit your article to Brain and Behavior via our online submission site.

Worth a Thousand Words: The SpikerBox

Depiction of the SpikerBox (a) and an iPhone running custom open-source iOS software (b) used for electrophysiology experiments in the classroom.

Pictured above is the SpikerBox, a low-cost, open-source BioAmplifier developed by a team of scientist-engineers in their quest to bring neuroscience education to the K-12 curriculum. The SpikerBox can be built by students and teachers in the classroom and enables a variety of experiments that, in the authors’ words, “provides a great way to learn about how the brain works by letting you hear and even see the electrical impulses of neurons!”

In their paper, “The SpikerBox: A Low Cost, Open-Source BioAmplifier for Increasing Public Participation in Neuroscience Inquiry”, published last week in PLoS ONE, authors Timothy C. Marzullo and Gregory J. Gage describe the design of the SpikerBox and detail experiments employing the device in a classroom setting. They also provide learning materials and supplemental resources, including an assembly guide and student questions, for use in a lesson plan. Marzullo and Gage’s work is an excellent example of bringing together open-source hardware and open-access publication to support science education.

From the abstract:

Although people are generally interested in how the brain functions, neuroscience education for the public is hampered by a lack of low cost and engaging teaching materials. To address this, we developed an open-source tool, the SpikerBox, which is appropriate for use in middle/high school educational programs and by amateurs. This device can be used in easy experiments in which students insert sewing pins into the leg of a cockroach, or other invertebrate, to amplify and listen to the electrical activity of neurons. With the cockroach leg preparation, students can hear and see (using a smartphone oscilloscope app we have developed) the dramatic changes in activity caused by touching the mechanosensitive barbs. Students can also experiment with other manipulations such as temperature, drugs, and microstimulation that affect the neural activity. We include teaching guides and other resources in the supplemental materials. These hands-on lessons with the SpikerBox have proven to be effective in teaching basic neuroscience.

The Guardian Open Day – C21 publishing as it should be

TomMR made me take a day off to go to the Guardian Open Day (http://www.guardian.co.uk/news/blog/2012/mar/24/the-guardian-open-weekend-live-blog). For non-UK readers, the Guardian (http://en.wikipedia.org/wiki/The_Guardian, originally the Manchester Guardian) is 180 years old and one of the few non-profit major daily newspapers. The Guardian put on show many of its regular features and more, and for us one of the highlights was the crossword sessions run by “Paul” and “Araucaria”. I’ll devote a blog post to that; you’ll see why.

But the session which most excited me was the Guardian Open Digital Platform. I’d come across this before, as both Timetrics and the OKF have worked with the Guardian, especially on data and data journalism. The Guardian team is absolutely committed to Openness. They see their content as something to be re-used; for example, I could reformat the Guardian and produce my own newspaper. They work with Facebook, creating a new entry point for a different generation of young people, many of whom never read newspapers. No wonder the G has the second-highest online presence in the UK (the much larger and much … Daily Mail is first).

They work with Open source and Open content. They see a vision beyond the traditional newspaper. They don’t know what it looks like or even what role they have in shaping it – leader? Infrastructure? Early adopter? But they want to be the first there.

Literally abutting them is a major scientific publisher, Macmillan/NaturePublishingGroup. What a contrast!

Why accelerate discovery? Reasons why we need open access now

One of the benefits of open access is accelerating discovery. This benefit is most evident with libre open access (allowing for re-use and machine assistance via text and data mining), and especially when there is little or no delay between the time of discovery and the time the work is shared.

There are always many reasons for accelerating discovery; here are just a few examples of why we need full, immediate, libre OA, and why we need it NOW:

Multiple drug resistance: we have developed a range of drugs that have worked for us over the past few decades to combat bacteria, tuberculosis and other diseases. Now we are seeing increasing levels of resistance to antibiotics and other drugs, including anti-malarial drugs. Maintaining the health gains of the past few decades will take more than continuing with current solutions; we need more research, and the faster we can do this, the better the odds of staving off the next epidemic.

Another example of why we need to accelerate discovery, and we need to move to accelerated discovery fast, is the need to find solutions to climate change and cleaner, more efficient energy. We literally cannot afford to wait.

So as much as some of us might wish to give current scholarly publishers time to adjust to a full libre open access environment, this is a luxury that we cannot afford.

These examples of acceleration will likely provide new business opportunities, too. If this happens, it is a welcome, albeit secondary, benefit.

Cancer Medicine Launch Event at AACR

We are hosting a launch event for Cancer Medicine at the American Association for Cancer Research (AACR) conference in Chicago this year. The event will take place at the Wiley-Blackwell booth #3608 on April 2nd, 3–5 pm. Join our Editor-in-Chief, Dr Qingyi Wei, and Managing Editor, Dr Verity Emmans, for coffee and cookies at the stand, and collect your free Cancer Medicine T-shirt. They are looking forward to meeting authors and reviewers and answering any questions about the journal.

The journal is now open for submissions. You can submit via our online submission site now!

Hopeful Ad Hoc Critiques of OA Study After OA Study: Will Wishful Thinking Ever Wane?

Comment on Elsevier Editors’ Update by Henk Moed:
Does Open Access publishing increase citation rates? Studies conducted in this area have not yet adequately controlled for various kinds of sampling bias.

No study based on sampling and statistical significance-testing has the force of an unassailable mathematical proof.

But how many studies showing that OA articles are downloaded and cited more have to be published before the ad hoc critiques (many funded and promoted by an industry not altogether disinterested in the outcome!) and the special pleading tire of the chase?

There are a lot more studies to try to explain away here.

Most of them just keep finding the same thing…

(By the way, on another stubborn truth that keeps bouncing back despite untiring efforts to say it isn’t so: Not only is OA research indeed downloaded and cited more (as common sense would expect, since it is accessible free for all, rather than just to those whose institutions can afford a subscription), but requiring (mandating) OA self-archiving does indeed increase OA self-archiving. Where on earth did Henk get the idea that some institutions’ self-archiving “did not increase when their OA regime was transformed from non-mandatory into mandatory”? Or is Henk just referring to the “mandates” that state that “You must self-archive, but only if and when your publisher says you may, and not if your publisher says ‘you may if you may but you may not if you must’”…? Incredulous? See here and weep (if credulous) or chuckle (if sensible)…)

My response to Hargreaves on copyright reform: I request the removal of contractual restrictions and independent oversight

Jenny Molloy, Diane Cabell, Laura Newman and I have been working to create a considered and, we hope, powerful and constructive response to the Hargreaves report recommending the reform of UK copyright. (This is not a formal OKF response, since the OKF deliberately does not pursue advocacy, but it was prepared using OKF community processes and tools.) We have created a joint response from all of us, but I felt that I could also give personal evidence about the effect of the current publisher-imposed contractual and technical restrictions on information mining.

I shall comment later in detail (and hope that this will generate lively discussion). Here I simply highlight my claim that the downstream market for chemical information alone is worth at least a billion, and that much value is lost through the restrictions. I outline some of the types of lost value and, while some are slightly anecdotal, I hope they are compelling. I also make the case for transferring control from the publishers to an independent body.

I thank Jenny, Diane and Laura for their help.

Dear Mr Taffy Yui

Please find below a response to the IPO [Intellectual Property Office] copyright consultation from Peter Murray-Rust (pm286@cam.ac.uk)

Jenny Molloy
Coordinator, Open Science Working Group
Open Knowledge Foundation

Personal experience and evidence from Professor Peter Murray-Rust.

I have been involved in developing and deploying text and other forms of data mining in chemistry and related sciences (e.g. biosciences and material sciences) for ten years. I have developed open source tools for chemistry (OSCAR [1], OPSIN [2], ChemicalTagger [3]), which have been developed with funding from EPSRC, JISC, DTI and Unilever PLC. These tools represent the de facto open source standard and are used throughout the world. In November 2011, I gave an invited plenary lecture on their use to LBM 2011 (Languages in Biology and Medicine) in Singapore [4]. 

These tools are capable of very high throughput and accuracy. Last week we extracted and analysed 500,000 chemical reactions from the US patent office service; approximately 100,000 reactions per processor per day. Our machine interpretation of chemical names (OPSIN) is over 99.5% accurate, better than any human. The extractions are complete, factual records of the experiment, to the extent that humans and machines could use them to repeat the work precisely or to identify errors made by the original authors. 
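As a rough check on the throughput figures above, the 500,000-reaction job at about 100,000 reactions per processor per day comes to roughly five processor-days, i.e. about one day of wall-clock time if split across five processors. This is a minimal sketch, assuming the quoted rate holds uniformly across the whole job; the helper names are hypothetical and not part of OSCAR, OPSIN or ChemicalTagger.

```python
import math


def processor_days(total_reactions: int, reactions_per_processor_day: int) -> float:
    """Processor-days of work at an assumed uniform per-processor rate (hypothetical helper)."""
    return total_reactions / reactions_per_processor_day


def wall_clock_days(total_reactions: int, reactions_per_processor_day: int, processors: int) -> int:
    """Whole days needed if the work is split evenly across `processors`, rounded up."""
    return math.ceil(processor_days(total_reactions, reactions_per_processor_day) / processors)


# Figures quoted in the text: 500,000 reactions at ~100,000 reactions/processor/day.
print(processor_days(500_000, 100_000))      # 5.0
print(wall_clock_days(500_000, 100_000, 5))  # 1
print(wall_clock_days(500_000, 100_000, 1))  # 5
```

Rounding up to whole days reflects that a partially finished day still occupies the machines; real runs will vary with document size and hardware.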

It should be noted that many types of media other than text provide valuable scientific information, especially graphs and tables, images of scientific phenomena, and audio/video captures of scientific factual material. Many publishers and rights agencies would assert that graphs and machine-created images are subject to copyright, while I would call them “facts”. I therefore often use the term “information mining” rather than “text mining”.

It is difficult to estimate the value of this work precisely, because we are currently restricted from deploying it on the current scientific literature by contractual restrictions imposed by all major publishers. However, it is not fanciful to suggest that our software could be used in a “Chemical Google” indexing the scientific literature and is therefore potentially worth in the low billions.

Some indications of value are:

1. My research cost £2 million in funding and, because of its widespread applicability, would conservatively be expected to be valued at several times that amount. The UK has a number of highly valued text-mining companies such as Autonomy [5], Linguamatics [6] and Digital Science (Macmillan) [7]. Our work is highly valuable to them, as they use our software (under an Open licence) and recruit our staff when they finish. In this sense, we have already contributed to UK wealth generation.

2. The downstream value of high-quality, high-throughput chemical information extracted from the literature can be measured against conventional abstraction services, such as the Chemical Abstracts Service of the ACS [8] and Reaxys [9] from Elsevier, with a combined annual turnover of perhaps $500–1,000 million. We believe our tools are capable of building the next, better generation of chemical abstraction services, which would be direct competitors in this high-value market. This supports our valuation of chemical text mining in the low billions.

3. The value of the tools themselves is difficult to estimate, but chemical informatics has for many years been a traditional SME activity in the UK and would have been expected to grow if text mining had been permitted. Companies such as Hampden Data Services, ORAC, Oxford Molecular and Lhasa have values in the range of 10–100 million.

4. I come from a UK pharmaceutical industry background (15 years at Glaxo). I know from personal experience and discussions with other companies that it is not uncommon for post-mortems on failed drugs to show that the reason for failure could have been predicted from the original scientific literature, had it been analysed properly. Such failures can run to $100 million, and the inability to use the literature in an effective modern manner must contribute to serious loss of both effort and opportunity. My colleague Professor Steve Ley has estimated that, because of poor literature analysis tools, 20–25% of the work done in his synthetic chemistry lab is unnecessary duplication or could be predicted to fail. In a 20-year visionary EPSRC Grand Challenge (Dial-a-Molecule), Prof Richard Whitby of Southampton is coordinating UK chemists, including industry, to design a system that can predict how to make any given molecule. The top priority is to be able to use the literature in an “artificially intelligent manner”, where machines rather than humans can process it, which is impossible without widespread mining rights.

5. The science and technology of information mining itself is seriously held back by the current contractual restrictions. The acknowledged approach to building quality software is to agree on an open, immutable, ‘gold standard’ corpus of relevant literature, against which machine learning methods are trained. We have been forbidden by rights holders from distributing such corpora, and as a result our methods are seriously delayed (I estimate by at least three years) and are impoverished in their comprehensiveness and applicability. It is difficult to quantify the lost opportunities, but my expert judgement is that by linking scientific facts, such as those in the chemical literature, to major semantic resources such as Linked Open Data [10] and DBpedia [11], an enormous number of potential opportunities arise, both for better practice and for new wealth-generating tools.

Note: Most of my current work involves factual information, which I believe is therefore not subject to copyright. However, it is impossible to get clarification on this, and publishers have threatened to sue scientists for publishing factual information. I have always erred on the side of caution, and would greatly value clear guidelines from this process indicating where I have an absolute right to extract without this continuing fear.

In response to Consultation Question 103 

“What are the advantages and disadvantages of allowing copyright exceptions to be overridden by contracts? Can you provide evidence of the costs or benefits of introducing a contract-override clause of the type described above?”

The difficulties I have faced are not even due to copyright problems as I understand them, but to additional contractual and technical barriers imposed by publishers on access to their information for the purposes of extracting facts and redistributing them for the good of science and the wider community.

The barriers I have faced over the last five years appear common to all major publishers and include not only technical constraints (e.g. the denial of literature by publisher robot technology) but also difficulties in establishing the copyright/contractual restrictions, which I do not wish to break. It is extremely difficult to get clear permissions to carry out any work in this field, and while a court might find that I had not been guilty of violating copyright/contract, I cannot rely on this. Therefore, I have taken the safest course of not deploying my world-leading research.

Among the publishers with which I have had correspondence are Nature Publishing Group, American Chemical Society, Royal Society of Chemistry, Wiley, Elsevier and Springer. None has given me explicit permission to use their content for unrestricted access to scientific facts by automated means, and many have failed even to acknowledge my request for permission. I have, for example, challenged the assertion made by the Publishing Research Consortium that ‘publishers seem relatively liberal in granting permission’ for content mining [12].

In conclusion, I stress that any need to request permissions drastically reduces the value of text mining. I have spent at least a year’s worth of my time attempting to get permissions rather than actually carrying out my research. At LBM 2011, I asked other participants, and they universally agreed that it was effectively impossible to get useful permissions for text mining. This is backed up by the evidence of Max Haeussler to the US OSTP [13] and his comprehensive analysis of publisher impediments, in which some publishers took over two years to agree any permissions, while many others failed to respond within 30 days of being asked [14]. I therefore do not believe that this problem can be solved by goodwill assertions from the publishers. Part of the Hargreaves-initiated reform should be to assert the rights that everyone has to use the scientific factual literature for human benefit.

In response to Consultation Question 77 

“Would an exception for text and data mining that is limited to non commercial research be capable of delivering the intended benefits? Can you provide evidence of the costs and benefits of this measure? Are there any alternative solutions that could support the growth of text and data mining technologies and access to them?”

Non-commercial clauses are completely prejudicial to effective use of text mining, because many of the providers and consumers will be commercial. For example, the UK SMEs could not use a corpus produced under these conditions, nor could they develop added downstream value. 

I have had discussions with several publishers who have insisted on imposing NC restrictions on material. They are clearly aware of the effect of NC, and it is difficult to understand their motives in insisting on it, other than to protect their own interests by preventing widespread exploitation of the content. Two recent peer-reviewed papers convincingly show that NC adds no benefits, is almost impossible to operate cleanly, and is highly restrictive of downstream use [15, 16].

Alternative Solutions:
These contractual restrictions have been introduced unilaterally by publishers without effective challenge from the academic and wider community. The publishers have shown that they are not impartial custodians of the scientific literature. I believe this is unacceptable for the future and that a different process for regulation and enforcement is required. The questions I would wish to see addressed are:
Which parts of the scientific literature are so important that they should effectively be available to the public? One would consider, at least:
- facts (in their widest sense, i.e. including graphs, images and audio/visual material)
- additional material such as design of experiments, caveats from the authors, and discussions
- metadata such as citations, annotations and bibliography

Who should decide this?
It must not be the publishers. Unfortunately, many scientific societies also have a large publishing arm (e.g. the Royal Society of Chemistry) and cannot be seen as impartial. I would suggest either the British Library or a subgroup of RCUK and other funding bodies.

How should it be policed, and how should conflicts be resolved?
Where possible, the regulator I propose should obtain agreement from all parties before any potential violation. If that is not possible, the onus should be on the publishers to challenge the miners through the regulator. Ultimately there is always final recourse to the law.

[1] http://www.jcheminf.com/content/3/1/41

[2] http://pubs.acs.org/articlesonrequest/AOR-PcYgSy87ettZWfqyvHmN

[3] http://www.jcheminf.com/content/3/1/17

[4] http://lbm2011.biopathway.org/

[5] http://www.autonomy.com/

[6] http://www.linguamatics.com/

[7] http://www.digital-science.com/

[8] http://www.cas.org/

[9] https://www.reaxys.com/info/

[10] http://linkeddata.org/

[11] http://dbpedia.org/About

[12] Smit, Eefke and van der Graaf, Maurits, ‘Journal Article Mining’, Publishing Research Consortium, Amsterdam, May 2011. http://www.publishingresearch.net/documents/PRCSmitJAMreport20June2011VersionofRecord.pdf.

[13] http://www.whitehouse.gov/sites/default/files/microsites/ostp/scholarly-pubs-%28%23226%29%20hauessler.pdf

[14] See also Max Haeussler, CBSE, UC Santa Cruz, 2012, tracking data titled “Current coverage of Pubmed, Requests for permission sent to publishers”, at http://text.soe.ucsc.edu/progress.html

[15] Hagedorn, Mietchen, Morris, Agosti, Penev, Berendsohn & Hobern, ‘Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information’, ZooKeys 150 (2011), special issue ‘e-Infrastructures for data publishing in biodiversity science’: 127–149.

[16] Carroll MW (2011) Why Full Open Access Matters. PLoS Biol 9(11): e1001210. doi:10.1371/journal.pbio.1001210

PLoS ONE News and Media Round-Up

Increasing your vegetable and fruit intake could improve your appearance, according to a new study. Scientists from the University of St Andrews in Scotland observed 35 participants who increased their fruit and vegetable intake over a 6-week period. They noticed significant changes in the skin’s yellow and red coloring, due to the absorption of carotenoids. To measure the impact of this change, undergraduate students then viewed images of these individuals with increased pigmentation and rated their appearance as more attractive and healthy. You can read more about this article at NPR, The Huffington Post and ABC News.

Fossil remains found in China’s Yunnan Province provide evidence of a prehistoric human species that researchers are calling the “Red Deer Cave people”, as they were thought to feed on an extinct species of native deer. According to radiocarbon dating, this population lived just 14,500 to 11,500 years ago, and the remains possess both modern (H. sapiens) and archaic (putative plesiomorphic) traits, making the findings rather unusual. National Geographic, The Guardian and The History Channel covered this study.

In January 2011, Daryl Bem of Cornell University published a study in the Journal of Personality and Social Psychology suggesting the existence of precognition, or the ability to predict future events. Dr. Bem invited other scientists in the field to replicate the study, to encourage scientific credibility. A team of researchers led by Dr. Stuart Ritchie independently attempted to replicate the study three times and were unable to reproduce the results. The Chicago Tribune, The Guardian and MSNBC covered this story.

No other animal can bite as powerfully as the crocodile, according to a new study covered by National Geographic, The New York Times and The Huffington Post. For the first time, scientists from the University of Florida used a transducer, a device that converts pressure into an electrical signal, to record bite forces and tooth pressures in all 23 existing crocodilian species. They found that Crocodylus porosus, the saltwater crocodile, bites with 3,689 pounds of force, the highest recorded for any living creature.

For more in-depth coverage of news and blog articles about PLoS ONE papers, please visit our Media Tracking Project.

Copyright for expression of ideas; patent law for ideas

This post is a second reply to a post David Prosser wrote on the GOAL list in response to my post on the RCUK consultation, highlighting the intellectual property issues. It is a mixture of answers, my own perspectives, and questions. In my opinion, David Prosser’s brief example raises a number of issues that can help us move forward in understanding libre open access. In brief, I argue that facilitating data and text mining, and the works that result from it, does not involve copyright at all (crawling text and data is simply the norm on the World Wide Web, for example); rather, it involves making works openly available, in a format that permits text and data mining.

On 18-Mar-12, at 5:07 AM, David Prosser wrote:

Say I wanted to data mine 10,000 articles.  I’m at a university, but I am co-funded by a pharmaceutical company and there is a possibility that the research that I’m doing may result in a new drug discovery, which that company will want to take to market.  The 10,000 articles are all ‘open access’, but they are under CC-BY-NC-SA licenses.  What mechanism is there by which I can contact all 10,000 authors and gain permission for my research?


First, before I comment on intellectual property issues, I would like to point out that the concept of “intellectual property” is a relatively recent invention, and one that arguably should be challenged. For details, see the second chapter of my draft thesis; from here, search for: The invention of “intellectual property”: enclosure of knowledge. A disclaimer: I am a scholar whose work intersects with intellectual property issues, not a copyright lawyer or expert. Given that the arguably fictional “intellectual property” is nonetheless legal reality throughout most of the world, here are some reflections arising from David’s example.

Copyright covers the expression of ideas, not the ideas themselves. If a researcher employed by a pharmaceutical firm were to read 10,000 articles and this research resulted in an idea for a new drug, the pharmaceutical firm would not need to seek permission from any of the authors of the articles in order to apply for a patent. Text mining is merely an automated form of reading, so again, there is no need to seek permission from authors to apply for a patent. The World Intellectual Property Organization (WIPO) provides a brief overview of intellectual property that explains the various forms well. In brief, there are about five forms of intellectual property, many of which actually have opposing expectations. A patent is a public declaration of rights to use an idea or procedure, so openness is appropriate; patent law is designed to protect rights to private profit. Trade secret law is also designed to protect private property; in this case, however, the protection is achieved through secret, private means rather than a public, open process.

The question of whether copyright permissions are, or should be, necessary for data or text mining is an important issue to address when considering libre open access (which includes broader re-use rights, in contrast to free-to-read gratis open access). I argue that no special copyright-related permissions are necessary. As evidence, here is a quick illustration:

Try a Google search for: “To pursue, within the limits of the STM Association’s aims and objectives, the highest possible level of international protection of copyright works and of the services of publishers in making these works available” and it should be quite easy to find the Introduction to Copyright & Legal Affairs of STM, the International Association of Scientific, Technical and Medical Publishers (http://www.stm-assoc.org/copyright-legal-introduction/). There is nothing on the STM website to indicate that special rights have been granted for text mining. STM is certainly not naive or neutral about intellectual property rights; the founding reason for the existence of STM is the protection of IP. Yet clearly Google, a commercial company, is crawling this site and returning results. There is nothing the slightest bit exceptional about this example. This is how the World Wide Web works! If anyone wants to post things on the web but not make them available for crawling, it is up to the website owner to opt out by indicating, typically in a robots.txt file, that they do not want their site crawled.
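To make the opt-out convention concrete, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt contents, the crawler name "ExampleMiner", and the example.org URLs are all hypothetical, invented for illustration; they do not describe STM's actual site policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site owner opts out of crawling for one
# directory only. Anything not matched by a rule stays crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(url: str, user_agent: str = "ExampleMiner") -> bool:
    """Return True if the hypothetical robots.txt permits fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.org/copyright-legal-introduction/",
                "https://example.org/private/draft.html"):
        print(url, "->", is_crawlable(url))
```

The first URL is permitted simply because no rule mentions it: crawling is the default, and the owner must act to exclude content. Note that robots.txt is a voluntary convention honored by well-behaved crawlers, not an access-control mechanism.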

Some subscription-based scholarly publishers do not allow text or data mining of their databases. It seems likely that they interpret the multiple downloads often involved as pirating of their copyrighted content. That is, publishers refuse to allow text or data mining because they interpret the activity as a violation of copyright (or fear that they cannot allow text or data mining while simultaneously preventing copyright violation), not because text or data mining actually violates copyright. If publishers’ products contain DRM preventing text or data mining, that is a different matter: legal protection for the publishers in this instance involves DMCA-style laws and contract law, not copyright law. Within the context of library subscriptions, data and text mining can be included in contracts. Here is the relevant text from the BC Electronic Library Network model license: 3.1.11 “DATA and TEXT MINING. Members and Authorized Users may conduct research employing data or text mining of the Licensed Materials”. This language did not originate with BC ELN; it was developed from research on other model licenses, including those of JISC, CRKN, and OCUL. In the real world, copying this kind of work with informal permission but without attribution is actually the norm, as we all want to work towards standards and avoid re-inventing the wheel.

What is needed to enable data and text mining, I argue, is not a change to copyright but content made openly available over the World Wide Web in formats that are easily crawled for these purposes, such as XHTML rather than locked-down PDFs.
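As a rough illustration of why an open XHTML article is easy to mine, the following Python sketch uses only the standard library's html.parser to extract the body text of a hypothetical XHTML document and count word occurrences. The sample markup, the simple lower-case tokenization, and the names TextExtractor and term_counts are assumptions made for illustration; real text-mining pipelines are considerably more sophisticated.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical XHTML fragment standing in for an openly available article.
SAMPLE_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Carotenoids and skin colour</title></head>
  <body>
    <p>Carotenoids alter skin colour.</p>
    <p>Dietary carotenoids come from fruit and vegetables.</p>
  </body>
</html>"""


class TextExtractor(HTMLParser):
    """Collect the character data found inside <body>, ignoring markup."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Only text inside <body> is kept, so the <title> is not counted.
        if self.in_body:
            self.chunks.append(data)


def term_counts(xhtml: str) -> Counter:
    """Extract body text and count lower-cased alphabetic word tokens."""
    extractor = TextExtractor()
    extractor.feed(xhtml)
    extractor.close()  # flush any buffered character data
    text = " ".join(extractor.chunks)
    return Counter(re.findall(r"[a-z]+", text.lower()))


if __name__ == "__main__":
    print(term_counts(SAMPLE_XHTML).most_common(3))
```

Because the text sits in plain, structured markup, a few lines of generic code can read it; the same task against a DRM-protected PDF would require circumventing the protection rather than simply parsing the document.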

I understand that Europe (either as a whole or in some countries) may have some odd laws that would prohibit text and data mining. This may help to explain why people are trying to use copyright law as a means of ensuring permissions for text and data mining. I would like to know more about this; if anyone can provide details, links, etc., that would be most helpful for all of us to really understand the issues.

My first response to David Prosser’s question, challenging the underlying assumption that increasing corporatization of the university is acceptable, can be found here.

Discussion is welcome.