A month or so ago Elsevier published a “click-through” licence “allowing” researchers to use Elsevier content for Text-and-Data-Mining (TDM) – more widely, content mining. Nature News rejoiced and suggested everyone could start mining. I read the licence carefully and wrote several blog posts showing the great danger of anyone signing it. Effectively: DON’T.
LIBER, the European association of Research Libraries, flagged these and said it would do a thorough analysis, which has now been published: http://www.libereurope.eu/news/liber-response-to-elsevier’s-text-and-data-mining-policy I’ll show most of this below with my comments. It’s necessarily long, so, to summarise:
- DON’T SIGN
- TELL EVERYONE ELSE NOT TO SIGN
- CROSS OUT ANY CLAUSES RESTRICTING MINING
Other publishers and publisher syndication – e.g. DOI resolvers – may develop their own TDM “licences”. The same advice applies.
So here’s why (summarised):
- The licences add additional restrictions and no freedoms
- Researchers could find themselves in legal trouble
- Libraries could find themselves in trouble
- Legislation is coming in the UK and elsewhere which renders these licences unnecessary. You will simply be signing away your rights
- Publishers’ APIs are worse than using the standard access to research papers
- You do NOT need publishers’ software. There is better Open Access software that is free.
So, if an Elsevier rep approaches you with a shiny new contract with a TDM clause, strike it out. YOU have the power. Tell the world.
Now the TL;DR bit. I reproduce much of LIBER and comment.
LIBER believes that the right to read is the right to mine and that licensing will never bridge the gap in the current copyright framework as it is unscalable and resource intensive. Furthermore, as this discussion paper highlights, licensing has the potential to limit the innovative potential of digital research methods by:
- restricting the tools that researchers can use
- limiting the way in which research results can be made available
- impacting on the transparency and reproducibility of research results.
The full text of the discussion paper is included below or can be downloaded here.
PMR: Yes. LIBER and many others (JISC, BL, etc.) walked out of the attempt to force licences on us. My highlighting
Over the last twelve months LIBER has devoted a considerable amount of effort to making the case for the need for changes to copyright legislation in order to allow researchers to employ digital research methods to extract facts and data from content. We believe that this will exponentially speed up scientific progress and innovation in Europe. Having explored the issue of TDM with our members and other stakeholders in the research community we have come to the conclusion that licensing will never bridge the gap in the current copyright framework as it is unscalable and resource intensive.
In the current vacuum left by a legal framework that is unfit for the digital age, and with the ensuing lack of legal clarity, it is unavoidable that libraries or researchers will have to agree to further licences for the mining of content to which they already have access. The terms of such licences, however, should be such that they reinforce the position that the right to read is the right to mine, and not impose restrictions on how researchers apply research methods or disseminate their research.
UK members should exercise particular caution when considering TDM licence terms, since an exception in UK law for text and data mining is imminent and, dependent on the wording in this new exception, TDM licence terms may undermine what researchers will be permitted to do under this update to UK copyright law. Ireland is also considering such an exception.
PMR: This has now been tabled (I shall blog it) and is substantially what has been drafted for the last year. It gives all the rights we felt we could ask for. Signing Elsevier’s contract or any other contract will simply restrict your rights.
This paper has been released in response to the recent launch of the new Elsevier text and data mining policy and API. It is understood that Science Direct licences will be amended to include language around access for TDM. Many libraries may be considering signing, or have even already signed up to the terms and conditions laid out under this new licence.
PMR: DON’T sign. Much of what libraries have signed has restricted scholarship for no gains. STOP HERE.
Other publishers may also be considering following in the footsteps of Elsevier by introducing similar terms for the licensing of text and data mining activities into their licence agreements. LIBER is concerned that some of the licence’s terms and conditions relating to content mining may be unnecessarily restrictive and that systematic and widespread adoption of such terms and conditions will severely hamper the progress and dissemination of data-driven research.
PMR: DON’T EVEN LET THEM TRY.
The institutional licence agreement for text and data mining
In order for a researcher within a subscribing institution to gain access to Elsevier content for the purpose of mining, it is necessary for the institution to update their licence agreement to allow text mining access. Note that within this agreement “text mining access” does not mean access to the content on the Elsevier Website that universities subscribe to. Access to content for the purpose of mining is limited to access via an API. The licence explicitly prohibits the use of robots, spiders, crawlers or other automated programs, or algorithms to download content from the website itself, which are the most common ways of performing content mining. Although the new Elsevier policy claims that it “enshrines text- and data-mining rights” in subscription agreements, in reality, under these terms, it compels institutions to agree to very restrictive conditions in order to gain very narrowly defined “access” to content for the purpose of mining.
PMR: Elsevier’s API is constructed solely to reduce the view of the content, control the way it is accessed and monitor what is done. It is not necessary and serves no beneficial purpose. (PLoS and BMC provide all that is necessary without APIs).
Access via an API
An application program interface (API) is a set of programming instructions and standards for accessing a web-based software application. In the case of the API offered by Elsevier, the API provides full-text content in XML and plain-text formats. The use of APIs for the mining of metadata is not uncommon. However, article content is much richer, potentially containing images, figures, interactive content, and videos. For researchers in many different disciplines there is as much value in the images and figures contained in the article as there is in the text. In fact, for researchers in disciplines such as the humanities, genetics, chemistry, these may be the most valuable content elements. The Elsevier API allows access to the text only. And the access limit is an arbitrary and proportionally tiny 10,000 articles per week.
PMR: In the ContentMine we are already extracting data from images and expect to handle millions of figures a year.
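To put the weekly cap in perspective, here is a minimal sketch of the arithmetic. The 10,000-articles-per-week figure is the one quoted in the LIBER text above; the corpus sizes are hypothetical, chosen only for illustration, and `weeks_to_mine` is a name invented for this example.

```python
WEEKLY_CAP = 10_000  # articles per week, the limit quoted in the LIBER paper


def weeks_to_mine(corpus_size: int, weekly_cap: int = WEEKLY_CAP) -> int:
    """Return the whole number of weeks needed to fetch corpus_size articles."""
    if corpus_size < 0 or weekly_cap <= 0:
        raise ValueError("corpus_size must be >= 0 and weekly_cap must be > 0")
    # Integer ceiling division: a partly used week still counts as a week.
    return (corpus_size + weekly_cap - 1) // weekly_cap


if __name__ == "__main__":
    # Hypothetical corpus sizes, for illustration only.
    for size in (10_000, 250_000, 1_000_000):
        weeks = weeks_to_mine(size)
        print(f"{size:>9,} articles: {weeks} weeks (about {weeks / 52:.1f} years)")
```

Under these assumptions, a corpus of a million articles would take 100 weeks to retrieve, which is the sense in which the cap is "proportionally tiny".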
Crucially, researchers develop their own tools for handling and exploiting this rich and diverse variety of content and formats. In order for students and academics to be able to perform research freely, in the way that makes sense for their own studies, they must have the freedom to interrogate, query and structure content in ways that fit with their own needs, technologies and requirements. The requirement to use pre-defined publisher technologies hampers academic freedom, learning, and data driven innovation.
PMR: Innovation is critical. Publishers have failed to innovate and held back innovation. We are innovating.
Even for those researchers for whom the API is sufficient, the licence does not guarantee sustained access to the API, as the following clause indicates:
3.4 Elsevier reserves the right to block, change, suspend, remove or disable access to the APIs and any of its services at any time.
PMR: Were you pleased when Elsevier or Nature tightened their policies on Green OA recently? They can do that on TDM.
Use of robots
The Elsevier policy expressly forbids the use of robots for content mining on the grounds that it would place too much strain on their infrastructure. Open access publishers, whose infrastructure is exposed to all web users on the open web, have reported that the demand placed on their infrastructure by robots for content mining is negligible and any increase in demand will be easy to manage. For subscription services such as those provided by Elsevier, the demand placed on their infrastructure should be even less, as only users registered at subscribing institutions will have access.
PMR: I can mine the whole literature on my laptop. That’s probably 0.00001% of daily usage. If that crashes Elsevier they shouldn’t be in the business. This argument is FUD.
Control of outputs
Under the terms and conditions of the updated licence agreement the outputs are controlled in the following ways:
1. Outputs can contain “snippets” of up to 200 characters of the original text
This is an arbitrary limit. Because this is essentially a limit on the amount of text that can be quoted from the original source, it could potentially result in misquotation or, at the very least, an inaccurate representation of the original research.
PMR: Some chemical names are longer than 200 characters. Truncating these could KILL PEOPLE.
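A minimal sketch of what a hard 200-character cut does to a long term. The 200-character limit is the one stated in the licence terms quoted above; the long "name" is a synthetic placeholder built for illustration (not a real chemical name), and `make_snippet` is a hypothetical helper, not part of any publisher's API.

```python
SNIPPET_LIMIT = 200  # characters, the snippet limit quoted above


def make_snippet(text: str, limit: int = SNIPPET_LIMIT) -> str:
    """Cut text to at most `limit` characters, as a snippet rule would require."""
    return text[:limit]


def is_truncated(text: str, limit: int = SNIPPET_LIMIT) -> bool:
    """True if the snippet would lose part of the original text."""
    return len(text) > limit


# Synthetic placeholder standing in for a long systematic name.
# It is NOT a real chemical name; it only has a realistic length.
long_name = "-".join(f"unit{i}yl" for i in range(40))

if __name__ == "__main__":
    snippet = make_snippet(long_name)
    print(f"original: {len(long_name)} chars, snippet: {len(snippet)} chars")
    print("information lost:", is_truncated(long_name))
```

The point of the sketch is that the cut is blind: it falls wherever character 200 happens to be, and whatever follows it is simply gone from the output.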
2. Licensed as CC-BY-NC
In signing up to the Elsevier licence agreement, researchers are asked to agree to make their output available under a CC-BY-NC licence. The outputs of TDM are very often facts and data, which are not subject to copyright; however, the Elsevier licence agreement stipulates that this non-copyright information should be put under a licence for copyright works.
In addition, the definition of “non-commercial” is highly ambiguous and open to interpretation. In effect, a CC-BY-NC licence prevents downstream use of the results and may also put researchers who are performing research under a grant agreement that mandates that data be openly available in a difficult position. Universities are also increasingly engaging in, and being encouraged by governments to enter into business partnerships with, private business. This is known as the “knowledge transfer agenda”. We recommend that universities and researchers decide before signing the Elsevier licence whether there is a possibility that the outputs of the research they wish to undertake are commercial. As facts and data are not copyrightable, LIBER’s position is that they should be made available under a CC0 licence.
PMR: The only reasonable way to publish scientific Facts is CC0. We enshrined this in the Panton Principles. These are, for example, endorsed by BMC, and Cameron Neylon of PLoS is a co-author.
Registration and click-through licences
In order for an individual researcher to gain access to the Elsevier content that their institution subscribes to, he/she must register directly with the Elsevier developers portal, provide details about the research they wish to undertake, and agree to the terms of a click-through licence. LIBER is particularly concerned about making such demands of researchers for the following reasons:
1. We want to protect the privacy of our users.
Libraries have a strong track record of putting measures in place to protect the personal details and reading habits of our patrons. By requiring researchers to register individually and to provide details of their research project, Elsevier is circumventing the protections that libraries have put in place. The reason given by Elsevier for this requirement is that the publisher needs to check the credentials of the individual accessing the content. However, in authenticating individual user accounts the institution has already established the bona fide nature of the researcher. Further verification should not be necessary. We object to data about the research being performed by our users in our institutions being collected by an external third party. It is not the job of a publisher to control, monitor and vet what research takes place at a university.
2. We want to protect our researchers from undue liability.
Many institutions employ full time experts to negotiate the terms and conditions of licence agreements on their behalf. This process can take months, and yet, a researcher is expected to agree to the Elsevier click-through licence in a matter of seconds. The terms of this click-through licence are extremely complex, in many places unclear and could have serious down-stream implications for the outputs of the research. We also note that there is no cap on liabilities for a researcher:
2.3 The User will be solely responsible for all costs, expenses, losses and liabilities incurred, and activities undertaken by the User in connection with TDM Service. [BOLD here is from LIBER]
What is more, Elsevier retain the right to amend the terms without notice, and the changes will be deemed accepted by the researcher immediately. This is unacceptable.
Many of the responsibilities that are placed on the researcher by the click-through licence will be difficult to implement in practice e.g. the licence states that copyright notices may not be changed from how they appear in the dataset. This means that in a dataset derived from 10,000 articles there may be at least 10,000 appearances of the word “copyright”. A normal way of dealing with this “noise” would be to remove these irrelevant data from the dataset, but this would contravene the terms of the licence.
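The clean-up step LIBER describes might look something like the sketch below. It is only an illustration of routine noise removal, which on LIBER's reading the licence would not permit; the regular expression is an assumption about what copyright notices look like, the sample lines are invented, and `strip_copyright_lines` is a hypothetical name.

```python
import re

# Assumed pattern: a line that begins with a copyright marker.
# Real notices vary, so this is illustrative rather than exhaustive.
COPYRIGHT_LINE = re.compile(r"^\s*(©|\(c\)|copyright\b)", re.IGNORECASE)


def strip_copyright_lines(lines):
    """Return the lines that do not start with a copyright notice."""
    return [line for line in lines if not COPYRIGHT_LINE.match(line)]


# Invented sample lines, for illustration only.
sample = [
    "Copyright 2014 Example Publisher. All rights reserved.",
    "First sample sentence of extracted text.",
    "© 2014 Example Publisher",
    "Second sample sentence of extracted text.",
]

if __name__ == "__main__":
    for line in strip_copyright_lines(sample):
        print(line)
```

A few lines of filtering like this are standard practice when preparing a corpus, which is why a clause forbidding any change to the notices is hard to follow in practice.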
The click-through licence also makes it impossible to ensure the transparency and reproducibility of research results as the researcher may not share the dataset used for the research project and must delete it after use. The researcher is also expressly prohibited from depositing this dataset in their institutional repository.
Lastly, the licence is silent on post-termination use of the results of content mining. The licence will be terminated if the subscribing university “does not maintain a subscription to the book and journal content in the ScienceDirect® database”. If a researcher has mined thousands of articles, how do they check that each and every one is being subscribed to? If one or many are cancelled, what does this mean for the results, categorisations and hypotheses contained in data they have invested time and effort to produce?
PMR: Can anyone suggest that these terms are good for science?
We estimate that European universities spend in the region of €2 billion a year on Scientific Technical and Medical published content, the vast majority of which is on e-journal subscriptions. The new Elsevier licence terms and added requirement of an additional licence for each and every researcher who wishes to mine the content raises questions about what institutions are actually purchasing when subscribing to digital information. The implication of the Elsevier TDM policy is that institutions only purchase the right to cache, look at, print out, and do a word-search on a PDF. We believe that universities should be able to employ computers to read and analyse content they have purchased and to which they have legal access. An e-subscription fee is paid so that universities can appropriately and proportionately use the content they subscribe to. For what other purpose is a university buying access to information?
Research and innovation is best encouraged in a free-thinking and enabling environment where researchers can fully exploit the content they have access to through their library. Going forward, it is important that libraries can ensure that the scientific freedom of their researchers is not eroded, and the impact of their scientific outputs undermined, by limits imposed through licences.
This licence is recommended so that reuse is not prevented under the sui-generis Database Directive.
Terms used in the licence such as “recognition” and “classification” (2.1.1) are unclear. Another crucial term, “integration” (3.3), has been left undefined.
PMR: In summary, the ONLY reason for Elsevier’s licence is to give them a stranglehold over this new technology. Libraries gave away authors’ rights (they should have flagged this and communally refused to let it happen).
Any library that signs a publisher’s TDM clause will destroy the new information-led science.
Even if you aren’t in the UK, it is very probable that you are legally allowed to extract facts. The only thing stopping you doing it is the additional clause you have agreed to with the publisher.
Kill the restrictive clauses in what you sign with the publisher. You don’t have to accept them.