Content Mining: Why you and I should NOT sign up for Elsevier’s TDM service

In the last few days Elsevier has announced its policy on Text and Data Mining (TDM). I use the term “content mining” because I wish to mine every part of published content (images, audio, video), not just text. The policy was announced here.

This post contains a lot of material (from Elsevier and my comments), so I’ll try to summarise. Note that Elsevier’s material seems inconsistent in places (common with this publisher). I have had to go behind Elsevier’s paywall to find one statement of agreement and rights, and it is probable that I have not found everything. In essence:

  • Elsevier asserts complete control over “its” content and requires both institutions and individuals to sign licences.
  • Elsevier is the sole author and controller of the policy: there has been no Open discussion or agreement with scholarly bodies.
  • Libraries have to sign agreements with Elsevier individually. There are no details of these agreements or of whether they entail additional institutional payment. It is also possible that institutions may be asked to give up content-mining rights in return for lower overall prices. (Libraries have universally and unilaterally given away all these rights over the last decade, and have supported publishers in forbidding machine access to content.)
  • Researchers have to register as a developer (I think) and ask Elsevier’s permission for every project they wish to do. It is not clear whether permission is automatic or whether Elsevier exercise control over the choice and scope of projects (they certainly did when I “negotiated” with them).
  • Researchers can only access content through an Elsevier-controlled portal. They have to register as a Developer and get an APIKey (this conflicts with “sign a click-through licence”).
  • Researchers can only mine text. Images are specifically prohibited. This is useless for me, as my colleagues and I are mining chemical structure diagrams.
  • There is no indication of how current the material will be. I shall be mining the literature an hour after it appears. Will the API provide that?
  • The amount that can be republished (“snippets” of up to 200 characters) is often too small to be useful. I want to build corpora (impossible); vocabularies (which require recording the precise words: impossible); lists of chemical names (often longer than 200 characters, so impossible); and figure captions (impossible).
  • Researchers must commit to a CC-BY-NC licence for their output. This effectively kills downstream use (I shall use CC0). It also trains them into thinking CC-BY-NC is a “good thing”. It isn’t.
  • If researchers have a LEGITIMATE collection of papers that they wish to mine (say, on their hard disk), they are forbidden to mine it. They have to go to each publisher (if this awful protocol is adopted elsewhere), find its API and mine the individual papers through it. Absurd.


    This is licence-controlled TDM. The publishers tried very hard to get Europe (Neelie Kroes) to agree to licences for TDM (“Licences for Europe”). They failed.

    They tried to stop the UK’s Hargreaves copyright reform from exempting data analytics from copyright. They failed.

    The leading library organizations and funders, such as the British Library, JISC, LIBER, the Wellcome Trust and RCUK, are united in their opposition to licences. This is simply licensing under another name.

    The danger is that university libraries, which have signed these restrictive clauses for years, will continue to sign them.

    Don’t take my word for this. Ask the BL, or JISC, or LIBER.

    APIs make it HARDER to mine. We are releasing technology that works directly on PDFs. It’s Open Source and it works. And others are doing the same. If every publisher came up with a similar process, the burden of mining would be huge. This is probably what some publishers hope.

Here are the supporting docs (I have emphasized some parts). In front of the paywall:

How to gain access

For Academic subscribers once your institutional agreement has been updated to allow text-mining access, individual researcher access is an automatic process, managed through our developer portal. Researchers will need to follow three steps:

  1. Register their details using the online form on the developer’s website
  2. Agree to our Text Mining conditions via a “click-through” agreement
  3. Receive an API token that will allow you to access ScienceDirect content (delivered in an XML format suitable for text mining)

Terms and conditions of text and data mining

  • Text mining access is provided to subscribers for non-commercial purposes
  • Access is via the ScienceDirect APIs only
  • Text mining output adheres to the following conditions:

1. Output can contain “snippets” of up to 200 characters of the original text

2. Licensed as CC-BY-NC

3. Includes DOI link to original content
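These three output conditions are mechanical enough to check by machine, which also makes their consequences easy to see. Below is a minimal Python sketch of such a check. It is hypothetical and not supplied by Elsevier: the `MiningOutput` record, the `violations` function, the exact licence string and the assumption that a “DOI link” is a `https://doi.org/` URL are all mine.

```python
from dataclasses import dataclass, field

# Conditions as stated in the terms above; how they are encoded here is an assumption.
MAX_SNIPPET_CHARS = 200
REQUIRED_LICENCE = "CC-BY-NC"
DOI_PREFIX = "https://doi.org/"  # assumed form of a "DOI link"


@dataclass
class MiningOutput:
    """Hypothetical record of one piece of text-mining output."""
    snippets: list[str] = field(default_factory=list)
    licence: str = ""
    doi_link: str = ""


def violations(out: MiningOutput) -> list[str]:
    """Return one reason for every stated output condition that `out` breaks."""
    problems = []
    # Condition 1: each quoted snippet may be at most 200 characters long.
    for i, snippet in enumerate(out.snippets):
        if len(snippet) > MAX_SNIPPET_CHARS:
            problems.append(
                f"snippet {i} has {len(snippet)} characters (limit {MAX_SNIPPET_CHARS})"
            )
    # Condition 2: the output must carry the CC-BY-NC licence.
    if out.licence != REQUIRED_LICENCE:
        problems.append(f"licence is {out.licence!r}, not {REQUIRED_LICENCE}")
    # Condition 3: the output must link back to the original via its DOI.
    if not out.doi_link.startswith(DOI_PREFIX):
        problems.append("missing DOI link to the original content")
    return problems


if __name__ == "__main__":
    record = MiningOutput(
        snippets=["C" * 250],  # stand-in for a long chemical name, not real data
        licence="CC0",
        doi_link="https://doi.org/10.1000/example",  # placeholder DOI
    )
    for problem in violations(record):
        print(problem)
```

Run on an output containing one 250-character snippet (a stand-in for a long systematic chemical name) released under CC0, it reports two violations: the snippet is over the 200-character limit and the licence is not CC-BY-NC. A compliant chemistry output could not contain such a name at all.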

Note: We request that all access to content for text mining purposes takes place through our APIs and remind you that in order to maintain performance and availability for all users, the terms and conditions of access to ScienceDirect continue to prohibit the use of robots, spiders, crawlers or other automated programs, or algorithms to download content from the website itself. (behind paywall?)

Text mining of Elsevier publications



Definition: the client application is a system that ingests full-text publications in order to text-mine them: extract data and information using automated algorithms. Examples of text mining are entity recognition, relationship extraction, and sentiment analysis using linguistic methods.

We allow this use case under the following conditions:

  • Access to the APIs for text mining purposes is available free of charge to researchers at academic institutions that subscribe to […]. The full-text content that is available for mining through the APIs is the content that the institute has subscribed to [PMR it’s TEXT ONLY].
  • Our APIs must be used to retrieve the content; crawling the website itself is not allowed.
  • The institution needs to have written permission from Elsevier for text mining, either through a clause embedded in an existing subscription agreement or as a separate add-on agreement.
  • After permission is granted, researchers at the institution will be able to obtain an APIKey by registering their text mining project through the ‘My Projects’ page of the Elsevier Developer Portal.
  • The use of Elsevier content in text mining, and of the resulting output, should adhere to Elsevier’s TDM policy as outlined on […].

If your institution wants to get written permission for text mining, the institution’s authorized representative can request Elsevier to provide one, by contacting his/her Elsevier account manager or our Academic & Government Sales department.

If you want to mine Elsevier content for commercial purposes, please contact our Corporate Sales department.


All Dried Up? Modeling the Effects of Climate Change in California’s River Basins

Whether you are trapped inside because of it, or mourning the lack of it, water is on everyone’s mind right now. Too much snow in the Midwest and Northeast has been ruining travel plans, while too little snow is limiting Californians’ annual ski trips. No one wants to drive three hours only to find a rocky hillside where their favorite slope used to be.

It’s hard to deny that abnormal things are happening with the weather right now. Recently, Governor Jerry Brown officially declared a state of emergency in California due to the drought and suggested that citizens cut water usage by 20%. With no relief in sight, it is important not only to regulate our current water use, but also to reevaluate our local programs and policies that will affect water usage in the future. So, how do we go about making these decisions without being able to predict what’s next? A recently published PLOS ONE article may offer an answer in the form of a model that allows us to estimate how potential future climate scenarios could affect our water supply.


Researchers from UC Berkeley and the Stockholm Environment Institute’s (SEI) office in Davis, CA built a hydrology simulation model of the Tuolumne and Merced River basins, both located in California’s Central Valley. Their focus was on modeling the sensitivity of California’s water supply to possible increases in temperature. When building the model, the authors chose to incorporate historical water data, current water use regulations, and geographical information to estimate seasonal water availability across the Central Valley and the San Francisco Bay Area. They then ran various water availability scenarios through the model to predict how the region could be affected by rising temperatures.

Using estimated temperature increases of 2°C, 4°C, and 6°C, the model predicted earlier snowmelt, shifting peak river flow earlier in the year. The model also forecast decreased river flow due to increased evapotranspiration (the loss of water to the air through evaporation and plant transpiration, which is driven by factors such as temperature, humidity, and wind speed). The water supply was also estimated to drop incrementally with each temperature increase, though the decline is somewhat cushioned by water stored in California’s reservoirs.


The authors used an existing model as an initial structure and built upon it to include information on local land surface characteristics, evapotranspiration, precipitation, and runoff potential. Surrounding water districts were modeled as nodes and assigned priorities according to California’s established infrastructure and legislation. According to the authors, this information equips the tool to estimate monthly water allocation to agricultural and urban areas and to compare it with historical averages for the same areas.
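To make the idea of prioritized demand nodes more concrete, here is a toy Python sketch of priority-ordered allocation. It is a simplification for illustration only, not the authors’ model: the function names, the node names and numbers, the single-month scope, and the rule that nodes sharing a priority split any shortfall in proportion to their demand are all assumptions.

```python
def allocate(supply: float, demands: dict[str, float],
             priority: dict[str, int]) -> dict[str, float]:
    """Allocate one month's supply to demand nodes, highest priority first.

    Priority 1 is served first. Nodes that share a priority level split any
    shortfall in proportion to their demand (an assumption of this sketch).
    """
    allocation = {node: 0.0 for node in demands}
    remaining = supply
    for level in sorted(set(priority[n] for n in demands)):
        nodes = [n for n in demands if priority[n] == level]
        wanted = sum(demands[n] for n in nodes)
        if wanted <= 0:
            continue
        # Fraction of this level's demand that can be met (capped at 100%).
        share = min(1.0, remaining / wanted)
        for n in nodes:
            allocation[n] = demands[n] * share
        remaining = max(0.0, remaining - wanted * share)
    return allocation


def shortfall_vs_history(allocation: dict[str, float],
                         historical: dict[str, float]) -> dict[str, float]:
    """Allocation minus a historical average for each node (negative = less water)."""
    return {n: allocation[n] - historical[n] for n in allocation}


if __name__ == "__main__":
    # Made-up numbers for illustration only; not data from the study.
    demands = {"urban": 40.0, "farm_a": 50.0, "farm_b": 30.0}
    priority = {"urban": 1, "farm_a": 2, "farm_b": 2}
    result = allocate(100.0, demands, priority)
    print(result)
    print(shortfall_vs_history(result, demands))
```

With 100 units of supply, the top-priority urban node receives its full 40 units and the two equal-priority farm nodes share the remaining 60 in proportion to demand (37.5 and 22.5). A real model would repeat this month by month, with supply driven by simulated hydrology, before comparing the results with historical averages.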

Although the model is broad, the authors present it as a case study that provides estimates of longer-term water availability for the Central Valley and Bay Area, and they encourage other regions to modify its design to meet their local needs. Those of us looking for more specific predictions can also use the tool to build models with additional information and refined approximations, allowing flexibility for future changes in land use and policy. For now, we may have a good long-term view of our changing water supply, and a vital tool as we race to keep up with our ever-changing world.

Citation: Kiparsky M, Joyce B, Purkey D, Young C (2014) Potential Impacts of Climate Warming on Water Supply Reliability in the Tuolumne and Merced River Basins, California. PLoS ONE 9(1): e84946. doi:10.1371/journal.pone.0084946

Image 1 Credit: Mono Lake by Stuart Rankin

Image 2 Credit: Figure 1 pone.0084946

Image 3 Credit: Figure 2 pone.0084946

The post All Dried Up? Modeling the Effects of Climate Change in California’s River Basins appeared first on EveryONE.

We – and that includes you – must preserve Net Neutrality

I have just signed a petition on Net Neutrality; written to the MEP/Rapporteur for the ITRE process; and written to my MEPs. Ten years ago that would have taken me all day. Now it takes under half an hour.

Some of you may think: “Isn’t it a bit radical to be doing all this? The world’s reasonably OK. The politicians (are they visible?) will look after it, won’t they? And everybody knows the value of the Web; it’s unthinkable it will be removed. And it’s not really my business.”

Well it IS.

We are in the middle of a digital war: a war between corporate interests and freedom, because there are huge amounts of money to be made by controlling people and the flow of information. A typical idea would be a “two-speed internet”: one for those who can pay, and one for the rest. And on the high-speed one we wouldn’t get all that spam, because it would be controlled by those who know best what we want. Like movie corporations. Or mega science publishers. (Do you really want a dedicated net where only rich science publishers are present?)

The forces of control have money. Their weapon is the lobbyist. I know, for example, that publisher money is being spent to stop me (yes, PM-R and colleagues) from developing a free-to-everyone approach to Content Mining (Text and Data Mining). Loss of Net Neutrality could kill content mining, as the content would only be available on the mega-publishers’ private web.

We have people’s minds and energy. That’s very powerful but it relies on YOU. I hope I have convinced you to care. Then it’s easy. Go to:

They (the wonderful MySociety) will tell you what to do. All you need to know is your postcode. They’ll work out who your MEPs are. Write your letter. It’s worth making it personal; you’ll see below that I have included bits about me and about the local region. (Cut-and-paste letters are collected and counted, but not read in detail.)

So here’s how it’s done… (the names are worked out by WTT)


  • Richard Howitt MEP
  • Vicky Ford MEP
  • Geoffrey Van Orden MEP
  • Stuart Agnew MEP
  • David Campbell Bannerman MEP
  • Andrew Duff MEP
  • Robert Sturdy MEP


Thursday 30 January 2014

From: Peter Murray-Rust

Dear Geoffrey Van Orden, Richard Howitt, Robert Sturdy, Andrew Duff, Stuart Agnew, Vicky Ford and David Campbell Bannerman,

I am writing to urge you to vote and campaign for Net Neutrality, which is being debated in ITRE in a few days’ time. I have written to MEP/Rapporteur Pilar del Castillo as follows:

I am writing to urge you to preserve Net Neutrality in the ITRE process at all costs. Europe invented the Web. I was privileged to be at CERN in 1994 to hear the scientist Sir Tim Berners-Lee launch the Semantic Web.

Tim’s vision is simple. At the 2012 Olympics his message was:

“This is for everyone”.

Europe has made massive contributions to the Web, and it stands to gain massively more. I have given the UK government an estimate that, in my own discipline of chemistry, we stand to gain “low billions” worldwide by making knowledge free. In Europe alone, new uses of scientific information could generate huge wealth.

Restriction – such as a divided web – kills innovation. Innovation and free information are the foundation for a better future.

This is compelling in itself, but there are also special local reasons to support Net Neutrality. Cambridge and the Eastern Region are making outstanding advances in new technology and deploying them both for wealth generation and for the betterment of our society and the planet. The free flow of information and ideas is fundamental to this. So by supporting Net Neutrality you will also be helping to strengthen the outstanding potential of our region.


The Missing Underwater Forests of Australia: Restoring Phyllospora comosa Around Sydney


Although seaweed is the dominant habitat-forming organism along temperate coastlines, one of the major macroalgae of Australia, Phyllospora comosa, has disappeared over the last forty years from the urban shores around Sydney, Australia. Human activity is likely related to the degradation of these habitats in urbanized areas: during the 1970s and 1980s, large amounts of sewage from nearby cities were discharged along the surrounding coasts. Unfortunately, despite significant improvements in water quality around Sydney since then, Phyllospora has not returned. To test whether Phyllospora can ever be restored to reefs where it was once abundant, the authors of a recent PLOS ONE paper transplanted Phyllospora onto two reefs in the Sydney area. In this interview, corresponding author Dr. Alexandra Campbell from the University of New South Wales elaborates on the group’s research and the impact of these ‘missing underwater forests’:

You’ve said that “seaweeds are the ‘trees’ of the ocean”. Can you tell us a little more about your study organism, Phyllospora, and explain its importance for coastal ecosystems around Australia?

Phyllospora comosa (known locally as ‘crayweed’) grows up to 2.5 m in length and forms dense, shallow forests along the south-eastern coastline of Australia, from near Port Macquarie in New South Wales, around Tasmania to Robe in South Australia. Individuals appear to persist on reefs for around 2 years and are reproductive year round.

How do these ecosystems change with the reduction of seaweed forests?

Large, canopy-forming macroalgae provide structural complexity, food and habitat for coastal marine ecosystems and other marine organisms. When these habitat-formers decline or disappear, the ecosystem loses its complexity, biodiversity decreases and many ecosystem services are also lost. Losing large seaweeds from temperate reefs has analogous ecosystem-level implications to losing corals from tropical reefs.

We’re interested in learning more about how you got involved in this research. Can you tell us how you became interested in studying Phyllospora?

For my doctorate, I studied how changing environmental conditions may disrupt relationships between seaweeds and microorganisms, which are abundant and ubiquitous in marine environments, potentially leading to climate-mediated diseases. During my PhD, my colleagues (Coleman et al.) published a paper describing the disappearance of crayweed from the urbanised coastline of Sydney and hypothesised that the cause was the high-volume, low-treatment, near-shore sewage outfalls that used to flow directly onto some beaches and bays in the city. I wondered whether this pollution might have disrupted the relationship between crayweed and its microbial associates, and that’s how I got involved in the project.

Why is the loss of canopy-forming macroalgae difficult to study retrospectively and how has this informed your current study?

Once an organism has disappeared from an ecosystem, it can be difficult to piece together the processes that caused its demise, particularly if the disappearance occurred several decades ago and the ecosystem state shifted dramatically as a consequence. In our study, we hypothesized that poor water quality might have caused the decline of Phyllospora. There have been significant improvements in water quality in the region since the decline, but the species, and the ecosystem it used to support, have failed to recover. To test whether water quality has improved enough to allow this seaweed to recolonise, we transplanted it back onto reefs where it was once abundant. The survival rates of the transplanted seaweed were very good, suggesting that, with a little help, this species may be able to recolonise Sydney’s reefs.

What were some of the difficulties you faced while conducting your research?

Moving hundreds of large seaweeds many kilometres from donor populations to the restoration sites was a big job. Thankfully, we received a great deal of help from many volunteers from the local community – mostly divers, with an interest in conserving and restoring the marine ecosystems they visit recreationally and value as a natural resource.

You’ve talked about Phyllospora ‘recruitment’ at one recipient site. Can you explain in greater detail what a ‘recruit’ is and how this is important for the success of a restoration site?

Phyllospora reproduces sexually, with gametes from male individuals fertilizing gametes from females, forming zygotes, which then attach themselves to the bottom (usually not very far from their parents) and grow into juvenile algae which we call ‘recruits’. In the context of restoration, the high level of recruitment (i.e. successful reproduction) we observed at our transplant site is very encouraging because it creates the possibility for the establishment of a self-sustaining population of Phyllospora at this site for the first time in many decades.

Why do seaweed forests receive less attention than other marine ecosystems, for example mangroves or coral reefs?

Most people don’t think about seaweeds very often. When they do, it’s usually because the sight, touch or smell of seaweed on the beach is annoying or offensive. Even the name “seaweed” conjures negative imagery, so perhaps it’s a PR issue! Arguably, macroalgae have traditionally received less attention from marine ecologists as well, with much more attention and funding going to coral reef research. Given the global pattern of declines in temperate, habitat-forming macroalgae, this needs to change, and our understanding of the processes that affect seaweed populations needs to grow.

What would a successful restoration of underwater kelp forests mean for the ecosystem and for the local population?

It’s our hope that, by restoring habitat-forming macroalgae like Phyllospora, we will also enhance populations of other organisms that rely on this species for food or shelter. Detecting such follow-on benefits of our seaweed restoration program is the focus of ongoing research and our initial results are very encouraging.

You’ve mentioned that larger scale restoration would be a sound way of combating the grazing (herbivory) you saw. What is the next step forward for you?

Enhanced grazing may be another mechanism by which Phyllospora disappeared from these reefs (or perhaps why it has failed to recover). The impacts of grazing we observed were site-specific, so further investigation is needed into why one site was so severely impacted by herbivores while the other was not. Our first step towards resolving this is to establish more numerous restoration patches of different sizes, to see whether we can satiate the herbivores and whether smaller patches are more susceptible to grazing than larger patches.

For more PLOS ONE articles about the ‘trees of the ocean’, check out the way seaweed and coral interact in “Seaweed-Coral Interactions: Variance in Seaweed Allelopathy, Coral Susceptibility, and Potential Effects on Coral Resilience” and how ocean currents influence seaweed community organization in “The Footprint of Continental-Scale Ocean Currents on the Biogeography of Seaweeds”.

Citation: Campbell AH, Marzinelli EM, Vergés A, Coleman MA, Steinberg PD (2014) Towards Restoration of Missing Underwater Forests. PLoS ONE 9(1): e84106. doi:10.1371/journal.pone.0084106

Image: Adriana Vergés, co-author

The post The Missing Underwater Forests of Australia: Restoring Phyllospora comosa Around Sydney appeared first on EveryONE.

Read the February Issue of Evolutionary Applications Online!

The February Issue of Evolutionary Applications has been published online. This issue features an image of a lone grizzly bear in Alberta, Canada, which relates to a study by Shafer and colleagues linking genotype, ecotype, and phenotype in grizzly bears (Ursus arctos). This issue also launches a new series of research highlights: brief synopses of new work from other journals that is directly relevant to readers of Evolutionary Applications, aimed at exploring the breadth of potential applications of evolutionary theory across fields and disciplines. Editor-in-Chief Louis Bernatchez has highlighted the following articles as being of particular interest:

Genomic selection for recovery of original genetic background from hybrids of endangered and common breeds by Carmen Amador, Ben J. Hayes and Hans D. Daetwyler

Summary: The authors present two genomic selection strategies, employing genome-wide DNA markers, to recover the genomic content of the original endangered population from admixtures. They also compare the efficiency of both strategies using empirical 50K SNP array data from sheep breeds.

Anthropogenic selection enhances cancer evolution in Tasmanian devil tumours by Beata Ujvari, Anne-Maree Pearse, Kate Swift, Pamela Hodson, Bobby Hua, Stephen Pyecroft, Robyn Taylor, Rodrigo Hamede, Menna Jones, Katherine Belov and Thomas Madsen

Summary: The Tasmanian Devil Facial Tumour Disease (DFTD) provides a unique opportunity to study cancer evolution in vivo. Since it was first observed in 1996, this transmissible cancer has caused local population declines of 90%. In this study the authors focus on the evolutionary response of DFTD to a disease suppression trial. The results reveal that DFTD has the capacity to respond rapidly to novel human-induced selective regimes and that disease eradication may result in novel tumour adaptations.

Linking genotype, ecotype, and phenotype in an intensively managed large carnivore by Aaron B. A. Shafer, Scott E. Nielsen, Joseph M. Northrup and Gordon B. Stenhouse

Summary: In this study, integrated GPS habitat use data and genetic profiling were used to determine the influence of habitat and genetics on fitness proxies (mass, length, and body condition) in a threatened population of grizzly bears (Ursus arctos) in Alberta, Canada. The authors found that homozygosity had a positive effect on these fitness proxies, which may be indicative of outbreeding depression unintentionally caused by massive translocations of bears over large geographic distances.

We encourage you to submit papers applying concepts from evolutionary biology to address biological questions of health, social and economic relevance across a vast array of applied disciplines. We also welcome submissions of papers making use of modern genomics or other molecular methods to address important questions in an applied evolutionary framework. For more information please visit the aims and scope page.


Latest Article Alert from Breast Cancer Research

The following new articles have just been published in Breast Cancer Research

For articles using Author Version-first publication you will see a provisional PDF corresponding to the accepted manuscript. In these instances, the fully formatted Final Version PDF and full text (HTML) versions will follow in due course.

Research article
Significant overlap between human genome-wide

Latest Article Alert from BMC Women’s Health

The following new articles have just been published in BMC Women’s Health

Research article
Is the health status of female victims poorer than males in the

Latest Article Alert from BMC Health Services Research

The following new articles have just been published in BMC Health Services Research

Research article
Mental health policy in Eastern Europe: a comparative

Latest Article Alert from BMC Infectious Diseases

The following new articles have just been published in BMC Infectious Diseases

Case report
Yersinia pseudotuberculosis enterocolitis mimicking enteropathic

Latest Article Alert from Allergy, Asthma & Clinical Immunology

The following new articles have just been published in Allergy, Asthma & Clinical Immunology

Aluminium adjuvants and adverse events in sub-cutaneous