Content Mining: Why you and I should NOT sign up for Elsevier’s TDM service

In the last few days Elsevier has announced its policy on Text and Data Mining (TDM). I use the term “content mining” as I wish to mine every part of published content (images, audio, video) and not just text. The policy was announced here.

This post contains a lot of material (from Elsevier and my comments) so I’ll try to summarise. Note that Elsevier’s material seems inconsistent in places (common with this publisher). I have had to go behind Elsevier’s paywall to find one statement of agreement and rights and it is probable that I have not found everything. In essence:

  • Elsevier asserts complete control over “its” content and requires both institutions and individuals to sign licences.
  • Elsevier is the sole author and controller of the policy – there has been no Open discussion or agreement with scholarly bodies.
  • Libraries have to – individually – sign agreements with Elsevier. There are no details of these agreements or whether they entail additional institutional payment. It is also possible that institutions may be asked to give up content-mining rights in return for lower overall prices. (Libraries have universally and unilaterally given away all these rights over the last decade and support publishers in forbidding machine access to content.)
  • Researchers have to register as a developer (I think) and ask permission of Elsevier for every project they wish to do. It is not clear whether permission is automatic or whether Elsevier exercises control over the choice and scope of projects (they certainly did when I “negotiated” with them).
  • Researchers can only access content through an Elsevier-controlled portal. They have to register as a Developer and get an APIKey (which conflicts with “sign a click-through licence”).
  • Researchers can only mine text. Images are specifically prohibited. This is useless for me – as I and colleagues are mining chemical structure diagrams.
  • There is no indication of how current the material will be. I shall be mining the literature an hour after it appears. Will the API provide that?
  • The amount that can be republished is often useless (“200 characters”). I want to build corpora (impossible); vocabularies (essential to record precise words – impossible); chemical names (often > 200 characters, so impossible); figure captions (impossible).
  • The researchers must commit to a CC-NC licence. This effectively kills downstream use (I shall use CC0). It also trains them into thinking CC-NC is a “good thing”. It isn’t.
  • If a researcher has a LEGITIMATE collection of papers that they wish to mine (say on their hard disk) they are forbidden. They have to go to each publisher (if this awful protocol is promoted elsewhere) and find the API and mine the individual papers. Absurd.


    This is licence-controlled TDM. The publishers tried very hard to get Europe (Neelie Kroes) to agree to licences for TDM (“Licences for Europe”). They failed.

    They tried to stop the UK Hargreaves process exempting data analytics from copyright reform. They failed.


    The leading library organizations and funders such as the British Library, JISC, LIBER, Wellcome Trust, RCUK are united in their opposition to licences. This is simply Licences under another head.


    The danger is that University libraries – who have signed these restrictive clauses for years – will continue to sign them.

    Don’t take my word for this. Ask the BL, or JISC or LIBER.

    APIs make it HARDER to mine. We are releasing technology that will work directly on PDFs. It’s Open Source and works. And others are doing the same. If every publisher came up with a similar process it would make the burden of mining huge. This is probably what some publishers hope.

Here are the supporting docs. I have emphasized some parts: (In front of paywall)

How to gain access

For academic subscribers: once your institutional agreement has been updated to allow text-mining access, individual researcher access is an automatic process, managed through our developer portal. Researchers will need to follow three steps:

  1. Register their details using the online form on the developer’s website
  2. Agree to our Text Mining conditions via a “click-through” agreement
  3. Receive an API token that will allow you to access ScienceDirect content (delivered in an XML format suitable for text mining)
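The three steps culminate in an API token that accompanies every request. Purely as an illustration – the endpoint URL, header name, and DOI below are my own inventions, not taken from any Elsevier documentation – the retrieval step might be sketched as:

```python
import urllib.request

API_KEY = "token-from-step-3"  # the API token issued in step 3
BASE_URL = "https://api.example.com/article/doi/"  # hypothetical endpoint

def build_request(doi: str) -> urllib.request.Request:
    """Build an authenticated request for one article's full text as XML."""
    return urllib.request.Request(
        BASE_URL + doi,
        headers={
            "X-APIKey": API_KEY,   # hypothetical header name for the token
            "Accept": "text/xml",  # "an XML format suitable for text mining"
        },
    )

req = build_request("10.9999/example.2014.001")  # made-up DOI
print(req.full_url)
# an actual fetch would be: urllib.request.urlopen(req).read()
```

Note that even with such a token in hand, the terms prohibit fetching the same content from the website itself.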

Terms and conditions of text and data mining

  • Text mining access is provided to subscribers for non-commercial purposes
  • Access is via the ScienceDirect APIs only
  • Text mining output adheres to the following conditions:

  1. Output can contain “snippets” of up to 200 characters of the original text
  2. Output is licensed as CC-BY-NC
  3. Output includes a DOI link to the original content
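Taken together, the conditions describe a tightly constrained output record. This sketch (the wrapper and field names are my own illustration, not an Elsevier format) shows why the 200-character cap bites: longer strings, such as systematic chemical names, are simply cut off.

```python
def make_snippet(text: str, doi: str, limit: int = 200) -> dict:
    """Wrap a mined excerpt per the stated conditions: at most `limit`
    characters, a CC-BY-NC licence tag, and a DOI link to the original."""
    return {
        "snippet": text[:limit],             # condition 1: <= 200 characters
        "license": "CC-BY-NC",               # condition 2
        "source": "https://doi.org/" + doi,  # condition 3
    }

# Many systematic chemical names exceed 200 characters, so what survives
# truncation is no longer a valid name.
record = make_snippet("A" * 250, "10.9999/example.2014.001")  # made-up DOI
assert len(record["snippet"]) == 200
```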

Note: We request that all access to content for text mining purposes takes place through our APIs and remind you that in order to maintain performance and availability for all users, the terms and conditions of access to ScienceDirect continue to prohibit the use of robots, spiders, crawlers or other automated programs, or algorithms to download content from the website itself. (behind paywall?)

Text mining of Elsevier publications



Definition: the client application is a system that ingests full-text publications in order to text-mine them: extract data and information using automated algorithms. Examples of text mining are entity recognition, relationship extraction, and sentiment analysis using linguistic methods.

We allow this use case under the following conditions:

  • Access to the APIs for text mining purposes is available free of charge to researchers at academic institutions that subscribe to ScienceDirect. The full-text content that is available for mining through the APIs is the content that the institution has subscribed to. [PMR: it’s TEXT ONLY]
  • Our APIs must be used to retrieve the content; crawling the website itself is not allowed.
  • The institution needs to have written permission from Elsevier for text mining, either through a clause embedded in an existing subscription agreement or as a separate add-on agreement.
  • After permission is granted, researchers at the institution will be able to obtain an APIKey by registering their text mining project through the ‘My Projects’ page of the Elsevier Developer Portal.
  • The use of Elsevier content in text mining, and of the resulting output, should adhere to Elsevier’s TDM policy as outlined on the Elsevier website.

If your institution wants to get written permission for text mining, the institution’s authorized representative can request one from Elsevier by contacting his/her Elsevier account manager or our Academic & Government Sales department.

If you want to mine Elsevier content for commercial purposes, please contact our Corporate Sales department.


All Dried Up? Modeling the Effects of Climate Change in California’s River Basins

Mono Lake
Whether you are trapped inside because of it, or mourning the lack of it, water is on everyone’s mind right now. Too much snow in the Midwest and Northeast has been ruining travel plans, while too little snow is limiting Californians’ annual ski trips. No one wants to drive three hours only to find a rocky hillside where their favorite slope used to be.

It’s hard to deny that abnormal things are happening with the weather right now. Recently, Governor Jerry Brown officially declared a state of emergency in California due to the drought and suggested that citizens cut water usage by 20%. With no relief in sight, it is important not only to regulate our current water use, but also to reevaluate our local programs and policies that will affect water usage in the future. So, how do we go about making these decisions without being able to predict what’s next? A recently published PLOS ONE article may offer an answer in the form of a model that allows us to estimate how potential future climate scenarios could affect our water supply.


Researchers from UC Berkeley and the Stockholm Environment Institute’s (SEI) office in Davis, CA built a hydrology simulation model of the Tuolumne and Merced River basins, both located in California’s Central Valley (pictured above). Their focus was on modeling the sensitivity of California’s water supply to possible increases in temperature. When building the model, the authors chose to incorporate historical water data, current water use regulations, and geographical information to estimate seasonal water availability across the Central Valley and the San Francisco Bay Area. They then ran various water availability scenarios through the model to predict how the region could be affected by rising temperatures.

Using estimated temperature increases of 2°C, 4°C, and 6°C, the model predicted earlier snowmelts, leading to a peak water flow earlier in the year than in previous years. The model also forecasted a decreased river flow due to increased evapotranspiration (driven by temperature, humidity, and wind speed). The water supply was also estimated to drop incrementally with each temperature increase, though the drop is somewhat cushioned by the availability of water stored in California’s reservoirs.


The authors used an existing model as an initial structure, and built upon it to include information on local land surface characteristics, evapotranspiration, precipitation, and runoff potential. Surrounding water districts were modeled as nodes and assigned a priority according to California’s established infrastructure and legislation. Using this information, the authors state that the tool is equipped to estimate monthly water allocation to agricultural and urban areas and compare it to historical averages for the same areas.

Though the model is broad, the authors present it as a case study that provides estimates of longer-term water availability for the Central Valley and Bay Area, and they encourage other areas to modify its design to meet the needs of their unique locales. Those of us looking for more specific predictions can also use the tool to create models with additional information and refined approximations, allowing flexibility for future changes in land use and policy. For now, we might have a good long-term view of our changing water supply and a vital tool as we race to keep up with our ever-changing world.

Citation: Kiparsky M, Joyce B, Purkey D, Young C (2014) Potential Impacts of Climate Warming on Water Supply Reliability in the Tuolumne and Merced River Basins, California. PLoS ONE 9(1): e84946. doi:10.1371/journal.pone.0084946

Image 1 Credit: Mono Lake by Stuart Rankin

Image 2 Credit: Figure 1 pone.0084946

Image 3 Credit: Figure 2 pone.0084946

The post All Dried Up? Modeling the Effects of Climate Change in California’s River Basins appeared first on EveryONE.

We – and that includes you – must preserve Net Neutrality

I have just signed a petition on Net Neutrality; written to the MEP/Rapporteur for the ITRE process; and written to my MEPs. Ten years ago that would have taken me all day. Now it takes under half an hour.

Some of you may think: “Isn’t it a bit radical to be doing all this? The world’s reasonably OK. The politicians will look after it, won’t they? And everybody knows the value of the Web – it’s unthinkable it will be removed. And it’s not really my business.”

Well it IS.

We are in the middle of a digital war. A war between corporate interests and freedom. Because there are huge amounts of money to be made by controlling people and the flow of information. A typical idea would be a “two-speed internet” – one for those who can pay, and one for the rest. And on the high-speed one we wouldn’t get all that spam because it would be controlled by those who know best what we want. Like movie corporations. Or mega-science-publishers. (Do you really want a dedicated net where only rich science publishers are present?)

The forces of control have money. Their weapon is the lobbyist. I know, for example, that publisher money is being spent to stop me – yes PM-R and colleagues – developing a free-to-everyone approach to Content Mining (Text and Data Mining). Loss of Net Neutrality could kill content mining as the content would only be available on the mega-publishers’ private web.

We have people’s minds and energy. That’s very powerful but it relies on YOU. I hope I have convinced you to care. Then it’s easy. Go to:

They (the wonderful MySociety) will tell you what to do. All you have to know is your postcode. They’ll work out who your MEPs are. Write your letter. It’s worth making it personal – you’ll see below that I have included bits of me, and bits of the local region. (Cut-and-paste is collected, counted, but not read in detail).

So here’s how it’s done… (the names are worked out by WTT)


  • Richard Howitt MEP
  • Vicky Ford MEP
  • Geoffrey Van Orden MEP
  • Stuart Agnew MEP
  • David Campbell Bannerman MEP
  • Andrew Duff MEP
  • Robert Sturdy MEP


Thursday 30 January 2014

From: Peter Murray-Rust

Dear Geoffrey Van Orden, Richard Howitt, Robert Sturdy, Andrew Duff, Stuart Agnew, Vicky Ford and David Campbell Bannerman,

I am writing to you to urge you to vote and campaign for Net Neutrality, which is being debated in a few days’ time in ITRE. I have written to MEP/Rapporteur Pilar del Castillo:

I am writing to urge you to preserve Net Neutrality in the ITRE process at all costs. Europe invented the Web. I was privileged to be at CERN in 1994 to hear the scientist Sir Tim Berners-Lee launch the Semantic Web.

Tim’s vision is simple – at the 2012 Olympics his message was

“This is for everyone”.

Europe has made massive contributions to the Web. It stands to gain massively more. I have estimated to the UK government that in my own discipline of chemistry we stand to gain “low billions” worldwide by making knowledge free. In Europe alone the new uses of scientific information could generate huge wealth.

Restriction – such as a divided web – kills innovation. Innovation and free information are the foundation for a better future.

This is compelling in itself, but there are also special local reasons to support Net Neutrality. Cambridge and the Eastern Region are making outstanding advances in new technology and deploying them for both wealth generation and the betterment of our society and the planet. Free flow of information and ideas are fundamental to this. So by supporting Net Neutrality you will also be helping to strengthen the outstanding potential of our region.


The Missing Underwater Forests of Australia: Restoring Phyllospora comosa Around Sydney


Although seaweed is the dominant habitat-forming organism along temperate coastlines, one of the major macroalgae of Australia, Phyllospora comosa, has disappeared over the last forty years from the urban shores around Sydney, Australia. Human activity is likely related to the degradation of these habitats in urbanized areas: During the 1970s and 1980s, humans discharged large amounts of sewage from nearby cities along surrounding coasts. Unfortunately, despite significant improvements in water quality around Sydney since, Phyllospora has not returned. To test whether Phyllospora can ever be restored in reefs where it was once abundant, authors of a recent PLOS ONE paper transplanted Phyllospora into two reefs in the Sydney area. In this interview, corresponding author Dr. Alexandra Campbell from the University of New South Wales elaborates on the group’s research and the impact of these ‘missing underwater forests’:

You’ve said that “seaweeds are the ‘trees’ of the ocean”. Can you tell us a little more about your study organism, Phyllospora, and explain its importance for coastal ecosystems around Australia?

Phyllospora comosa (known locally as ‘crayweed’) grows up to 2.5 m in length and forms dense, shallow forests along the south-eastern coastline of Australia, from near Port Macquarie in New South Wales, around Tasmania to Robe in South Australia. Individuals appear to persist on reefs for around 2 years and are reproductive year round.

How do these ecosystems change with the reduction of seaweed forests?

Large, canopy-forming macroalgae provide structural complexity, food and habitat for coastal marine ecosystems and other marine organisms. When these habitat-formers decline or disappear, the ecosystem loses its complexity, biodiversity decreases and many ecosystem services are also lost. Losing large seaweeds from temperate reefs has analogous ecosystem-level implications to losing corals from tropical reefs.

We’re interested in learning more about how you got involved in this research. Can you tell us how you became interested in studying Phyllospora?

For my doctorate, I studied how changing environmental conditions may disrupt relationships between seaweeds and microorganisms – which are abundant and ubiquitous in marine environments – potentially leading to climate-mediated diseases. During my PhD, my colleagues (Coleman et al.) published a paper describing the disappearance of crayweed from the urbanised coastline of Sydney and hypothesised that the cause was the high-volume, low-treatment, near-shore sewage outfalls that used to flow directly on to some beaches and bays in the city. I wondered whether this pollution may have disrupted the relationship between crayweed and its microbial associates, and that’s how I got involved in the project.

Why is the loss of canopy-forming macroalgae difficult to study retrospectively and how has this informed your current study?

Once an organism has disappeared from an ecosystem, it can be difficult to piece together the processes that caused its demise, particularly if the disappearance occurred several decades ago and the ecosystem state shifted dramatically as a consequence.  In our study, we hypothesized that poor water quality might have caused the decline of Phyllospora. There have been significant improvements in water quality in the region since the decline, but the species and ecosystems they used to support have failed to recover. To test whether the water quality has improved enough to allow recolonisation of this seaweed, we transplanted the seaweed back onto reefs where it was once abundant. The survival rates of transplanted seaweed were very good, suggesting that with a little help, this species may be able to recolonize Sydney’s reefs.

What were some of the difficulties you faced while conducting your research?

Moving hundreds of large seaweeds many kilometres from donor populations to the restoration sites was a big job. Thankfully, we received a great deal of help from many volunteers from the local community – mostly divers, with an interest in conserving and restoring the marine ecosystems they visit recreationally and value as a natural resource.

You’ve talked about Phyllospora ‘recruitment’ at one recipient site. Can you explain in greater detail what a ‘recruit’ is and how this is important for the success of a restoration site?

Phyllospora reproduces sexually, with gametes from male individuals fertilizing gametes from females, forming zygotes, which then attach themselves to the bottom (usually not very far from their parents) and grow into juvenile algae which we call ‘recruits’. In the context of restoration, the high level of recruitment (i.e. successful reproduction) we observed at our transplant site is very encouraging because it creates the possibility for the establishment of a self-sustaining population of Phyllospora at this site for the first time in many decades.

Why do seaweed forests receive less attention than other marine ecosystems, for example mangroves or coral reefs?

Most people don’t think about seaweeds very often. When they do, it’s usually because the sight, touch or smell of seaweed on the beach is annoying or offensive. Even the name “seaweed” conjures negative imagery so perhaps it’s a PR issue! Arguably, macroalgae have traditionally received less attention from marine ecologists than other marine ecosystems as well, with much more attention and funding going to coral reef research. With global patterns of declines of temperate, habitat-forming macroalgae, this needs to change and our understanding of the processes that affect seaweed populations needs to grow.

What would a successful restoration of underwater kelp forests mean for the ecosystem and for the local population?

It’s our hope that, by restoring habitat-forming macroalgae like Phyllospora, we will also enhance populations of other organisms that rely on this species for food or shelter. Detecting such follow-on benefits of our seaweed restoration program is the focus of ongoing research and our initial results are very encouraging.

You’ve mentioned that larger scale restoration would be a sound way of combating the grazing (herbivory) you saw. What is the next step forward for you?

Enhanced grazing may be another mechanism by which Phyllospora disappeared from these reefs (or perhaps why it’s failed to recover). The impacts of grazing we observed were site-specific, so further investigations into why one place was so severely impacted by herbivores while the other was not are needed. Our first step towards resolving this is to establish more numerous restoration patches of different sizes to see whether we can satiate the herbivores and whether smaller patches are more susceptible to grazing than larger patches.

For more PLOS ONE articles about the ‘trees of the ocean’, check out the way seaweed and coral interact in “Seaweed-Coral Interactions: Variance in Seaweed Allelopathy, Coral Susceptibility, and Potential Effects on Coral Resilience” and how ocean currents influence seaweed community organization in “The Footprint of Continental-Scale Ocean Currents on the Biogeography of Seaweeds”.

Citation: Campbell AH, Marzinelli EM, Vergés A, Coleman MA, Steinberg PD (2014) Towards Restoration of Missing Underwater Forests. PLoS ONE 9(1): e84106. doi:10.1371/journal.pone.0084106

Image: Adriana Vergés, co-author


Read the February Issue of Evolutionary Applications Online!

The February Issue of Evolutionary Applications has been published online. This issue features a cover image of a lone grizzly bear in Alberta, Canada, which relates to a study by Shafer and colleagues linking genotype, ecotype, and phenotype in grizzly bears (Ursus arctos). This issue also launches a new series of research highlights that will offer brief synopses of new work from other journals with direct relevance to readers of Evolutionary Applications, with the aim of exploring the breadth of potential applications of evolutionary theory across fields and disciplines. The Editor-in-Chief, Louis Bernatchez, has highlighted the following articles as of particular interest:

Genomic selection for recovery of original genetic background from hybrids of endangered and common breeds by Carmen Amador, Ben J. Hayes and Hans D. Daetwyler

Summary: The authors present two genomic selection strategies, employing genome-wide DNA markers, to recover the genomic content of the original endangered population from admixtures. They also compare the efficiency of both strategies using empirical 50K SNP array data from sheep breeds.

Anthropogenic selection enhances cancer evolution in Tasmanian devil tumours by Beata Ujvari, Anne-Maree Pearse, Kate Swift, Pamela Hodson, Bobby Hua, Stephen Pyecroft, Robyn Taylor, Rodrigo Hamede, Menna Jones, Katherine Belov and Thomas Madsen

Summary: The Tasmanian Devil Facial Tumour Disease (DFTD) provides a unique opportunity to study cancer evolution in vivo. Since it was first observed in 1996, this transmissible cancer has caused local population declines by 90%. In this study the authors focus on the evolutionary response of DFTD to a disease suppression trial.  The results reveal that DFTD has the capacity to rapidly respond to novel human-induced selective regimes and that disease eradication may result in novel tumour adaptations.

Linking genotype, ecotype, and phenotype in an intensively managed large carnivore by Aaron B. A. Shafer, Scott E. Nielsen, Joseph M. Northrup and Gordon B. Stenhouse

Summary: In this study, integrated GPS habitat use data and genetic profiling were used to determine the influence of habitat and genetics on fitness proxies (mass, length, and body condition) in a threatened population of grizzly bears (Ursus arctos) in Alberta, Canada. The authors found that homozygosity had a positive effect on these fitness proxies, which may be indicative of outbreeding depression unintentionally caused by massive translocations of bears over large geographic distances.

We encourage you to submit papers applying concepts from evolutionary biology to address biological questions of health, social and economic relevance across a vast array of applied disciplines. We also welcome submissions of papers making use of modern genomics or other molecular methods to address important questions in an applied evolutionary framework. For more information please visit the aims and scopes page.

Submit your article to Evolutionary Applications here >

Sign up to receive email content alerts here >

Saulius Gražulis gets #blueobelisk award: if you want Open Crystallography go to COD

Today I presented Saulius Gražulis with a Blue Obelisk for his, and his colleagues’, work on making crystallography Open for everyone through the Crystallography Open Database (COD).

The Blue Obelisk is a very loose collaboration of people who work together on a semi-structured basis to create or liberate or otherwise provide Open Data, Open Standards, and Open Source (ODOSOS) in chemistry and related sciences. This is very much valued and many people and companies use BO software. Because it’s open they don’t have to ask our permission. They don’t have to say thank you (though it’s nice). But they do have to acknowledge authors’ moral rights (i.e. acknowledge who wrote the software).

In macromolecules there’s an abundance of Open data – the Protein Databank for example. But in “small molecules” or minerals or materials there’s effectively no organized source – apart from the COD. It is hard to build up a voluntary database and keep it running, but that’s what Saulius has done.

And much more than that – it is improving rapidly. With the addition of our Crystaleye structures and chemical software the COD is now able to offer a wide range of crystallographic knowledge.

For example, together with the PDB it’s the only crystallographic database in the Linked Open Data Cloud. Today we had confirmation that LOD2 was happy to work with us to include the RDFised COD. So here it is:

You can see PDB close by and Bio2RDF at the bottom. (We’ve talked with Michel Dumontier about how to link to that). So the semantic Web will recognize that if people want semantic crystallography they can come to COD.

And we stress that COD data is not only free but can be used for any purpose without permission. You can build programs round it, sell them, derive forcefields, create reference data tables, use it for validation, compute the structures and properties with QM or MM programs, etc. It’s truly OPEN and the largest data set (I think) in the BO collection.

We also applaud BO’s NMRShiftDB and will actively work to link this to COD.

And COD covers all disciplines – organic, organometallic, inorganic – no other database does that. And no other database allows you to link out to other disciplines and link back in.

Moreover COD will start exposing molecular structures. Often chemists find crystal structures too complicated – they want single molecules (“moieties”). Nick Day did that in Crystaleye and we’ve transferred the software to COD.

And every new addition to the BO repertoire increases its value roughly as n-squared.

Thank you Saulius – your obelisk will be in the post.

ChemistryOpen hot off the press!

The current Issue of ChemistryOpen includes an exciting Communication on a Non-ATP-Mimetic Organometallic Protein Kinase Inhibitor. Eric Meggers, Holger Steuber and co-workers present an organometallic inhibitor scaffold for Pim kinases. These are interesting targets for cancer therapy as they are overexpressed in various human cancers. Usually kinase inhibitors are ATP-competitive. However, as shown in a cocrystal structure with Pim1, their organometallic compound (based on a cyclometalated 1,8-phenanthrolin-7(8H)-one ligand) adopts an unexpected non-hinge-binding mode and could be a suitable lead structure for the development of potent and selective non-hinge-binding ATP-competitive inhibitors of Pim kinases.

In the Full Paper of this issue, Giampaolo Barone, F. Matthias Bickelhaupt and co-workers report on their dispersion-corrected density functional studies for the investigation of the DNA double helix structure. They calculate how B-DNA structure stability correlates with its nucleic acid composition and are able to show that the stability of the structure not only depends on the number of hydrogen bonds in Watson-Crick base pairs but also depends on the base pair order and orientation.

The newest contribution to ChemistryOpen‘s Thesis Treasury, from Rafael Gramage-Doria, features metallocyclodextrins. In his thesis he found that upon encapsulation of metal fragments in the cavity of a β-cyclodextrin-derived diphosphane, otherwise unstable metal species can be formed and coordination processes can be slowed down, which allows mechanistic pathways for carbon–carbon bond-forming reactions to be investigated.

To read all open-access full-text articles, visit our homepage!

Unearthing the Environmental Impact of Cambodia’s Ancient City, Mahendraparvata

Angkor from the air


From the 9th to the mid-14th century, the region of Angkor in modern-day northern Cambodia was the capital of the Khmer Empire and the largest preindustrial city in the world. Home to possibly more than three quarters of a million people, several different urban plans and reservoir systems, and impressive monuments like the temple of Angkor Wat (pictured from a bird’s-eye view above), Angkor was the core of the Khmer Empire, which dominated Southeast Asia by the 11th century CE. Like many modern, booming cities, Angkor was fed by water sourced from another city.

Mahendraparvata, a hill-top site in the mountain range of Phnom Kulen, is significant as the birthplace of the Khmer Kingdom and as the seat of Angkor’s water supply. In 802 CE, Jayavarman II proclaimed himself the universal king of the Angkor region on the top of Mahendraparvata. Jayavarman’s ascension to power marked the unification of the Angkor region and the foundation of the Khmer Empire.



Until recently, however, little was known about the urban settlement of Mahendraparvata; a dense forest canopy obscures a great deal of the area’s archaeological landscape. To determine the extent of land use around Mahendraparvata, the authors of a recent PLOS ONE paper examined soil core samples taken from one of the Phnom Kulen region’s reservoirs.

As Angkor’s source of water, Phnom Kulen’s archaeological landscape is littered with hydraulic structures, like dams, dykes, and reservoirs (points A, B, and E on the remote sensing digital image shown below), meant to store and direct Angkor’s water sources strategically. The researchers focused on an ancient reservoir upstream of the main river running north to south, now a swamp, to find evidence of intensive land use.

Remote sensing


Core samples taken from the sediment of this ancient reservoir, point F on the image above, provided the researchers with chronological layers of earth containing organic materials, like wood, pollens, and spores, which could be assessed using radiocarbon dating.

By analyzing the sediment cores, researchers found that the reservoir was likely in use for about 400 years. Although the age of the reservoir itself remains inconclusive, sediment samples suggest that the valley was flooded in the mid-to-late 8th century CE, around the time Jayavarman II unified the area.

The authors found that medium-to-coarse sand deposition beginning in the mid-9th century points to continual soil erosion, either from the surrounding hills or from the dyke itself, likely caused by deforestation in the area. Samples from the late 11th century record the last and largest episode of erosion, a possible result of intensive land use.

The researchers suggest that deforestation, as evidenced by soil erosion, implies that “settlement on Mahendraparvata was not only spatially extensive but temporally enduring.” In other words, the estimated extent of deforestation by continual sand deposits from the mid-9th century to the late-11th century in core samples indicates that Mahendraparvata was home to a large and thriving urban network in need of resources.

However, an increase in pollen and spores dated to the 11th century, followed by the establishment of swamp forests in the reservoir in the early to mid-12th century, indicates that by this time the reservoir had fallen out of use, perhaps linked to changes in water management throughout the broader area and a possible population decline nearby. By the mid-16th century, the samples suggest, the swamp flora had developed into that seen today in the ruins of Mahendraparvata.

For some 400 years, the Phnom Kulen mountains acted as the main source of water for the Angkor region, so the change of water management practices there has implications for the water supply to Angkor itself. In sum, by examining core samples drawn from one of Phnom Kulen’s ancient reservoirs, the authors were able to explore an archaeological landscape that is still largely hidden and a history still mainly obscured by time. The potential link between the rise and fall of urban life in the Angkor region and the use of reservoirs like the one examined in this study helps to unearth a little bit more about the Khmer Kingdom and the marked environmental impact of Mahendraparvata.

Citation: Penny D, Chevance J-B, Tang D, De Greef S (2014) The Environmental Impact of Cambodia’s Ancient City of Mahendraparvata (Phnom Kulen). PLoS ONE 9(1): e84252. doi:10.1371/journal.pone.0084252

Image 1: Angkor Wat by Mark McElroy

Image 2: journal.pone.0084252

Image 3: journal.pone.0084252

The post Unearthing the Environmental Impact of Cambodia’s Ancient City, Mahendraparvata appeared first on EveryONE.

Liberating Open Crystallography: My 2 weeks in Vilnius with COD; massive progress, Crystaleye moves

I have been in Vilnius LT for nearly two weeks. I had hoped to blog every day, but have not managed a single post. This is because we are working flat out on developing Open Crystallography (for “small molecules” – i.e. non-macromolecules). I have masses to write (and will do so) but here is the summary:

Much small-molecule crystallography is effectively Closed and certainly not conformant to the OKF’s Open Definition. I’ve written about this several times earlier – in essence people don’t have facile access to enough data and code. There are a lot of people – not just practising crystallographers – who want to change this. Crystallography is a central science (and this year is recognised as the International Year of Crystallography); it’s used in:

  • Bioscience
  • Medicine
  • Materials
  • Chemistry
  • Mathematics

And much more.     

Ten years ago Armel Le Bail set up an initiative – the Crystallography Open Database (COD) – to collect and store completely Open crystallography. It’s had a lot of support in kind, and some financial support. It now has about 250,000 structures, which are being widely used. Some years ago Armel handed over the direction to Saulius Grazulis (there’s a hacek on the “z”) and I’ve been visiting Saulius and colleagues for 2 weeks.

Independently Nick Day in our group in Cambridge built an Open Database of structures (“Crystaleye” (CY)). Like so many things (e.g. Figshare) it wasn’t planned as a world-beating database. Nick wanted these structures to validate computational methods, so he thought: why not collect every structure on the web? Then he thought: why not offer them to the world? He built a system which not only exposed the data, but also calculated a huge variety of chemistry. This was possible not only because of the code we had written but also the huge contributions of the community. We extensively use CDK, OpenBabel, Jmol, Avogadro and many others. This meant that Crystaleye could display over 10 million computed webpages to allow people to browse and display the chemistry.

I’ve formally shut down my group at Cambridge but continue to be active in chemistry, and it would be a great pity if Crystaleye atrophied and died. Nick put many completely novel features into it. So Saulius and I planned that the two efforts would merge – COD has an emphasis on crystallography and CY’s is on chemistry, so they complement each other well.

In the time here we have tackled:

  • Pulling the Crystaleye entries to Vilnius. Of the 250,000, 10,000 were unique to CY, so COD has immediately increased.
  • Extracting the major chemistry routines from CY and installing them in COD-CY
  • Testing the extraction of chemistry from COD-CY
  • Designing novel functionality and display for the web pages
  • Expanding the community that COD-CY interacts with in both directions. I’ll write more about this. COD chemistry will be a massive resource for the whole chemical community and the BlueObelisk will contribute hugely to COD-CY;
  • Designing and implementing RDF for crystallography
  • Turning COD-CY into one of the first small-molecule chemical resources on the LinkedOpenData Cloud.
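For readers who haven’t met the underlying data: both COD and Crystaleye work from CIF (Crystallographic Information File) records, which at their simplest are tag-value text. A minimal sketch of pulling unit-cell parameters out of a CIF fragment (the fragment and values are illustrative, and real CIF parsing – loops, multi-line values – needs a proper library such as PyCIFRW):

```python
# Minimal CIF tag-value extraction; a sketch only, not a full parser.
CIF_FRAGMENT = """\
_cell_length_a    5.4310
_cell_length_b    5.4310
_cell_length_c    5.4310
_cell_angle_alpha 90.0
_cell_angle_beta  90.0
_cell_angle_gamma 90.0
"""

def cell_parameters(cif_text: str) -> dict:
    """Collect simple one-line _cell_* tag-value pairs as floats."""
    params = {}
    for line in cif_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("_cell_"):
            params[parts[0]] = float(parts[1])
    return params

print(cell_parameters(CIF_FRAGMENT)["_cell_length_a"])  # 5.431
```

It is exactly this kind of machine-readability that makes a 250,000-entry Open database amenable to bulk chemistry calculations and RDF export.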

The group here is wonderful and the potential is huge. We are seeing how Open resources can liberate thought and action in chemistry and crystallography. There’s a commitment to being part of the world community.

I’ve particularly worked with Saulius – we’ve had many days where we have literally hacked from dawn to dusk. Saulius is an ace UNIX-hacker and the infrastructure of the COD is very impressive – with a lot of Perl and shellscripts. In contrast, much of the BlueObelisk software is Java and many users run on Windows. So we’ve spent a lot of time making CY tools and JUMBO-converters run on the commandline. We’ve cracked the main problems and Saulius can now run Nick’s Crystaleye ideas on the COD server.


Much more later

Chicken Little Meets British Humanities in the Times Literary Supplement

In a very silly opinion piece in the Times Literary Supplement, Shakespeare scholar Jonathan Bate — despite noting that until at least 2020 the HEFCE mandate applies only to journal articles, not to books — shrilly decries the doom and gloom that the mandate portends for book-based humanities scholarship. The gratuitous cavilling is, as usual, cloaked in shrill alarums about academic freedom infringement…

Watch Where I’m Going: Predicting Pedestrian Flow

Pedestrian traffic flow

At last check, the population of the world was around 7.1 billion and counting.  As we all know, the sheer number of people on the planet presents a host of new challenges and exacerbates existing ones.  The overarching population problem may seem daunting, but there’s still plenty we can do to make a crowded, urbanized world livable.  A new study in PLOS ONE focuses on the specific issue of pedestrian traffic and how to accurately model the flow of people through their environment.

Researchers with Siemens and the Munich University of Applied Sciences examined video recordings of commuters walking through a major German train station on a weekday, during both the morning and evening peak commute times. They analyzed the videos to determine individual pedestrians’ paths and walking speeds, and used the resulting data to set the parameters for a simulation of pedestrian traffic flow. According to the authors, this kind of calibration of theoretical models using real-world data is largely missing from most pedestrian flow models, which are under-validated and imprecise.

Footage from train station

The authors utilized a cellular automaton model as the basis of this simulation. Cellular automata are models in which cells in a grid change values in discrete steps according to specific rules. In this instance, the authors used a hexagonal grid and a few simple rules about pedestrian movement:

  • Pedestrians know and will follow the shortest path to their destination unless pedestrians or other obstacles are in the way.
  • Pedestrians will walk at their own individual preferred speeds, so long as the path is unobstructed.
  • Individuals need personal space, which acts like a repelling force to other pedestrians and objects.
  • Walking speeds decrease as crowds get denser.
  • Factors like age and fitness are all captured by setting a range of individual walking speeds.
Pedestrian traffic flow model (Settlers of Catan Pedestrian Expansion?)

This model also borrowed from electrostatics by treating people like electrons. As the authors write:

“Pedestrians are attracted by positive charges, such as exits, and repelled by negative charges, such as other pedestrians or obstacles.”

Add to this model rules about when and where pedestrians appear, the starting points and destinations, and the relative volume of traffic from each starting point to different destinations, and you’ve got a basic model of pedestrian traffic.

Next, the authors calibrated this model by setting parameters using real-world, observational data from the train station videos:  where people at each starting point were going, distance kept from walls, the distribution of walking speeds, and so on.  To test their model and parameters, the authors validated it by running predictive simulations and comparing it to real-world scenarios. Based on the results, the authors suggest that this kind of model, which includes parameters based on real-world observation, more accurately represents pedestrian flow than other models of walkers that do not incorporate observational data.

The authors also changed multiple parameters to determine which ones had the largest impact on the simulation. The parameter that had the largest effect when altered was the source-target distribution (the destinations of people coming from specific starting points), so the authors note that this is critical to measure accurately and precisely.

The ability to precisely predict the flow of traffic has many clear applications, from the design of buildings and public spaces to the prediction and prevention of unsafe crowd densities during large events or emergencies.

Next research question: when it’s crowded, does pushing really not make it go faster?

Citation: Davidich M, Köster G (2013) Predicting Pedestrian Flow: A Methodology and a Proof of Concept Based on Real-Life Data. PLoS ONE 8(12): e83355. doi:10.1371/journal.pone.0083355

Images: All images come from the manuscript

The post Watch Where I’m Going: Predicting Pedestrian Flow appeared first on EveryONE.

Onwards and Upwards for Wiley Open Access

2013 was quite a year for Wiley Open Access, with the addition of 16 new open access journals to our portfolio, OnlineOpen orders reaching an all-time high, and a significant increase in the number of institutions with Wiley Open Access Accounts.

2014 is also set to be a year of growth. Wiley will be publishing 33 journals as part of the Wiley Open Access program, many in partnership with societies.  In addition, over 1,300 of our subscription journals now offer the hybrid Online Open option to authors. We have recently launched or are planning to launch the following new Open Access journals:

In addition, the following journals ’flipped’  from the subscription model to Open Access on 1st January 2014:

The complete list of 2014 Journal Titles, Changes and Collections is available on Wiley Online Library.

Evolutionary Applications SPECIAL ISSUE: Climate change, adaptation and phenotypic plasticity

The January Special Issue of Evolutionary Applications, edited by guest editors Juha Merilä and Andrew Hendry, reviews the literature on responses to climate change in a large variety of taxa, including terrestrial and aquatic phytoplankton, plants and invertebrates, as well as all classes of vertebrates: fish, amphibians, reptiles, birds and mammals. This Special Issue is the most up-to-date and exhaustive coverage of this crucial topic. The cover image features a collage highlighting some of the species examined in this issue for their response to climate change. The Editor-in-Chief Louis Bernatchez has highlighted the following Special Issue articles as of particular interest:

Climate change, adaptation, and phenotypic plasticity: the problem and the evidence
by Juha Merilä and Andrew P. Hendry

This perspective article examines the levels of inference employed in studies where recorded phenotypic changes in natural populations have been attributed to climate change. Based on the reviews from this Special Issue, Merilä and Hendry conclude that evidence for genetic adaptation to climate change has been found in some systems, but remains relatively scarce compared to evidence for phenotypic plasticity. It is apparent that additional studies employing better inferential methods are required before drawing further conclusions.

Rapid evolution of quantitative traits: theoretical perspectives by Michael Kopp and Sebastian Matuszewski

In this review and synthesis article the authors survey theoretical models of rapid evolution in quantitative traits to shed light on the potential for adaptation to climate change. In particular, they demonstrate how survival can be greatly facilitated by phenotypic plasticity, and how heritable variation in plasticity can further speed up genetic evolution.

Climate warming and Bergmann’s rule through time: is there any evidence? by Celine Teplitsky and Virginie Millien

In this article the authors investigate the hypothesis that climate warming causes a reduction in body size. This hypothesis originates from Bergmann’s rule, whereby endotherms in warmer climates exhibit smaller body sizes than those found in colder climates. Reviewing the literature, the authors find weak evidence for changes in body size through time as predicted by Bergmann’s rule.

We do hope you enjoy reading this month’s Special Issue, and encourage you to submit papers applying concepts from evolutionary biology to address biological questions of health, social and economic relevance across a vast array of applied disciplines. We also welcome submissions of papers making use of modern genomics or other molecular methods to address important questions in an applied evolutionary framework. For more information please visit the aims and scopes page.

Submit your article to Evolutionary Applications here >

Sign up to receive email content alerts here >

Rainforest Fungi Find Home in Sloth Hair

Most of us have seen a cute sloth video or two on the Internet. Their squished faces, long claws, and scruffy fur make these slow-moving mammals irresistible, but our furry friends aren’t just amusing Internet sensations. Like most inhabitants of the rainforest, little is known about the role sloths play in the rainforest ecosystem.

Three-toed sloths live most of their lives in the trees of Central and South American rainforests. Rainforests are some of the most biodiverse ecosystems in the world and home to a wide variety of organisms, some of which can be found in rather unusual places.

Due to their vast biodiversity, rainforests have been the source for a wide variety of new medicines, and researchers of this PLOS ONE study sought to uncover whether sloth hair may also contain potential new sources of drugs that could one day treat vector-borne diseases, cancer, or bacterial infections. Why look in sloth fur? It turns out that sloths carry a wide variety of micro- and macro-organisms in their fur, which consists of two layers: an inner layer of fine, soft hair close to the skin, and a long outer layer of coarse hair with “cracks” across it where microbes make their homes. The most well-known inhabitant of sloth fur is green algae. In some cases, the green algae makes the sloth actually appear green, providing a rainforest camouflage.

In the study, seventy-four separate fungi were obtained from the surface of coarse outer hairs clipped from the lower backs of nine living three-toed sloths in Soberanía National Park, Panama, and were cultivated and tested for bioactivity in the lab.

Researchers found a broad range of in vitro activities of the fungi against bugs that cause malaria and Chagas disease, as well as against a specific type of human breast cancer cells. In addition, 20 fungal extracts were active in vitro against at least one bacterial strain. The results may provide for the first time an indication of the biodiversity and bioactivity of microorganisms in sloth hair.

Since sloths are moving around in one of the most diverse ecosystems in the world, it’s possible that they may pick up “hitchhikers,” so the researchers can’t be sure how these fungi came to live on the sloth fur. They may even have a symbiotic relationship with the green algae. However the fungi ended up in the fur, the authors suggest their presence in the ecosystem provides support for the role biodiversity plays both in the rainforest and potentially our daily lives.

Citation: Higginbotham S, Wong WR, Linington RG, Spadafora C, Iturrado L, et al. (2014) Sloth Hair as a Novel Source of Fungi with Potent Anti-Parasitic, Anti-Cancer and Anti-Bacterial Bioactivity. PLoS ONE 9(1): e84549. doi:10.1371/journal.pone.0084549

Image: Bradypus variegatus by Christian Mehlführer

The post Rainforest Fungi Find Home in Sloth Hair appeared first on EveryONE.

US requires Open Access to Scientific Research – huge progress

[I have been relatively quiet recently because I am in Lithuania working flat out to liberate Crystallographic Data and make it Open – expect several posts in the near future.]

Seven years ago I approached SPARC to suggest that together we ran an “Open Data” mailing list – it was one of the first times the term “Open Data” had been used – now it’s everywhere, of course. I’m delighted to repost the following item from the list:!topic/sparc-opendata/b2Qkwx5K-nA

In essence it says that the US is going to make Open Data in science actually happen. My thanks to SPARC and many others who have pushed the cause. There’s a lot more that needs to happen but notice the clause allowing Content Mining which I have highlighted.


For Immediate Release

Thursday, January 16, 2014


Contact: Ranit Schmelzer                                                           




Omnibus Appropriations Bill Codifies White House Directive


Washington, DC – Progress toward making taxpayer-funded scientific research freely accessible in a digital environment was reached today with congressional passage of the FY 2014 Omnibus Appropriations Act.  The bill requires federal agencies under the Labor, Health and Human Services, and Education portion of the Omnibus bill with research budgets of $100 million or more to provide the public with online access to articles reporting on federally funded research no later than 12 months after publication in a peer-reviewed journal.


“This is an important step toward making federally funded scientific research available for everyone to use online at no cost,” said Heather Joseph, Executive Director of the Scholarly Publishing and Academic Resources Coalition (SPARC). “We are indebted to the members of Congress who champion open access issues and worked tirelessly to ensure that this language was included in the Omnibus. Without the strong leadership of the White House, Senator Harkin, Senator Cornyn, and others, this would not have been possible.”


The additional agencies covered would ensure that approximately $31 billion of the total $60 billion annual US investment in taxpayer funded research is now openly accessible.


SPARC strongly supports the language in the Omnibus bill, which affirms the strong precedent set by the landmark NIH Public Access Policy, and more recently by the White House Office of Science and Technology Policy (OSTP) Directive on Public Access.  At the same time, SPARC is pressing for additional provisions to strengthen the language – many of which are contained in the Fair Access to Science and Technology Research (FASTR) Act – including requiring that articles are:

·      Available no later than six months after publication;

·      Available through a central repository similar to the National Institutes of Health’s (NIH) highly successful PubMed Central, a model that opened the gateway to the human genome project and more recently the brain mapping initiative. These landmark programs demonstrate quite clearly how opening up access to taxpayer funded research can accelerate the pace of scientific discovery, lead to both innovative new treatments and technologies, and generate new jobs in key sectors of the economy; and

·      Provided in formats and under terms that ensure researchers have the ability to freely apply cutting-edge analysis tools and technologies to the full collection of digital articles resulting from public funding.

“SPARC is working toward codifying the principles in FASTR and is working with the Administration to use PubMed Central as the implementation model for the President’s directive,” said Joseph. “Only with a central repository and the ability to fully mine and reuse data will we have the access we need to really spur innovation and job creation in broad sections of the economy.”




Every year, the federal government uses taxpayer dollars to fund tens of billions of dollars of scientific research that results in thousands upon thousands of articles published in scientific journals.  The government funds this research with the understanding that it will advance science, spur the economy, accelerate innovation, and improve the lives of our citizens.  Yet most taxpayers – including academics, students, and patients – are shut out of accessing and using the results of the research that their tax dollars fund, because it is only available through expensive and often hard-to-access scientific journals.


By any measure, 2013 was a watershed year for the Open Access movement:  in February, the White House issued the landmark Directive; a major bill,  FASTR, was introduced in Congress; a growing number of higher education institutions – ranging from the University of California System, Harvard University, MIT, the University of Kansas, and Oberlin College – actively worked to maximize access to and sharing of research results; and, for the first time, state legislatures around the nation have begun debating open access policies supported by SPARC.


Details of the Omnibus Language


The Omnibus language (H.R. 3547) codifies a section of the White House Directive requirements into law for the Department of Labor, Health and Human Services, the Centers for Disease Control (CDC), the Agency for Healthcare Research and Quality (AHRQ), and the Department of Education, among other smaller agencies.


Additional report language was included throughout the bill directing agencies and OSTP to keep moving on the Directive policies, including the US Department of Agriculture, Department of the Interior, Department of Commerce, and the National Science Foundation.


President Obama is expected to sign the bill in the coming days.




SPARC®, the Scholarly Publishing and Academic Resources Coalition, is an international alliance of academic and research libraries working to correct imbalances in the scholarly publishing system. Developed by the Association of Research Libraries, SPARC has become a catalyst for change. Its pragmatic focus is to stimulate the emergence of new scholarly communication models that expand the dissemination of scholarly research and reduce financial pressures on libraries. More information can be found at SPARC’s website and on Twitter @SPARC_NA.