How can academia kick its addiction to the impact factor? – ScienceOpen Blog

“The impact factor is academia’s worst nightmare. So much has been written about its flaws, both in calculation and application, that there is little point in reiterating the same tired points here (see here by Stephen Curry for a good starting point).”

Complying With HEFCE’s Open Access Policy: What You Need To Know

Most researchers working in the UK will know that the Higher Education Funding Council for England (HEFCE) open access policy took effect from April 1st of this year, but what does that mean for you, and how can you make sure you are fully compliant? What is the HEFCE open access policy? Around…

Launch of Data Management Planning Tool | Office for Sponsored Programs

“As a result of collaborations with the Office of the Vice Provost for Research, Harvard University Information Technology, and IQSS, Harvard Library has launched a customized version of DMPTool, an online data management planning tool, for Harvard University. Data management plans—documents that outline what researchers will do with data during and after a project—are becoming increasingly required by funding agencies such as the National Institutes of Health and the National Science Foundation. The online tool provides step-by-step guidance for creating data management plans that include templates and examples; it also helps researchers create and share their plans, assisting them in how to address requirements specific to Harvard….”


“The Reference & Scholarly Communication Librarian will have primary responsibility for leading the development and activities of the University’s Institutional Repository and other locally created digital collections, including planning and administering the repository, developing and conducting outreach initiatives, and assessing the effectiveness of those initiatives and services….”

@TheContentMine preparing for largescale high-throughput Mining (TDM)

The ContentMine has almost finished the infrastructure and software for automatic daily mining of the scientific literature. We hope to start testing in the next few days. I’ll try to post frequent information.

The software has been developed by the ContentMine Team, wonderfully funded by the Shuttleworth Foundation. The people involved include:

  • Mark MacGillivray
  • Anusha Ranganathan
  • Richard Smith-Unna
  • Tom Arrow
  • Peter Murray-Rust
  • Chris Kittel
  • and voluntary contributions

The daily operation (as opposed to user-driven getpapers) consists of:

  • DOIs and URLs provided by CrossRef
  • downloading software
  • indexing of fulltext documents (closed as well as open, legal under the UK “Hargreaves” exception)
  • fact extraction
  • display

We’ll detail this later.

The sources include:

  • open repositories such as EuropePubMedCentral
  • arxiv and other repositories
  • closed documents to which Cambridge University subscribes. We are working intimately with Cambridge University Library staff and offer public applause and thanks.

All closed work will be carried out on closed machines run by the University’s computer officers, primarily in Chemistry, and again public thanks to this wonderful group. We take great care to limit access so that no unauthorised access is possible and that there is also an audit trail of what we do and have done.

It is difficult to predict the daily volume. MarkMacG has found it to vary between 300 and 80,000 documents a day. My guess is about 2000-7000 on average.

This is NOT a resource problem. The whole scientific literature for a year can be held on a terabyte disk. The processing time is small – perhaps 1000 documents a minute on our system. The whole literature can be done within a long coffee break.

The impact on publisher servers is minimal. At, say, 5000 articles/day even the largest publisher would only get 1 request per minute. The others would be trivial (1 request every 5-10 minutes). There is no case that our responsible TDM would cause any problems at all.
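As a rough check on the numbers above, here is a back-of-envelope sketch in Python. The function names are hypothetical, and the 25% share of daily articles for the largest publisher is purely an illustrative assumption, not a figure from this post. The 1000 documents a minute processing rate and the daily volumes are the ones quoted above.

```python
MINUTES_PER_DAY = 24 * 60


def requests_per_minute(docs_per_day, publisher_share=1.0):
    """Average requests per minute hitting one publisher.

    publisher_share is the fraction of the day's documents hosted by
    that publisher (an assumption; 1.0 means "all publishers combined").
    """
    return docs_per_day * publisher_share / MINUTES_PER_DAY


def processing_minutes(docs, docs_per_minute=1000):
    """Minutes needed to process `docs` at the quoted processing rate."""
    return docs / docs_per_minute


if __name__ == "__main__":
    # Low, typical and peak daily volumes mentioned in the post.
    for docs in (300, 5000, 80000):
        total = requests_per_minute(docs)
        # ASSUMPTION: the largest publisher hosts 25% of the documents.
        largest = requests_per_minute(docs, publisher_share=0.25)
        print(
            f"{docs:>6} docs/day: "
            f"{total:.2f} req/min overall, "
            f"{largest:.2f} req/min for a 25% publisher, "
            f"{processing_minutes(docs):.1f} min to process"
        )
```

Under these assumptions, 5000 articles/day comes to about 3.5 requests per minute across all publishers combined, and under one request per minute for a publisher hosting a quarter of the output. Processing that day's documents would take about 5 minutes.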

And, just to reassure everyone, I and colleagues are working hard to stay completely within the law as we see it. We are not stealing content.


Off to Brussels for ContentMining (TDM) meeting.

I’m spending a (long) day going to Brussels to a meeting run by MEPs and the European Parliament on Text and Data Mining. Here’s the metadata:

“Demystifying Text and Data Mining in a copyright context”

When: Wednesday 27 April 2016, 13.00 – 15.00

Where: European Parliament, ASP, Room A5E2

Event co-hosted by Miapetra Kumpula-Natri & Therese Comodini Cachia & Catherine Stihler

First – I am a great supporter of the MEPs who propose reform – we can add Julia Reda (@senficon) to this.

The blurb is only present as a woolly GIF. Why??? I can’t even cut-and-paste? We are in the digital century?

The UK has one of the few Exceptions to Copyright allowing TDM (for very limited purposes – personal non-commercial research for those who have legal access to the material). I am one of the very few people – perhaps one of two – who is actually using this legal permission.

Europe has been fighting for similar rights – and so have individual jurisdictions such as France:

Declaration pro-exception in #copyright for #TDM in France (and in French) by group of entrepreneurs and leaders:
(PMR summary – the great-and-good of France are fighting for rights to carry out TDM).

However I am deeply worried about the European initiative. Every time there is to be a draft, the time slips. The current wording is so vague as to be almost useless. We are all fighting massive opposition from publishers and lobbyists and reform gets watered down month by month…

Simply – I (PMR) am allowed to mine in the UK because there ANYONE with the right to read has the right to mine (“The right to read is the right to mine”). By contrast, in Europe only “Public (Interest) Research Organisations” can mine.

  • Is a journalist a PIRI? No.
  • Is a teacher a PIRI? No.
  • Is PMR a PIRI? No.

Who is?

My guess is that this will turn out to require either/or

  • a regulator
  • a court case

If we rely on the EC then maybe I would have to register as an approved TDM’er and only carry out TDM at approved institutions.

Please tell me that I am overreacting.


I shall certainly ask this tomorrow if I am allowed to speak.

Oh, and here is the awful GIF that accompanied the event. I hope against hope that it was a mistake. It sends out every wrong message…
[Image: Screen Shot 2016-04-26 at 19.24.39]

TDM Copyright reform is about LICENSING?? NO, NO, NO

Opening up Malaria Research by Patrick Vallance and Tim Wells – Project Syndicate

“In recent years, tremendous progress has been made in the battle against malaria. According to the World Health Organization, the number of deaths from the disease has fallen by a staggering 60% since 2000 – the result of improved access to diagnostic testing and treatment. To be sure, there is still considerable work to be done, but the downward trend in new infections and deaths underscores the power of collaboration among governments (in malaria endemic and non-endemic countries alike), between commercial and non-profit organizations, and between academic science and medicine. Without such partnerships, advances in fighting this deadly disease would not have been possible. Alongside coordinated action on the ground, increasing openness and collaboration among scientists researching and developing a new generation of medicines and vaccines is paving the way for further progress …”

The Argonaut – UI celebrates Open Education Week as movement to adopt open resources picks up pace

“With the beginning of every new semester, one thing never seems to change — college textbooks are expensive, heavy, mostly required and often useless. At the University of Idaho, many are trying to do their part to ease that burden and Open Education Week is an attempt to demonstrate that. ASUI President Max Cowan said signing onto a partnership with OpenStax is one step forward the university has made this semester …”

Linked Open Data Services for OpenAIRE : OpenAIRE blog

“We’re happy to announce that the OpenAIRE Linked Open Data (LOD) Services are now available as a beta version. OpenAIRE already makes its data freely available for re-use via APIs. In line with its commitment to openness, OpenAIRE has been busy mapping OpenAIRE’s data onto suitable standard vocabularies in order to make OpenAIRE’s data available as Linked Open Data. This started with a specification of the OpenAIRE data model as a Resource Description Framework (RDF) vocabulary, and then entailed mapping of the OpenAIRE data to the graph-based RDF data model. To interlink the OpenAIRE data with related data on the Web, we have identified a list of potential datasets with which to interlink, including the DBpedia dataset extracted from Wikipedia and the publication databases DBLP and CiteSeer. Making our data available in this way extends OpenAIRE’s technical interoperability and enables new user communities to engage with our data …”

[1604.05363] Comparing Published Scientific Journal Articles to Their Pre-print Versions

[Abstract] Academic publishers claim that they add value to scholarly communications by coordinating reviews and contributing and enhancing text during publication. These contributions come at a considerable cost: U.S. academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. We have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers and their final published counterparts. This comparison had two working assumptions: 1) if the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and 2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly publications.

Reporting of harms outcomes: a comparison of journal publications with unpublished clinical study reports of orlistat trials | Trials | Full Text

“In December 2009, Roche was the first global healthcare company to release ‘Clinical Study Reports’ after growing concerns over their product Tamiflu [8]. Their policy now allows researchers to access the CSRs and summary reports used for regulatory purposes since 1 January 1999. In 2010, the European Medicine Agency (EMA) [11] became the first major regulatory agency to agree to an open-access policy to confidential documents, including CSRs. However, in 2013, the EMA was forced to step backwards when the general court of the European Union (EU) ordered them to limit the access to their reports due to legal cases from two drug companies [12]….”

NIH Summit Sets Priorities for Research on the Non-Alzheimer’s Dementias

“We advance research by developing open-access databases of curated, highly specific scientific content to visualize and facilitate the exploration of complex data. Alzforum is a platform to disseminate the evolving knowledge around basic, translational, and clinical research in the field of AD….”

Tallahassee, Florida STEM Data and Research Librarian Job at Florida State University – Higher Education Career Center by University Business

“The Florida State University (FSU) Libraries seek a STEM Data and Research Librarian to join a team of science librarians who are transforming library support for the sciences at FSU. This position has a dual support role with abundant opportunities for collaboration and growth…. Qualifications: MLS from an ALA-accredited program or an equivalent combination of relevant advanced degree and library experience. Knowledge of best practices in information literacy, library instruction, reference services, and resources in support of STEM research. Knowledge of data management plans and federal grant requirements, open access concepts and application, and scholarly communication principles…. Preferred: Experience with e-science, data literacy, data management, open access, and scholarly communication….”