Using OpenRefine to assess open research at your organization — GO FAIR US

“The goal of this post is to share the process I used to collect open research publications data, using primarily OpenRefine, to understand the impact of our Open Access (OA) Publication Policy at the Michael J. Fox Foundation (MJFF). Inspired by the Year of Open Science, it is my hope that the practical recipes shared here can help others assess the impact of open research at their organizations. 

Tools used

For this work, I used three free tools which you can either use online or download to your computer:

Google Sheets – A spreadsheet program included as part of the free, web-based Google Docs Editors suite offered by Google. (Source: Wikipedia)

OpenRefine – An open-source desktop application for data cleanup and transformation to other formats, an activity commonly known as data wrangling. (Source: Wikipedia)

Publish or Perish – A software program that retrieves and analyzes academic citations….”

Analyzing Your Institution’s Publishing Output

Abstract: Understanding institutional publishing output is crucial to scholarly communications work. This class will equip participants to analyze article publishing by authors at an institution.

After completing the course, participants will be able to:

Gain an understanding of their institution’s publishing output, such as the number of publications per year, the open access status of those publications, the major funders of the research, and estimates of how much funding might be spent on article processing charges (APCs).
Think critically about institutional publishing data to make sustainable and values-driven scholarly communications decisions.
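
To make the first outcome concrete, here is a minimal Python sketch of the kind of summary described above: publications per year, open access status, leading funders, and a rough APC estimate. It is not taken from the course materials. The record fields (`year`, `oa_status`, `funder`), the sample values, and the flat `ASSUMED_APC_USD` price are all hypothetical placeholders; real APCs vary widely by journal and would need to come from publisher price lists or invoices.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"doi": "10.1000/a1", "year": 2021, "oa_status": "gold", "funder": "Funder A"},
    {"doi": "10.1000/a2", "year": 2021, "oa_status": "closed", "funder": "Funder B"},
    {"doi": "10.1000/a3", "year": 2022, "oa_status": "hybrid", "funder": "Funder A"},
    {"doi": "10.1000/a4", "year": 2022, "oa_status": "green", "funder": None},
]

# Assumed flat price per article, used only to show the arithmetic.
ASSUMED_APC_USD = 2000


def summarize_output(records, apc_usd=ASSUMED_APC_USD):
    """Count articles by year, OA status, and funder, and estimate APC spend."""
    per_year = Counter(r["year"] for r in records)
    per_status = Counter(r["oa_status"] for r in records)
    per_funder = Counter(r["funder"] for r in records if r["funder"])
    # Assumption: only gold and hybrid articles are treated as APC-bearing.
    apc_articles = sum(1 for r in records if r["oa_status"] in {"gold", "hybrid"})
    return {
        "per_year": dict(per_year),
        "per_status": dict(per_status),
        "top_funders": per_funder.most_common(3),
        "estimated_apc_total": apc_articles * apc_usd,
    }


summary = summarize_output(records)
print(summary)
```

Treating every gold or hybrid article as APC-bearing is a simplification, since some fully open access journals charge no APC at all; the estimate is best read as an upper-bound starting point.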

This course will build on open infrastructure, including Unpaywall and OpenRefine. We will provide examples of how to do analyses in both OpenRefine and Microsoft Excel. 
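
As one illustration of building on Unpaywall, the sketch below constructs a lookup URL for Unpaywall's v2 REST endpoint and pulls a few open access fields out of a response. It makes no network call: `sample_payload` is a hand-written stand-in for a real response, and the DOI and email address are placeholders. The helper names `build_unpaywall_url` and `extract_oa_fields` are hypothetical and are not part of any library or of the course materials.

```python
from urllib.parse import quote, urlencode

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def build_unpaywall_url(doi, email):
    """Build the lookup URL for one DOI; Unpaywall asks for an email parameter."""
    return f"{UNPAYWALL_BASE}{quote(doi, safe='/')}?{urlencode({'email': email})}"


def extract_oa_fields(payload):
    """Keep only the fields needed for an open access status analysis."""
    # best_oa_location is null for closed articles, so fall back to an empty dict.
    location = payload.get("best_oa_location") or {}
    return {
        "doi": payload.get("doi"),
        "is_oa": bool(payload.get("is_oa")),
        "oa_status": payload.get("oa_status", "unknown"),
        "host_type": location.get("host_type"),
    }


# Hand-written stand-in for an API response; the values are illustrative only.
sample_payload = {
    "doi": "10.1000/example",
    "is_oa": True,
    "oa_status": "green",
    "best_oa_location": {"host_type": "repository"},
}

url = build_unpaywall_url("10.1000/example", "you@example.org")
fields = extract_oa_fields(sample_payload)
print(url)
print(fields)
```

A `host_type` of `repository` indicates that the best open copy lives in some repository, not necessarily the institution's own, so answering questions about institutional repository use would require an additional check on the location URL.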

The course will consist of two parts. In the first, participants will learn how to build a dataset. We will provide lessons about downloading data from different sources: Web of Science, Scopus, and The Lens. (Web of Science and Scopus are subscription databases; The Lens is freely available.) 
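
When records are downloaded from more than one of these sources, the same article will often appear several times. The sketch below shows one plausible way to combine exports and deduplicate them by DOI. It assumes each export has already been loaded into a list of dictionaries with `doi` and `title` keys; actual column names differ between sources and would need mapping first. The sample rows and the helper names are hypothetical.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Lowercase a DOI and strip common URL or 'doi:' prefixes."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def merge_exports(*exports):
    """Merge (source_name, rows) pairs into one record per normalized DOI."""
    merged = {}
    for source_name, rows in exports:
        for row in rows:
            key = normalize_doi(row["doi"])
            # The first source to mention a DOI supplies the title.
            entry = merged.setdefault(
                key, {"doi": key, "title": row["title"], "sources": []}
            )
            entry["sources"].append(source_name)
    return list(merged.values())


# Hypothetical rows standing in for exports from two sources.
wos_rows = [{"doi": "10.1000/ABC", "title": "Paper one"}]
scopus_rows = [
    {"doi": "https://doi.org/10.1000/abc", "title": "Paper One"},
    {"doi": "10.1000/xyz", "title": "Paper two"},
]

dataset = merge_exports(("wos", wos_rows), ("scopus", scopus_rows))
print(dataset)
```

Records without a DOI are not handled here; in practice they would need a fallback match, for example on a normalized title and publication year.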

In the second part of the course, participants will learn data analysis methods that can help answer questions such as:

Should you cancel or renew a subscription?
Who is funding your institution’s researchers?
Are your institution’s authors using an institutional repository?
Should you accept a publisher’s open access publishing offer?

Library agreements with publishers are at a crucial turning point, as they increasingly include OA publishing. By learning to do these analyses for themselves, participants will be better prepared to enter into negotiations with a publisher. The expertise developed through this course can make the uneven playing field of library-publisher negotiations slightly more even.

Course materials will be openly available. This will be a facilitated course taught by the authors.