“Much has been made of the recent Nature news declaration of the NIH Data Policy (from January 2023) as ‘seismic’. In my opinion, it truly is. Others will argue that the language is not strong enough. But for me, the fact that the largest public funder of biomedical research in the world is telling researchers to share their data demonstrates how fast the push for open academic data is accelerating.
While a lot of the focus is on incentive structures and the burden on researchers, the academic community should not lose sight of the potential ‘seismic’ benefits that open data can have for reproducibility and efficiency in research, as well as the ability to move further and faster when it comes to knowledge advancement….
Reflecting on the past decade of open research data, there are a few key developments that have helped build momentum in the space, as well as a few ideas that haven’t come to fruition…yet.
The NIH is not the first funder to tell the researchers it funds that they should make their data openly available to all. Of the funders listed on Sherpa Juliet, 52 require data archiving as a condition of funding, while a further 34 encourage it. A push from publishers has also acted as a major motivator for researchers to share their data. This goes as far back as 2014, when PLOS began requiring all article authors to make their data publicly available. Now, nearly all major science journals have an open data policy of some kind. Some may say there is no better motivator for a researcher to share their data than having a publication at stake.
In 2016, the ‘FAIR Guiding Principles for scientific data management and stewardship’ were published in Scientific Data, and a flurry of debate on the definition of Findable, Accessible, Interoperable, and Reusable data has continued ever since. This has been a net win for the space. Although not every institution, publisher and funder may be aiming for exactly the same outcome, it is a move to better describe data and ultimately make it usable as a standalone output. The FAIR principles emphasize that future consumers of research data will not just be human researchers: we also need to feed the machines. This means that computers will need to interpret content with little or no human intervention. For this to be possible, the outputs need to be in machine-readable formats, and the metadata needs to be sufficient to describe exactly what the data are and how they were generated.
This highlights the area (in my opinion) that can create the most change in the shortest amount of time: quality of metadata….
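To make the point about machine-readable metadata a little more concrete, here is a minimal Python sketch of what a completeness check might look like. It is only an illustration: the field names, the `REQUIRED_FIELDS` list, and the `missing_fields` helper are assumptions made up for this example and do not come from the FAIR principles or from any particular repository's schema.

```python
import json

# Hypothetical minimal field set, chosen for illustration only.
# Real repositories and metadata schemas define their own requirements.
REQUIRED_FIELDS = ("identifier", "title", "creators", "license", "file_format", "methods")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "title": "Example measurements",
    "creators": ["A. Researcher"],
    "license": "CC-BY-4.0",
    "file_format": "text/csv",
    "methods": "",  # left empty: how the data were generated is undocumented
}

# Round-trip through JSON to show the record survives as plain,
# machine-readable text that another program can parse unaided.
serialized = json.dumps(record, indent=2)
parsed = json.loads(serialized)

print(missing_fields(parsed))  # → ['methods']
```

A machine can read this record without help, but the empty `methods` field means it still cannot tell how the data were generated, which is exactly the kind of gap that better metadata would close.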