AGU/CHORUS Forum: How Open is Open Data & Software?

“The word ‘open’ is being used commonly in scholarly communications with little context as to what is meant or intended. Researchers have been working openly to their advantage throughout their careers. What is different now, and what is the value of ‘being open’? Is open a binary option? On or off? Or is ‘open’ dependent on your context and research objectives, with the necessary flexibility? When we are being open, what is our responsibility around attribution and credit?

The speakers of this AGU/CHORUS Forum will discuss the open sharing of data and software as a researcher, within a team, and across a community and how it better supports discovery, collaboration, transparency and innovation.”

Positioning software source code as digital heritage for sustainable development | UNESCO

“The second annual symposium on the theme ‘Software Source Code as documentary heritage and an enabler for sustainable development’, organized by UNESCO and the French National Institute for Research in Digital Science and Technology (Inria) on 7 February 2023, took stock of the initiative’s achievements over the previous few years.

Throughout the conference, five major dimensions of software source code were explored:

as documentary heritage and as an enabler for digital skills education.
as a first-class research object in the open science ecosystem.
as an enabler for innovation and sharing in industry and administration.
its perspectives on long term preservation.
technological advances allowing massive analysis of software source code….”

What’s in a Badge? A Computational Reproducibility Investigation of the Open Data Badge Policy in One Issue of Psychological Science – Sophia Crüwell, Deborah Apthorp, Bradley J. Baker, Lincoln Colling, Malte Elson, Sandra J. Geiger, Sebastian Lobentanzer, Jean Monéger, Alex Patterson, D. Samuel Schwarzkopf, Mirela Zaneva, Nicholas J. L. Brown, 2023

Abstract: In April 2019, Psychological Science published its first issue in which all Research Articles received the Open Data badge. We used that issue to investigate the effectiveness of this badge, focusing on the adherence to its aim at Psychological Science: sharing both data and code to ensure reproducibility of results. Twelve researchers of varying experience levels attempted to reproduce the results of the empirical articles in the target issue (at least three researchers per article). We found that all 14 articles provided at least some data and six provided analysis code, but only one article was rated to be exactly reproducible, and three were rated as essentially reproducible with minor deviations. We suggest that researchers should be encouraged to adhere to the higher standard in force at Psychological Science. Moreover, a check of reproducibility during peer review may be preferable to the disclosure method of awarding badges.

Briefing document on strengthening high-quality, open, trustworthy and equitable scholarly publishing

“To increase the quality and impact of research, research results need to be disseminated in a timely manner and easily reused, both within the scientific community and by society in general….

Research results made open access immediately upon publication lead to more researchers being able to validate and build on previous results, which contributes to maintaining and promoting a high quality of research and to strengthening trust in research. Open access to research results also strengthens the use and impact of research in society at large, e.g. for industry and the public sector….

The potential of the digital revolution for scholarly publishing has not yet been fully realized, notably in relation to the expanding range of increasingly important research outcomes such as datasets and software….”

Guest Post – Are We Providing What Researchers Need in the Transition to Open Science? – The Scholarly Kitchen

“Why — despite live examples of seeing the impact of open research practices and the indication from researchers and the academic community that they want open research practices to be the norm — is there such a disparity between awareness, behavior, and action? How can we close this gap so that behaviors align with aspirations around open science?

Putting all these studies together, the reasons presented for the gap are mixed but include concerns around data misuse; lack of credit for sharing data; and the need for better support in how to make data and research sustainably open. Mandates, particularly funder mandates for this particular sample group, seem to have a limited role in driving authors to practice open research (although that may well change with new mandates for data sharing coming into effect from very large funding bodies such as federal agencies in the US). Comparatively, institutional encouragement had relatively good success. Where applicable, journal requirements to share materials, code, or data, or journal encouragement to facilitate preprint deposition, drove the same or greater degree of success as institutional encouragement….

One conclusion that becomes apparent is that more can be done by publishers and their partners to directly help and facilitate the adoption of open research practices. Encouraging or mandating sharing of objects as part of the manuscript publication process is an effective and efficient way of ensuring that open science practices are followed. Journals have been successful in the past in enforcing data-sharing mandates around the release of protein and nucleic acid sequences, for example, so we know that the right policies and initiatives can bring positive change….”

Water science must be Open Science | Nature Water

“Since water is a common good, the outcome of water-related research should be accessible to everyone. Since Open Science is more than just open access research articles, journals must work with the research community to enable fully open and FAIR science…”

Open Science for water research | Luxembourg Institute of Science and Technology

“For the launch of the new scientific journal Nature Water, researchers Emma and Stan Schymanski contributed an article about the future of water research. This opinion paper focuses on the importance of open science in a field where, due to its global societal relevance, knowledge and research results should be freely accessible by a wide range of stakeholders. The publication also highlights the interdisciplinary expertise brought to Luxembourg by the two FNR ATTRACT fellows on such a topical subject….

Research on water systems can help us face these considerable challenges but needs to consider the global societal relevance of its subject. “Since water is a common good, it should be natural that the outcome of water-related research is accessible to everyone,” explains Dr Stan Schymanski. “It needs to become freely available and re-usable for everybody, without the need for paid licenses to view publications or use data.”

The two researchers insist on the importance of implementing Open Science in its broadest definition. It has to go beyond open access to research articles: it must also include open data and open-source computer code. Additionally, open data should be aligned with the FAIR Principles, which describe how to make data findable, accessible, interoperable and reusable. Open reproducible research can only be achieved through the combination of all these aspects.

Their Nature Water article details how this is vital for the development of Early Warning Systems for floods, for example, as reliable forecasting relies heavily on real-time sharing of meteorological data. It is also crucial when studying processes on long time scales, such as groundwater recharge, which can take centuries in arid systems. Understanding these natural mechanisms is only possible through free access to long time series of hydrological data across the globe.

After reviewing the tools already available to perform open water research – such as open repositories, templates to facilitate reproducibility assessments, practical guidelines for sharing code and choosing appropriate licenses – the two authors call for substantial additional efforts toward fully open science….”

GitHub is Sued, and We May Learn Something About Creative Commons Licensing – The Scholarly Kitchen

“I have had people tell me with doctrinal certainty that Creative Commons licenses allow text and data mining, and insofar as license terms are observed, I agree. The making of copies to perform text and data mining, machine learning, and AI training (collectively “TDM”) without additional licensing is authorized for commercial and non-commercial purposes under CC BY, and for non-commercial purposes under CC BY-NC. (Full disclosure: CCC offers RightFind XML, a service that supports licensed commercial access to full-text articles for TDM with value-added capabilities.)

I have long wondered, however, about the interplay between the attribution requirement (i.e., the “BY” in CC BY) and TDM. After all, the bargain with those licenses is that the author allows reuse, typically at no cost, but requires attribution. Attribution under the CC licenses may be the author’s primary benefit and motivation, as few authors would agree to offer the licenses without credit.

In the TDM context, this raises interesting questions:

Does the attribution requirement mean that the author’s information may not be removed as a data element from the content, even if inclusion might frustrate the TDM exercise or introduce noise into the system?
Does the attribution need to be included in the data set at every stage?
Does the result of the mining need to include attribution, even if hundreds of thousands of CC BY works were mined and the output does not include content from individual works?

While these questions may have once seemed theoretical, that is no longer the case. An analogous situation involving open software licenses (GNU and the like) is now being litigated….”

2022 PLOS accomplishments – The Official PLOS Blog

“Here are some highlights:

Our new journals, launched to address global issues like climate change, published more than 1,000 papers
We just published our first dataset on our Open Science Indicators, a new initiative that will help us surface and understand researcher practices with regards to sharing data and code, among other Open Science practices 
We set up PLOS entities across the globe and formed relationships with stakeholders within local research ecosystems
We doubled the number of our institutional partnerships…”

Kenyon | The Journal Article as a Means to Share Data: a Content Analysis of Supplementary Materials from Two Disciplines | Journal of Librarianship and Scholarly Communication

Abstract: INTRODUCTION The practice of publishing supplementary materials with journal articles is becoming increasingly prevalent across the sciences. We sought to understand better the content of these materials by investigating the differences between the supplementary materials published by authors in the geosciences and plant sciences. METHODS We conducted a random stratified sampling of four articles from each of 30 journals published in 2013. In total, we examined 297 supplementary data files for a range of different factors. RESULTS We identified many similarities between the practices of authors in the two fields, including the formats used (Word documents, Excel spreadsheets, PDFs) and the small size of the files. There were differences identified in the content of the supplementary materials: the geology materials contained more maps and machine-readable data; the plant science materials included much more tabular data and multimedia content. DISCUSSION Our results suggest that the data shared through supplementary files in these fields may not lend itself to reuse. Code and related scripts are not often shared, nor is much ‘raw’ data. Instead, the files often contain summary data, modified for human reading and use. CONCLUSION Given these and other differences, our results suggest implications for publishers, librarians, and authors, and may require shifts in behavior if effective data sharing is to be realized.

 

Reproducibility Policy | Sociological Science

“Starting with submissions received after April 1, 2023, authors of articles relying on statistical or computational methods will be required to deposit replication packages as a condition of publication in Sociological Science. Replication packages must contain both the statistical code and — when legally and ethically possible — the data required to fully reproduce the reported results. With this policy, Sociological Science hopes other high-impact journals in Sociology will follow suit in setting standards for reproducibility of published work….”

Plos launches open science data collection push | Times Higher Education (THE)

“The Public Library of Science is beginning a project to track open science behaviours across scientific publishing, calling the lack of such data a critical barrier to making meaningful advances in research-sharing.

Plos, the pioneering non-profit open-access publisher founded in 2000, said that its new Open Science Indicator project would measure and report three characteristics of published articles: how many appeared in a preprint format, shared their research data, and made available the computer code underlying that data….”