Global Community Guidelines for Documenting, Sharing, and Reusing Quality Information of Individual Digital Datasets

Open-source science builds on open and free resources that include data, metadata, software, and workflows. Informed decisions on whether and how to (re)use digital datasets depend on an understanding of the quality of the underpinning data and relevant information. However, quality information, being difficult to curate and often context specific, is currently not readily available for sharing within and across disciplines. To help address this challenge and promote the creation and (re)use of freely and openly shared information about the quality of individual datasets, members of several groups around the world, collaborating with international domain experts, have developed international community guidelines with practical recommendations for the Earth science community. The guidelines were inspired by the guiding principles of being findable, accessible, interoperable, and reusable (FAIR). Use of the FAIR dataset quality information guidelines is intended to help stakeholders, such as scientific data centers, digital data repositories, and producers, publishers, stewards and managers of data, to: i) capture, describe, and represent quality information of their datasets in a manner that is consistent with the FAIR Guiding Principles; ii) allow for the maximum discovery, trust, sharing, and reuse of their datasets; and iii) enable international access to and integration of dataset quality information. This article describes the process used to develop the guidelines, which are aligned with the FAIR principles, presents a generic quality assessment workflow, describes the guidelines for preparing and disseminating dataset quality information, and outlines a path forward to improve their disciplinary diversity.
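
The guidelines are prose recommendations rather than a schema, but the underlying idea of capturing dataset quality information in a structured, machine-readable, openly licensed form can be sketched briefly. The record below is a hypothetical illustration in Python; none of the field names or values come from the guidelines themselves:

    import json

    # A hypothetical, minimal record describing the quality of one dataset.
    # Field names are illustrative only; the guidelines recommend practices,
    # not a fixed schema.
    quality_record = {
        "dataset_id": "doi:10.1234/example-dataset",  # persistent identifier (findable)
        "quality_dimension": "completeness",
        "assessment_method": "automated spatial-coverage check",
        "result": {"value": 98.7, "unit": "percent"},
        "assessed_by": "Example Data Center",
        "assessment_date": "2022-01-15",
        "license": "CC0-1.0",                         # an open license aids reuse
    }

    # Serializing to a common format such as JSON keeps the record
    # machine-readable and interoperable across repositories.
    print(json.dumps(quality_record, indent=2))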

An open science argument against closed metrics

“In the Open Scientist Handbook, I argue that open science supports anti-rivalrous science collaborations where most metrics are of little, or of negative value. I would like to share some of these arguments here….

Institutional prestige is a profound drag on the potential for networked science. If your administration has a plan to “win” the college ratings game, this plan will only make doing science harder. It makes being a scientist less rewarding. Playing finite games of chasing arbitrary metrics or ‘prestige’ drags scientists away from the infinite play of actually doing science….

As Cameron Neylon said at the metrics breakout of the ‘Beyond the PDF’ conference some years ago, “reuse is THE metric.” Reuse reveals and confirms the advantage that open sharing has over current, market-based, practices. Reuse validates the work of the scientist who contributed to the research ecosystem. Reuse captures more of the inherent value of the original discovery and accelerates knowledge growth….”

How to reuse & share your knowledge as you wish through Rights Retention – YouTube

“In 2020 cOAlition S released its Rights Retention Strategy (RRS) with the dual purpose of enabling authors to retain rights that automatically belong to them, and of enabling compliance with their funders’ Open Access policy via dissemination in a repository.

This video explains briefly the steps a researcher has to follow to retain their intellectual property rights….”

The doors of precision: Reenergizing psychiatric drug development with psychedelics and open access computational tools

“In a truly remarkable way, the study was performed at essentially no additional cost. Ballentine et al. (3) made use of existent, openly available resources: the Erowid psychedelic “experience vault,” the pharmacokinetic profiles of each psychedelic, the Allen Human Brain gene transcription profiles, and the Schaefer-Yeo brain atlas that mapped gene transcript to brain structure. The computational tools—primarily Python toolboxes—that Ballentine et al. deployed were also available at no cost. So in the same way that the psychedelics industry is repurposing old drugs, Ballentine et al. repurposed old data and tools to define a new framework….”
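
For a flavour of how such openly available pieces fit together, the sketch below fetches the Schaefer-Yeo parcellation and maps Allen Human Brain Atlas gene expression onto it. It assumes the open-source nilearn and abagen Python packages, which are not named in the excerpt, and it downloads public data on first run:

    # Both packages are free and open source.
    from nilearn import datasets
    import abagen

    # Fetch the Schaefer-Yeo atlas (100 cortical regions, 7-network ordering).
    atlas = datasets.fetch_atlas_schaefer_2018(n_rois=100, yeo_networks=7)

    # Map Allen Human Brain Atlas microarray expression onto the atlas regions,
    # producing a (regions x genes) pandas DataFrame.
    expression = abagen.get_expression_data(atlas["maps"])
    print(expression.shape)

Everything the snippet touches (the atlas, the expression data, and the software itself) is freely downloadable, which is precisely the commentary's point about repurposing old data and tools.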

Which solutions best support sharing and reuse of code? – The Official PLOS Blog

“PLOS has released a preprint and supporting data on research conducted to understand the needs and habits of researchers in relation to code sharing and reuse, as well as to gather feedback on prototype code notebooks and help determine strategies that publishers could use to increase code sharing.

Our previous research led us to implement a mandatory code sharing policy at PLOS Computational Biology in March 2021 to increase the amount of code shared alongside published articles. As well as exploring policy to support code sharing, we have been collaborating with NeuroLibre, an initiative of the Canadian Open Neuroscience Platform, to learn more about the potential role of technological solutions for enhancing code sharing. NeuroLibre is one of a growing number of interactive or executable technologies for sharing and publishing research, some of which have become integrated with publishers’ workflows….”

Data Reuse Days 2022, March 14-24, 2022 | Wikidata Events

The Data Reuse Days are a series of gatherings taking place from March 14th to 24th, 2022, focusing on Wikidata data reuse and reusers. With presentations, discussions, editing sprints and more, the main goal of this event is to provide a space to bring together anyone interested in the topic of re-using Wikidata’s data. This means, for example:

to gather people who reuse Wikidata’s data (on products, apps, websites, research, etc.) in order to understand better what they are building and what their needs and wishes regarding Wikidata’s data and technical infrastructure are
to bring together data reusers and data editors to talk about issues, wishes and common efforts, so each group can hear the other’s point of view on things to improve (data quality, ontologies, etc.)
to onboard developers who want to build applications on top of Wikidata’s data, as well as editors from other Wikimedia projects
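
As a concrete taste of the kind of reuse the event describes, the sketch below sends a query to Wikidata's public SPARQL endpoint. Only the endpoint URL and standard SPARQL syntax are real; the query itself is an arbitrary example:

    import requests

    # Wikidata's public SPARQL endpoint.
    ENDPOINT = "https://query.wikidata.org/sparql"

    # Example query: five items that are an instance of (P31) house cat (Q146).
    QUERY = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q146 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 5
    """

    response = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        # A descriptive User-Agent is part of Wikidata's usage etiquette.
        headers={"User-Agent": "data-reuse-days-example/0.1"},
    )
    response.raise_for_status()
    for row in response.json()["results"]["bindings"]:
        print(row["itemLabel"]["value"])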

How to participate?

This initiative is open to everyone who’s interested in reusing Wikidata’s data. As a first step, you can add yourself to the list of participants.
On this page, you will find an overview of the schedule and resources to get started. You’re welcome to join any session that you find interesting.
This initiative is coordinated by Lea Lacroix (WMDE), but its content is community-powered: if you want to organize a session, work on a project, or help with documentation, you’re very welcome to add it to the schedule (instructions TBD).

A survey of researchers’ code sharing and code reuse practices, and assessment of interactive notebook prototypes | OSF Preprints

Cadwallader, L., & Hrynaszkiewicz, I. (2022, March 2). A survey of researchers’ code sharing and code reuse practices, and assessment of interactive notebook prototypes. https://doi.org/10.31219/osf.io/tys8p

Abstract: This research aimed to understand the needs and habits of researchers in relation to code sharing and reuse; gather feedback on prototype code notebooks created by NeuroLibre; and help determine strategies that publishers could use to increase code sharing. We surveyed 188 researchers in computational biology. Respondents were asked how often and why they look at code, which methods of accessing code they find useful and why, what aspects of code sharing are important to them, and how satisfied they are with their ability to complete these tasks. Respondents were asked to look at a prototype code notebook and give feedback on its features. They were also asked how much time they spent preparing code and whether they would be willing to increase this in order to use a code sharing tool, such as a notebook. For readers of research articles, the most common reason (70%) for looking at code was to gain a better understanding of the article. The most commonly encountered method for code sharing, linking articles to a code repository, was also the most useful method of accessing code from the reader’s perspective. As authors, the respondents were largely satisfied with their ability to carry out tasks related to code sharing. The most important of these tasks were ensuring that the code ran in the correct environment and sharing code with good documentation. The average researcher, according to our results, is unwilling to incur the additional costs (in time, effort, or expenditure) currently needed to use code sharing tools alongside a publication. We infer from this that different models for funding and producing interactive or executable research outputs are needed if they are to reach a large number of researchers. To increase the amount of code shared by authors, PLOS Computational Biology is, as a result, focusing on policy rather than tools.

The reuse of public datasets in the life sciences: potential risks and rewards [PeerJ]

Abstract: The ‘big data’ revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define ‘successful reuse’ as the use of previously published data to enable novel scientific findings. By using selected examples of successful reuse from different disciplines, we illustrate the enormous potential of the practice, while acknowledging the respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of this practice as a norm has the potential to benefit all stakeholders in the life sciences.
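
Much of the sequence-data reuse the review describes starts with programmatic retrieval from public archives. As a minimal, hypothetical illustration, the sketch below fetches one public GenBank record with Biopython's Entrez module; the review itself does not prescribe any particular tool, and the accession is chosen purely as an example:

    # Biopython's interface to the NCBI Entrez services.
    from Bio import Entrez, SeqIO

    Entrez.email = "you@example.org"  # NCBI asks for a contact address

    # Fetch a single public nucleotide record (example accession).
    handle = Entrez.efetch(db="nucleotide", id="NC_045512.2",
                           rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()

    print(record.id, record.description, len(record.seq))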


Biodiversity Community Integrated Knowledge Library (BiCIKL)

Abstract: BiCIKL is a European Union Horizon 2020 project that will initiate and build a new European starting community of key research infrastructures, establishing open science practices in the domain of biodiversity through the provision of access to data and associated tools and services at each stage of, and along, the entire research cycle. BiCIKL will provide new methods and workflows for integrated access to harvesting, liberating, linking, accessing, and re-using subarticle-level data (specimens, material citations, samples, sequences, taxonomic names, taxonomic treatments, figures, tables) extracted from literature. BiCIKL will provide, for the first time, access and tools for seamless linking and usage tracking of data along the line: specimens > sequences > species > analytics > publications > biodiversity knowledge graph > re-use.
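
Linking specimens, sequences, and species names in the way BiCIKL envisions depends on resolving names to stable identifiers. As one small, hypothetical illustration using an existing public service (GBIF's species-matching API, shown here as an analogy rather than a BiCIKL deliverable), the sketch below maps a scientific name to its GBIF taxon key:

    import requests

    # GBIF's public name-matching service.
    resp = requests.get(
        "https://api.gbif.org/v1/species/match",
        params={"name": "Puma concolor"},
    )
    resp.raise_for_status()
    match = resp.json()

    # A stable identifier like this is what lets specimens, sequences,
    # and publications be linked to the same taxon.
    print(match["scientificName"], match["usageKey"])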