“The Biodiversity Literature Repository (BLR) has been growing from a community on Zenodo to be a service dedicated to liberate and make open access, FAIR (findable, accessible, interoperable and reusable) data hidden in the hundreds of millions of pages of scholarly publications.
It is built on top of Zenodo, a digital repository hosted at CERN, which provides a sustainable and robust infrastructure for long tail research data, which can consist of small datasets that otherwise would be lost.
Originally a collaboration between Zenodo, Plazi and Pensoft, BLR began as a repository for taxonomic publications which lacked Digital Object Identifiers (DOI) and thus were effectively orphaned from the network of online citations. As it grew its scope expanded to morphed into a highly interlinked repository that focuses on include illustrations and taxonomic treatments contained in publications with all these content types interlinked among themselves and enhanced with and rich metadata.
The source data for BLR are scholarly publications that are most often in PDF or html format but sometimes in XML formats whose structured data facilitates the automated data extraction.
The largest data users are the Global Biodiversity Information Facility (GBIF) and the United States’ National Center for Biotechnology Information (NCBI).
Support of BLR comes from the Arcadia Fund and the three partner institutions Zenodo, Plazi and Pensoft.”
The purpose of this paper is to report a study of how research literature addresses researchers’ attitudes toward data repository use. In particular, the authors are interested in how the term data sharing is defined, how data repository use is reported and whether there is need for greater clarity and specificity of terminology.
To study how the literature addresses researcher data repository use, relevant studies were identified by searching Library Information Science and Technology Abstracts, Library and Information Science Source, Thomas Reuters’ Web of Science Core Collection and Scopus. A total of 62 studies were identified for inclusion in this meta-evaluation.
The study shows a need for greater clarity and consistency in the use of the term data sharing in future studies to better understand the phenomenon and allow for cross-study comparisons. Furthermore, most studies did not address data repository use specifically. In most analyzed studies, it was not possible to segregate results relating to sharing via public data repositories from other types of sharing. When sharing in public repositories was mentioned, the prevalence of repository use varied significantly.
Researchers’ data sharing is of great interest to library and information science research and practice to inform academic libraries that are implementing data services to support these researchers. This study explores how the literature approaches this issue, especially the use of data repositories, the use of which is strongly encouraged. This paper identifies the potential for additional study focused on this area.
Abstract: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic will be remembered as one of the defining events of the 21st century. The rapid global outbreak has had significant impacts on human society and is already responsible for millions of deaths. Understanding and tackling the impact of the virus has required a worldwide mobilisation and coordination of scientific research. The COVID-19 Data Portal (https://www.covid19dataportal.org/) was first released as part of the European COVID-19 Data Platform, on April 20th 2020 to facilitate rapid and open data sharing and analysis, to accelerate global SARS-CoV-2 and COVID-19 research. The COVID-19 Data Portal has fortnightly feature releases to continue to add new data types, search options, visualisations and improvements based on user feedback and research. The open datasets and intuitive suite of search, identification and download services, represent a truly FAIR (Findable, Accessible, Interoperable and Reusable) resource that enables researchers to easily identify and quickly obtain the key datasets needed for their COVID-19 research.
“Global-south scientists say that an open-access movement led by wealthy nations deprives them of credit and undermines their efforts….
But a growing faction of scientists, mostly from wealthy nations, argues that sequences should be shared on databases with no gatekeeping at all. They say this would allow huge analyses combining hundreds of thousands of genomes from different databases to flow seamlessly, and therefore deliver results more rapidly.
The debate has caught the attention of the US National Institutes of Health (NIH) — which runs its own genome repository, called GenBank — and the Bill & Melinda Gates Foundation, which has considered encouraging grantees to share on sites without such strong protections, Nature has learnt.
But many researchers — particularly those in resource-limited countries — are pushing back. They tell Nature that they see potential for exploitation in this no-strings-attached approach — and that GISAID’s gatekeeping is one of its biggest attractions because it ensures that users who analyse sequences from GISAID acknowledge those who deposited them. The database also requests that users seek to collaborate with the depositors….
Fears of inequitable data use are amplified by the fact that only 0.3% of COVID-19 vaccines have gone to low-income countries. “Imagine Africans working so hard to contribute to a database that’s used to make or update vaccines, and then we don’t get access to the vaccines,” says Christian Happi, a microbiologist at the African Centre of Excellence for Genomics of Infectious Diseases in Ede, Nigeria. “It’s very demoralizing.” …”
Abstract: DataONE, funded from 2009-2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability & Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape. By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research.
“genomeRxiv is a newly-funded US-UK collaboration to provide a public, web-accessible database of public genome sequences, accurately catalogued and classified by whole-genome similarity independent of their taxonomic affiliation. Our goal is to supply the basic and applied research community with rapid, precise and accurate identification of unknown isolates based on genome sequence alone, and with molecular tools for environmental analysis….”
ARL is heartened to see Congress acknowledge the necessity of machine-readable data management plans (DMPs) and open repositories in supporting the academic research enterprise. At a National Science Foundation–funded conference on effective data practices in December 2019, ARL, along with the Association of American Universities, the Association of Public and Land-grant Universities, and the California Digital Library, convened stakeholders including university research officers, scientists, and librarians. Conference participants agreed that data management planning is important for sharing and use of research data and outputs. Participants suggested that the ability to update plans (“just in time”) across the project life cycle and as part of progress reporting would accelerate the value and adoption of DMPs among researchers, beyond what is required for compliance.
ARL encourages the development of a collaborative set of data repository criteria. Coordination among federal agencies will be necessary, as will stakeholder input from researchers, repository managers, librarians, and others. ARL looks forward to continuing these conversations and building upon work already underway within groups such as the Confederation of Open Access Repositories, the Research Data Alliance, and the World Data System….”
Abstract: A group of publishers came together to discuss how we could reduce the complexity and inconsistency provided in publisher’s advice to researchers when selecting an appropriate data repository. It is a shared goal among publishers and other stakeholders to increase repository use – which remains far from optimal – and we assume that helping researchers find a suitable repository more easily will help achieve this.
To address this a list of features has been created and it is intended only as a framework within which publishers can make recommendations to researchers, not as a way to restrict which repositories researchers may choose for their data. Our intention is that the features we highlight will act to initiate engagement and collaboration among publishers, repositories and the RPOs, government and funders that ultimately make the policies around Open Research. As we start this conversation, it is important that we act together with other stakeholders to raise awareness of the challenges involved around FAIR data and to prevent any perverse consequences.
From the RDA FAIRsharing WG point of view, the ultimate objective is to map repository features across all existing initiatives, and to identify a common core set of metadata fields that all stakeholders want to see in registry of repositories. The FAIRsharing registry in particular is agnostic as to the selection process of standards, repositories and policies, as part of its commitment to working with and for all stakeholder groups.
“As a Data Architect, Sabrina is available to support DGHI in achieving their data sharing goals. She takes a holistic approach to identifying areas where the team needs data support. Considering at each stage of the project lifecycle how system design and data architecture will influence how data can be shared. This may entail drafting informed consent documents, developing strategies for de-identification, curating and managing data, or discovering solutions for data storage and publishing. For instance, in collaboration with CDVS Research Data Management Consultants, Sabrina has helped AMANI create a Dataverse to enable sharing restricted access health data for international junior researchers. Data from one of DGHI’s studies are also available in the Duke Research Data Repository….
Reproducibility is another reason that sharing and publishing data is important to Sabrina. DGHI wants to increase data availability in accordance with FAIR principles so other researchers can independently verify, reproduce, and iterate on their work. This supports peers and contributes to the advancement of the field. Publishing data in an open repository can also increase their reach and impact. DGHI is also currently examining how to incorporate the CARE principles and other frameworks for ethical data sharing within their international collaborations….”
“PsychOpen CAMA enables accessing meta-analytic datasets, reproducing meta-analyses and dynamically updating evidence from new primary studies collaboratively….
A CAMA (Community Augmented Meta Analysis) is an open repository for meta-analytic data, that provides meta-analytic analysis tools….
PsychOpen CAMA enables easy access and automated reproducibility of meta-analyses in psychology and related fields. This has several benefits for the research community:
Evidence can be kept updated by adding new studies published after the meta-analysis.
Researchers with special research questions can use subsets of the data or rerun meta-analyses using different moderators.
Flexible analyses with the datasets enable the application of new statistical procedures or different graphical displays.
The cumulated evidence in the CAMA can be used to get a quick overview of existing research gaps. This may give an idea of which study designs or moderators may be especially interesting for future studies to use limited resources for research in a way to enhance evidence.
Given existing meta-analytic evidence, the necessary sample size of future studies to detect an effect of a reasonable size can be estimated. Moreover, the effect of possible future studies on the results of the existing meta-analytic evidence can be simulated.
PsychOpen CAMA offers tutorials to better understand the reasoning behind meta-analyses and to learn the basic steps of conducting a meta-analysis to empower other researchers to contribute to our project for the benefit of the research community….”