Infrastructures are being developed to enhance and facilitate the sharing of cohort data internationally. However, empirical studies show that many barriers impede sharing data broadly.
Therefore, our aim is to describe the barriers to and concerns about sharing cohort data, and the implications for data sharing platforms.
Seventeen participants involved in developing data sharing platforms, or affiliated with cohorts whose data are to be submitted to such platforms, were recruited for semi-structured interviews about their views on and experiences with data sharing.
Credit and recognition, the potential misuse of data, loss of control, lack of resources, socio-cultural factors, and ethical and legal barriers all influence decisions on data sharing. Core values underlying these reasons are equality, reciprocity, trust, transparency, gratification and beneficence.
Data generators might use data sharing platforms primarily for collaborative modes of working and network building. Data generators might be unwilling to contribute and share for non-collaborative work, or if no financial resources are provided for sharing data.
“Digital objects are inextricably dependent on their context, the infrastructure of people, processes, and technology that care for them. The FAIR Principles are at the heart of the data ecosystem, but they do not specify how digital objects are made FAIR or for how long they should be kept FAIR. This perspective is provided by the Trustworthy Digital Repository (TDR) requirements by defining long-term digital object preservation expectations. We’re all doing something for someone, and to deliver an effective service at scale, we need a sense of the types of users we have and how we can meet their needs, also in the future.
FAIRsFAIR, SSHOC, and EOSC Nordic are all supporting digital repositories in their journey to achieve TDR status. When sharing experiences, the project teams found out that two fundamental TDR concepts are not always easy to understand: preservation and Designated Community. The draft working paper FAIR + Time: Preservation for a Designated Community was prepared in collaboration with the three projects. It seeks to present key concepts and expand on them to specify the standards and assessments required for an interoperable ecosystem of FAIR (findable, accessible, interoperable and reusable) data preserved for the long term in generalist and specialist FAIR-enabling trustworthy digital repositories (TDR) for a defined designated community of users. It seeks to provide context and define these concepts for audiences familiar with research data and technical data management systems but with less direct experience of digital preservation and trustworthy digital repositories. This is intended to help clarify which organisations are potential candidates to receive CoreTrustSeal TDR status and identify and support the types of organisations that may not be candidates but play a vital role in the data ecosystem. …”
“Sharing qualitative data does not mean depositing them somewhere on the internet.
Sharing qualitative data through data repositories enables controlling secondary use and is safe.
Research data archives offer help in processing data for reuse and some even offer financial support….”
“The Biodiversity Literature Repository (BLR) has been growing from a community on Zenodo to be a service dedicated to liberate and make open access, FAIR (findable, accessible, interoperable and reusable) data hidden in the hundreds of millions of pages of scholarly publications.
It is built on top of Zenodo, a digital repository hosted at CERN, which provides a sustainable and robust infrastructure for long tail research data, which can consist of small datasets that otherwise would be lost.
Originally a collaboration between Zenodo, Plazi and Pensoft, BLR began as a repository for taxonomic publications which lacked Digital Object Identifiers (DOI) and thus were effectively orphaned from the network of online citations. As it grew, its scope expanded: it morphed into a highly interlinked repository that includes illustrations and taxonomic treatments contained in publications, with all these content types interlinked among themselves and enriched with metadata.
The source data for BLR are scholarly publications, most often in PDF or HTML format but sometimes in XML formats whose structured data facilitate automated data extraction.
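To illustrate why structured XML eases automated extraction compared with PDF, here is a minimal sketch that pulls a taxon name and figure identifiers out of a small XML fragment. The element names (`article`, `treatment`, `taxon`, `figure`) and the placeholder DOI are hypothetical and do not reflect the actual schema (e.g. TaxPub/JATS) or content BLR processes.

```python
# Minimal sketch: extracting structured data from an XML-formatted publication.
# Element names and the DOI value are illustrative placeholders only.
import xml.etree.ElementTree as ET

doc = """<article>
  <treatment id="t1">
    <taxon>Apis mellifera</taxon>
    <figure doi="10.5281/zenodo.EXAMPLE"/>
  </treatment>
</article>"""

root = ET.fromstring(doc)
for treatment in root.iter("treatment"):
    name = treatment.findtext("taxon")              # taxon name as plain text
    figs = [f.get("doi") for f in treatment.findall("figure")]  # linked figures
    print(name, figs)
```

With a PDF source, the same information would have to be recovered from layout heuristics and text mining; with XML it is a direct tree traversal.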
The largest data users are the Global Biodiversity Information Facility (GBIF) and the United States’ National Center for Biotechnology Information (NCBI).
Support of BLR comes from the Arcadia Fund and the three partner institutions Zenodo, Plazi and Pensoft.”
The purpose of this paper is to report a study of how research literature addresses researchers’ attitudes toward data repository use. In particular, the authors are interested in how the term data sharing is defined, how data repository use is reported and whether there is need for greater clarity and specificity of terminology.
To study how the literature addresses researcher data repository use, relevant studies were identified by searching Library Information Science and Technology Abstracts, Library and Information Science Source, Thomson Reuters’ Web of Science Core Collection and Scopus. A total of 62 studies were identified for inclusion in this meta-evaluation.
The study shows a need for greater clarity and consistency in the use of the term data sharing in future studies to better understand the phenomenon and allow for cross-study comparisons. Furthermore, most studies did not address data repository use specifically. In most analyzed studies, it was not possible to segregate results relating to sharing via public data repositories from other types of sharing. When sharing in public repositories was mentioned, the prevalence of repository use varied significantly.
Researchers’ data sharing is of great interest to library and information science research and practice to inform academic libraries that are implementing data services to support these researchers. This study explores how the literature approaches this issue, especially the use of data repositories, the use of which is strongly encouraged. This paper identifies the potential for additional study focused on this area.
Abstract: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic will be remembered as one of the defining events of the 21st century. The rapid global outbreak has had significant impacts on human society and is already responsible for millions of deaths. Understanding and tackling the impact of the virus has required a worldwide mobilisation and coordination of scientific research. The COVID-19 Data Portal (https://www.covid19dataportal.org/) was first released as part of the European COVID-19 Data Platform, on April 20th 2020, to facilitate rapid and open data sharing and analysis and to accelerate global SARS-CoV-2 and COVID-19 research. The COVID-19 Data Portal has fortnightly feature releases that continue to add new data types, search options, visualisations and improvements based on user feedback and research. The open datasets and intuitive suite of search, identification and download services represent a truly FAIR (Findable, Accessible, Interoperable and Reusable) resource that enables researchers to easily identify and quickly obtain the key datasets needed for their COVID-19 research.
“Global-south scientists say that an open-access movement led by wealthy nations deprives them of credit and undermines their efforts….
But a growing faction of scientists, mostly from wealthy nations, argues that sequences should be shared on databases with no gatekeeping at all. They say this would allow huge analyses combining hundreds of thousands of genomes from different databases to flow seamlessly, and therefore deliver results more rapidly.
The debate has caught the attention of the US National Institutes of Health (NIH) — which runs its own genome repository, called GenBank — and the Bill & Melinda Gates Foundation, which has considered encouraging grantees to share on sites without such strong protections, Nature has learnt.
But many researchers — particularly those in resource-limited countries — are pushing back. They tell Nature that they see potential for exploitation in this no-strings-attached approach — and that GISAID’s gatekeeping is one of its biggest attractions because it ensures that users who analyse sequences from GISAID acknowledge those who deposited them. The database also requests that users seek to collaborate with the depositors….
Fears of inequitable data use are amplified by the fact that only 0.3% of COVID-19 vaccines have gone to low-income countries. “Imagine Africans working so hard to contribute to a database that’s used to make or update vaccines, and then we don’t get access to the vaccines,” says Christian Happi, a microbiologist at the African Centre of Excellence for Genomics of Infectious Diseases in Ede, Nigeria. “It’s very demoralizing.” …”
Abstract: DataONE, funded from 2009-2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability & Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape. By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research.
“genomeRxiv is a newly-funded US-UK collaboration to provide a public, web-accessible database of public genome sequences, accurately catalogued and classified by whole-genome similarity independent of their taxonomic affiliation. Our goal is to supply the basic and applied research community with rapid, precise and accurate identification of unknown isolates based on genome sequence alone, and with molecular tools for environmental analysis….”
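The excerpt above mentions classification by whole-genome similarity. As a hedged illustration of one common similarity measure used in this space, the sketch below computes the Jaccard index over k-mer sets, the idea behind MinHash-based tools such as Mash or sourmash. This is not genomeRxiv's actual pipeline, which is not described in the excerpt; the sequences and `k` are toy values.

```python
# Illustrative only: Jaccard similarity over k-mer sets, one common basis
# for alignment-free whole-genome comparison. Not genomeRxiv's actual method.

def kmers(seq, k=4):
    """Return the set of all k-mers (length-k substrings) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b, k=4):
    """Jaccard similarity of two sequences' k-mer sets (0 = disjoint, 1 = identical)."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)

print(jaccard("ACGTACGTGG", "ACGTACGTCC"))  # → 0.5
```

In practice, tools sketch the k-mer sets with MinHash so that millions of genomes can be compared without holding full sequences in memory.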
ARL is heartened to see Congress acknowledge the necessity of machine-readable data management plans (DMPs) and open repositories in supporting the academic research enterprise. At a National Science Foundation–funded conference on effective data practices in December 2019, ARL, along with the Association of American Universities, the Association of Public and Land-grant Universities, and the California Digital Library, convened stakeholders including university research officers, scientists, and librarians. Conference participants agreed that data management planning is important for sharing and use of research data and outputs. Participants suggested that the ability to update plans (“just in time”) across the project life cycle and as part of progress reporting would accelerate the value and adoption of DMPs among researchers, beyond what is required for compliance.
ARL encourages the development of a collaborative set of data repository criteria. Coordination among federal agencies will be necessary, as will stakeholder input from researchers, repository managers, librarians, and others. ARL looks forward to continuing these conversations and building upon work already underway within groups such as the Confederation of Open Access Repositories, the Research Data Alliance, and the World Data System….”
Abstract: A group of publishers came together to discuss how we could reduce the complexity and inconsistency of publishers’ advice to researchers selecting an appropriate data repository. It is a shared goal among publishers and other stakeholders to increase repository use – which remains far from optimal – and we assume that helping researchers find a suitable repository more easily will help achieve this.
To address this, a list of features has been created; it is intended only as a framework within which publishers can make recommendations to researchers, not as a way to restrict which repositories researchers may choose for their data. Our intention is that the features we highlight will initiate engagement and collaboration among publishers, repositories, and the RPOs, governments and funders that ultimately make the policies around Open Research. As we start this conversation, it is important that we act together with other stakeholders to raise awareness of the challenges involved around FAIR data and to prevent any perverse consequences.
From the RDA FAIRsharing WG point of view, the ultimate objective is to map repository features across all existing initiatives, and to identify a common core set of metadata fields that all stakeholders want to see in a registry of repositories. The FAIRsharing registry in particular is agnostic as to the selection process of standards, repositories and policies, as part of its commitment to working with and for all stakeholder groups.
“As a Data Architect, Sabrina is available to support DGHI in achieving their data sharing goals. She takes a holistic approach to identifying areas where the team needs data support, considering at each stage of the project lifecycle how system design and data architecture will influence how data can be shared. This may entail drafting informed consent documents, developing strategies for de-identification, curating and managing data, or discovering solutions for data storage and publishing. For instance, in collaboration with CDVS Research Data Management Consultants, Sabrina has helped AMANI create a Dataverse to enable sharing restricted access health data for international junior researchers. Data from one of DGHI’s studies are also available in the Duke Research Data Repository….
Reproducibility is another reason that sharing and publishing data is important to Sabrina. DGHI wants to increase data availability in accordance with FAIR principles so other researchers can independently verify, reproduce, and iterate on their work. This supports peers and contributes to the advancement of the field. Publishing data in an open repository can also increase their reach and impact. DGHI is also currently examining how to incorporate the CARE principles and other frameworks for ethical data sharing within their international collaborations….”
“PsychOpen CAMA enables accessing meta-analytic datasets, reproducing meta-analyses and dynamically updating evidence from new primary studies collaboratively….
A CAMA (Community Augmented Meta Analysis) is an open repository for meta-analytic data that provides meta-analytic analysis tools….
PsychOpen CAMA enables easy access and automated reproducibility of meta-analyses in psychology and related fields. This has several benefits for the research community:
Evidence can be kept updated by adding new studies published after the meta-analysis.
Researchers with special research questions can use subsets of the data or rerun meta-analyses using different moderators.
Flexible analyses with the datasets enable the application of new statistical procedures or different graphical displays.
The cumulated evidence in the CAMA can be used to get a quick overview of existing research gaps. This may suggest which study designs or moderators would be especially interesting for future studies, so that limited research resources are used in a way that enhances the evidence base.
Given existing meta-analytic evidence, the necessary sample size of future studies to detect an effect of a reasonable size can be estimated. Moreover, the effect of possible future studies on the results of the existing meta-analytic evidence can be simulated.
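The sample-size estimation described above can be sketched with the standard normal approximation for a two-sample comparison of means, n per group ≈ 2·((z₁₋α∕₂ + z_power)∕d)². This is a generic power calculation, not PsychOpen CAMA's actual implementation; in a CAMA, the effect size d would come from the pooled meta-analytic estimate.

```python
# Sketch of a sample-size estimate from an assumed meta-analytic effect size d,
# using the normal approximation for a two-sample comparison of means.
# Generic textbook formula; not PsychOpen CAMA's actual code.
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect effect size d."""
    z = NormalDist()
    return ceil(2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2)

print(n_per_group(0.5))  # → 63: a "medium" effect needs roughly 63 per group
```

Exact t-test calculations give a slightly larger n (about 64 per group for d = 0.5), but the approximation shows how meta-analytic evidence feeds directly into planning future studies.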
PsychOpen CAMA offers tutorials to better understand the reasoning behind meta-analyses and to learn the basic steps of conducting a meta-analysis to empower other researchers to contribute to our project for the benefit of the research community….”
Abstract: Over the past three years, “Data Repository Selection-Criteria That Matter” – “a set of criteria for the identification and selection of those data repositories that accept research data submissions” – were developed by a group of publishers facilitated by the FAIRsharing initiative. Throughout this time, a large number of organizations and individuals have formulated responses and expressed concern about the criteria and the process through which the criteria were developed. Collectively, our organizations consider that the “Data Repository: Selection Criteria that Matter” recommendations – as currently conceived – will act as an impediment to achieving these aims. As such, we are issuing this Joint Position Statement to highlight the community’s concerns and request that the authors of these criteria respond with specific actions.
“Cryogenic electron microscopy (cryo-EM) methods began to be used in the mid-1970s to study thin and periodic arrays of proteins. Following a half-century of development in cryo-specimen preparation, instrumentation, data collection, data processing and modeling software, cryo-EM has become a routine method for solving structures from large biological assemblies to small biomolecules at near to true atomic resolution. This review explores the critical roles played by the Protein Data Bank (PDB) and Electron Microscopy Data Bank (EMDB) in partnership with the community to develop the necessary infrastructure to archive cryo-EM maps and associated models. Public access to cryo-EM structure data has in turn facilitated better understanding of structure-function relationships and advancement of image processing and modeling tool development. The partnership between the global cryo-EM community and PDB and EMDB leadership has synergistically shaped the standards for metadata, one-stop deposition of maps and models, and validation metrics to assess the quality of cryo-EM structures. The advent of cryo-electron tomography (cryo-ET) for in situ molecular cell structures at a broad resolution range and their correlations with other imaging data introduces new data archival challenges in terms of data size and complexity in the years to come.