“NIH is committed to making findings from the research that it funds accessible and available in a timely manner, while also providing safeguards for privacy, intellectual property, security, and data management. For instance, NIH-funded investigators are expected to make the results and accomplishments of their activities freely available within 12 months of publication. NIH also encourages investigators to share results prior to peer review, such as through preprints, to speed the dissemination of their findings and enhance the rigor of their work through informal peer review. A robust culture of data sharing is critical to continued progress in science, maximizing NIH’s investment in research, and assurance of the highest levels of transparency and rigor. To this end, NIH will continue to promote opportunities for data management and sharing while allowing flexibility for various data types, sharing platforms, and strategies. Additionally, NIH is implementing a policy requiring that all applications include data sharing and management plans that consider input from stakeholders….”
“The National Institutes of Health (NIH) released today a Request for Information (RFI) on streamlining access to controlled data from NIH data repositories (NOT-OD-21-157). Responses are due Aug. 9.
The NIH is requesting input on strategies for harmonizing, simplifying, and streamlining mechanisms for accessing data in NIH-supported controlled-access data repositories while continuing to uphold robust data privacy and security protections. In particular, NIH would like to better understand researchers’ experiences in finding and accessing controlled-access data housed in NIH-supported repositories, and the extent to which existing NIH policies address aggregation and linkage of controlled-access data….”
“I sincerely believe that there is a need for more venues that talk about emerging scholarly content types such as research data, research software or preprints as scholarly outputs. The Front Matter Blog hopes to become such a venue. As a starting point I have added (almost) all my blog posts since 2007, collected from my previous blogging locations (Nature Network, PLOS Blogs, my Personal Blog, and the DataCite blog), and I hope at least some of them still make an interesting read all these years later….
But Front Matter is more than a blogging platform. It is also a consulting business that helps with building and hosting scholarly infrastructure. To kick this off, I am involved in development work for the InvenioRDM data management repository platform. More on that in the next blog post on Thursday.”
Infrastructures are being developed to enhance and facilitate the sharing of cohort data internationally. However, empirical studies show that many barriers impede sharing data broadly.
Therefore, our aim is to describe the barriers and concerns for the sharing of cohort data, and the implications for data sharing platforms.
Seventeen participants involved in developing data sharing platforms or tied to cohorts that are to be submitted to platforms were recruited for semi-structured interviews to share views and experiences regarding data sharing.
Credit and recognition, the potential misuse of data, loss of control, lack of resources, socio-cultural factors and ethical and legal barriers are elements that influence decisions on data sharing. Core values underlying these reasons are equality, reciprocity, trust, transparency, gratification and beneficence.
Data generators might use data sharing platforms primarily for collaborative modes of working and network building. Data generators might be unwilling to contribute and share for non-collaborative work, or if no financial resources are provided for sharing data.
“Digital objects are inextricably dependent on their context, the infrastructure of people, processes, and technology that care for them. The FAIR Principles are at the heart of the data ecosystem, but they do not specify how digital objects are made FAIR or for how long they should be kept FAIR. This perspective is provided by the Trustworthy Digital Repository (TDR) requirements by defining long-term digital object preservation expectations. We’re all doing something for someone, and to deliver an effective service at scale, we need a sense of the types of users we have and how we can meet their needs, now and in the future.
FAIRsFAIR, SSHOC, and EOSC Nordic are all supporting digital repositories in their journey to achieve TDR status. When sharing experiences, the project teams found out that two fundamental TDR concepts are not always easy to understand: preservation and Designated Community. The draft working paper FAIR + Time: Preservation for a Designated Community was prepared in collaboration with the three projects. It seeks to present key concepts and expand on them to specify the standards and assessments required for an interoperable ecosystem of FAIR (findable, accessible, interoperable and reusable) data preserved for the long term in generalist and specialist FAIR-enabling trustworthy digital repositories (TDR) for a defined designated community of users. It seeks to provide context and define these concepts for audiences familiar with research data and technical data management systems but with less direct experience of digital preservation and trustworthy digital repositories. This is intended to help clarify which organisations are potential candidates to receive CoreTrustSeal TDR status and identify and support the types of organisations that may not be candidates but play a vital role in the data ecosystem. …”
“Three key learnings:
Sharing qualitative data does not mean depositing them somewhere on the internet.
Sharing qualitative data through data repositories enables control over secondary use and is safe.
Research data archives offer help in processing data for reuse and some even offer financial support….”
“The Biodiversity Literature Repository (BLR) has been growing from a community on Zenodo to be a service dedicated to liberate and make open access, FAIR (findable, accessible, interoperable and reusable) data hidden in the hundreds of millions of pages of scholarly publications.
It is built on top of Zenodo, a digital repository hosted at CERN, which provides a sustainable and robust infrastructure for long tail research data, which can consist of small datasets that otherwise would be lost.
Originally a collaboration between Zenodo, Plazi and Pensoft, BLR began as a repository for taxonomic publications which lacked Digital Object Identifiers (DOI) and thus were effectively orphaned from the network of online citations. As it grew, its scope expanded: BLR morphed into a highly interlinked repository focused on the illustrations and taxonomic treatments contained in publications, with all these content types interlinked among themselves and enriched with rich metadata.
The source data for BLR are scholarly publications, most often in PDF or HTML format but sometimes in XML formats whose structured data facilitate automated data extraction.
The largest data users are the Global Biodiversity Information Facility (GBIF) and the United States’ National Center for Biotechnology Information (NCBI).
Support of BLR comes from the Arcadia Fund and the three partner institutions Zenodo, Plazi and Pensoft.”
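Because BLR is built on top of Zenodo, its records can in principle be retrieved through Zenodo's public REST API by filtering on the BLR community. The sketch below only constructs a search URL (no network call is made); the community identifier `biosyslit` is an assumption based on BLR's community page on Zenodo, so verify the slug before use.

```python
from urllib.parse import urlencode

def blr_search_url(query: str, size: int = 10) -> str:
    """Build a Zenodo records-search URL scoped to the (assumed) BLR community."""
    base = "https://zenodo.org/api/records"
    # "communities" and "q" are standard Zenodo search parameters;
    # "biosyslit" as the BLR community slug is an assumption.
    params = urlencode({"communities": "biosyslit", "q": query, "size": size})
    return f"{base}?{params}"

print(blr_search_url("taxonomic treatment"))
```

Fetching the resulting URL with any HTTP client returns a JSON page of matching records, each carrying the DOI and metadata that make the deposited treatments citable.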
The purpose of this paper is to report a study of how research literature addresses researchers’ attitudes toward data repository use. In particular, the authors are interested in how the term data sharing is defined, how data repository use is reported and whether there is need for greater clarity and specificity of terminology.
To study how the literature addresses researcher data repository use, relevant studies were identified by searching Library Information Science and Technology Abstracts, Library and Information Science Source, Thomson Reuters’ Web of Science Core Collection and Scopus. A total of 62 studies were identified for inclusion in this meta-evaluation.
The study shows a need for greater clarity and consistency in the use of the term data sharing in future studies to better understand the phenomenon and allow for cross-study comparisons. Furthermore, most studies did not address data repository use specifically. In most analyzed studies, it was not possible to segregate results relating to sharing via public data repositories from other types of sharing. When sharing in public repositories was mentioned, the prevalence of repository use varied significantly.
Researchers’ data sharing is of great interest to library and information science research and practice to inform academic libraries that are implementing data services to support these researchers. This study explores how the literature approaches this issue, especially the use of data repositories, the use of which is strongly encouraged. This paper identifies the potential for additional study focused on this area.
Abstract: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic will be remembered as one of the defining events of the 21st century. The rapid global outbreak has had significant impacts on human society and is already responsible for millions of deaths. Understanding and tackling the impact of the virus has required a worldwide mobilisation and coordination of scientific research. The COVID-19 Data Portal (https://www.covid19dataportal.org/) was first released as part of the European COVID-19 Data Platform, on April 20th 2020 to facilitate rapid and open data sharing and analysis, to accelerate global SARS-CoV-2 and COVID-19 research. The COVID-19 Data Portal has fortnightly feature releases to continue to add new data types, search options, visualisations and improvements based on user feedback and research. The open datasets and intuitive suite of search, identification and download services, represent a truly FAIR (Findable, Accessible, Interoperable and Reusable) resource that enables researchers to easily identify and quickly obtain the key datasets needed for their COVID-19 research.
“Global-south scientists say that an open-access movement led by wealthy nations deprives them of credit and undermines their efforts….
But a growing faction of scientists, mostly from wealthy nations, argues that sequences should be shared on databases with no gatekeeping at all. They say this would allow huge analyses combining hundreds of thousands of genomes from different databases to flow seamlessly, and therefore deliver results more rapidly.
The debate has caught the attention of the US National Institutes of Health (NIH) — which runs its own genome repository, called GenBank — and the Bill & Melinda Gates Foundation, which has considered encouraging grantees to share on sites without such strong protections, Nature has learnt.
But many researchers — particularly those in resource-limited countries — are pushing back. They tell Nature that they see potential for exploitation in this no-strings-attached approach — and that GISAID’s gatekeeping is one of its biggest attractions because it ensures that users who analyse sequences from GISAID acknowledge those who deposited them. The database also requests that users seek to collaborate with the depositors….
Fears of inequitable data use are amplified by the fact that only 0.3% of COVID-19 vaccines have gone to low-income countries. “Imagine Africans working so hard to contribute to a database that’s used to make or update vaccines, and then we don’t get access to the vaccines,” says Christian Happi, a microbiologist at the African Centre of Excellence for Genomics of Infectious Diseases in Ede, Nigeria. “It’s very demoralizing.” …”
Abstract: DataONE, funded from 2009-2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability & Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape. By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research.
“genomeRxiv is a newly-funded US-UK collaboration to provide a public, web-accessible database of public genome sequences, accurately catalogued and classified by whole-genome similarity independent of their taxonomic affiliation. Our goal is to supply the basic and applied research community with rapid, precise and accurate identification of unknown isolates based on genome sequence alone, and with molecular tools for environmental analysis….”
“Data Management Plans
ARL is heartened to see Congress acknowledge the necessity of machine-readable data management plans (DMPs) and open repositories in supporting the academic research enterprise. At a National Science Foundation–funded conference on effective data practices in December 2019, ARL, along with the Association of American Universities, the Association of Public and Land-grant Universities, and the California Digital Library, convened stakeholders including university research officers, scientists, and librarians. Conference participants agreed that data management planning is important for sharing and use of research data and outputs. Participants suggested that the ability to update plans (“just in time”) across the project life cycle and as part of progress reporting would accelerate the value and adoption of DMPs among researchers, beyond what is required for compliance.
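A machine-readable DMP of the kind the conference participants describe can be expressed in JSON. The fragment below loosely follows field names from the RDA DMP Common Standard (maDMP); the full schema has many more required fields, and the repository and dataset names here are hypothetical, so treat this as an illustrative sketch only.

```python
import json

# Minimal machine-readable DMP fragment, loosely following RDA maDMP
# field names ("dmp", "dataset", "distribution", "host", "data_access").
# All titles are hypothetical placeholders.
dmp = {
    "dmp": {
        "title": "Example project DMP",
        "modified": "2021-07-01T12:00:00Z",  # updated "just in time" across the project life cycle
        "dataset": [
            {
                "title": "Survey responses",
                "distribution": [
                    {
                        "title": "Deposited copy",
                        "host": {"title": "Example open repository"},
                        "data_access": "open",
                    }
                ],
            }
        ],
    }
}

print(json.dumps(dmp, indent=2))
```

Because the plan is structured data rather than free text, a funder's progress-reporting system could re-validate and diff it at each update, which is what makes the "just in time" workflow practical.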
ARL encourages the development of a collaborative set of data repository criteria. Coordination among federal agencies will be necessary, as will stakeholder input from researchers, repository managers, librarians, and others. ARL looks forward to continuing these conversations and building upon work already underway within groups such as the Confederation of Open Access Repositories, the Research Data Alliance, and the World Data System….”
Abstract: A group of publishers came together to discuss how we could reduce the complexity and inconsistency of publishers’ advice to researchers selecting an appropriate data repository. It is a shared goal among publishers and other stakeholders to increase repository use – which remains far from optimal – and we assume that helping researchers find a suitable repository more easily will help achieve this.
To address this, a list of features has been created; it is intended only as a framework within which publishers can make recommendations to researchers, not as a way to restrict which repositories researchers may choose for their data. Our intention is that the features we highlight will act to initiate engagement and collaboration among publishers, repositories, and the RPOs, governments and funders that ultimately make the policies around Open Research. As we start this conversation, it is important that we act together with other stakeholders to raise awareness of the challenges around FAIR data and to prevent any perverse consequences.
From the RDA FAIRsharing WG point of view, the ultimate objective is to map repository features across all existing initiatives, and to identify a common core set of metadata fields that all stakeholders want to see in a registry of repositories. The FAIRsharing registry in particular is agnostic as to the selection process of standards, repositories and policies, as part of its commitment to working with and for all stakeholder groups.
“As a Data Architect, Sabrina is available to support DGHI in achieving their data sharing goals. She takes a holistic approach to identifying areas where the team needs data support, considering at each stage of the project lifecycle how system design and data architecture will influence how data can be shared. This may entail drafting informed consent documents, developing strategies for de-identification, curating and managing data, or discovering solutions for data storage and publishing. For instance, in collaboration with CDVS Research Data Management Consultants, Sabrina has helped AMANI create a Dataverse to enable sharing restricted-access health data for international junior researchers. Data from one of DGHI’s studies are also available in the Duke Research Data Repository….
Reproducibility is another reason that sharing and publishing data is important to Sabrina. DGHI wants to increase data availability in accordance with FAIR principles so other researchers can independently verify, reproduce, and iterate on their work. This supports peers and contributes to the advancement of the field. Publishing data in an open repository can also increase their reach and impact. DGHI is also currently examining how to incorporate the CARE principles and other frameworks for ethical data sharing within their international collaborations….”