Metrics for Data Repositories and Knowledgebases: Working Group Report | Data Science at NIH

“The National Institutes of Health (NIH) Data Resources Lifecycle and Metrics Working Group and Metrics for Repositories (MetRe) subgroup have released “Metrics for Data Repositories and Knowledgebases: A Working Group Report”. This report presents the findings of an exploration of the current landscape of biomedical data repository metrics. The work was carried out in two phases consisting of a small pilot (phase 1) and a public survey (phase 2).

Below is an excerpt from the report:

“This report includes input from representatives of 13 NIH repositories from Phase 1 and 92 repository managers in Phase 2. The metrics these respondents reported using are divided into several broad categories, including (from most to least commonly collected) User Behavior Characteristics, Scientific Contribution/Impact, and Repository Operations, and the respondents from the two groups reported similar patterns in the metrics they collect. The majority of respondents (77%) also indicated a willingness to share their metrics data – an encouraging finding given that such metrics can be helpful to NIH in better understanding how datasets and repositories are used.” …”

Enhancing transparency through open government data: the case of data portals and their features and capabilities | Emerald Insight

Abstract:  Purpose

The purpose of this paper was to draw on evidence from computer-mediated transparency and examine the argument that open government data and national data infrastructures represented by open data portals can help in enhancing transparency by providing various relevant features and capabilities for stakeholders’ interactions.


Design/methodology/approach

The developed methodology consisted of a two-step strategy to investigate research questions. First, a web content analysis was conducted to identify the most common features and capabilities provided by existing national open data portals. The second step involved performing the Delphi process by surveying domain experts to measure the diversity of their opinions on this topic.


Findings

Identified features and capabilities were classified into categories and ranked according to their importance. By formalizing these feature-related transparency mechanisms through which stakeholders work with data sets, we provided recommendations on how to incorporate them into designing and developing open data portals.

Social implications

The creation of appropriate open data portals aims to fulfil the principles of open government and enables stakeholders to effectively engage in the policy and decision-making processes.


Originality/value

By analyzing existing national open data portals and validating the feature-related transparency mechanisms, this paper fills this gap in the existing literature on designing and developing open data portals for transparency efforts.

“Qualitative Data Repository’s Curation Handbook” by Robert Demgenski, Sebastian Karcher et al.

Abstract:  In this short practice paper, we introduce the public version of the Qualitative Data Repository’s (QDR) Curation Handbook. The Handbook documents and structures curation practices at QDR. We describe the background and genesis of the Handbook and highlight some of its key content.


OSF Preprints | A survey of funders’ and institutions’ needs for understanding researchers’ open research practices

Abstract:  A growing number of research-performing organisations (institutions) and funding agencies have policies that support open research practices — sharing of research data, code and software. However, funders and institutions lack sufficient tools, time or resources to monitor compliance with these policies.

To better understand funder and institution needs related to researchers’ open research practices, we targeted funders and institutions with a survey in 2020 and received 122 completed responses. Our survey assessed and scored (from 0 to 100) the importance of and satisfaction with 17 factors associated with understanding open research practices. These include knowing if a research paper includes links to research data in a repository; knowing if a research grant made code available in a public repository; knowing if research data were made available in a reusable form; and knowing reasons why research data are not publicly available. Half of respondents had tried to evaluate researchers’ open research practices in the past, and 78% plan to do this in the future. The most common method used to find out if researchers are practicing open research was personal contact with researchers, and the most common reason for doing so was to increase their knowledge of researchers’ sharing practices (e.g. determine the current state of sharing; track changes in practices over time; compare different departments/disciplines). The results indicate that nearly all of the 17 factors we asked about in the survey were underserved. The mean importance of all factors to respondents was 71.7, approaching the 75 threshold of “very important”. The average satisfaction across all factors was 41.3, indicating a negative level of satisfaction with respondents’ ability to complete these tasks. The results imply an opportunity for better solutions to meet these needs. The growth of policies and requirements for making research data and code available does not appear to be matched with solutions for determining if these policies have been complied with.
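The importance/satisfaction scoring described above lends itself to a simple gap analysis: the distance between how important a factor is and how satisfied respondents are with it indicates how underserved the need is. The sketch below is illustrative only, not the survey’s actual methodology; the factor names and per-factor scores are hypothetical, and only the overall means (71.7 importance, 41.3 satisfaction) come from the report.

```python
# Illustrative importance/satisfaction gap analysis on 0-100 scores.
# Per-factor values below are hypothetical; only the overall means
# (71.7 importance, 41.3 satisfaction) are taken from the survey report.

factors = {
    "links to data in a repository": {"importance": 80.0, "satisfaction": 45.0},
    "reasons data are not public":   {"importance": 63.0, "satisfaction": 37.0},
}

def gap(scores):
    """Unmet need: how far satisfaction lags behind importance."""
    return scores["importance"] - scores["satisfaction"]

for name, scores in factors.items():
    print(f"{name}: gap = {gap(scores):.1f}")

# Across all 17 factors, the reported means give a gap of
# 71.7 - 41.3 = 30.4 points, consistent with the conclusion that
# these needs are broadly underserved.
overall_gap = 71.7 - 41.3
```

Under this reading, any factor with a large positive gap is a candidate for the “better solutions” the authors call for.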
We conclude that publishers can better support some of the needs of funders and institutions by introducing simple solutions such as:

– Mandatory data availability statements (DAS) in research articles
– Not permitting generic “data available on request” statements
– Enabling and encouraging the use of data repositories and other methods that make data available in a more reusable way
– Providing visible links to research data on publications
– Making information on data and code sharing practices in publications available to institutions and funding agencies
– Extending policies that require transparency in sharing of research data to sharing of code

How can publishers better meet the open research needs of funders and institutions?

“Publishers investing in simple solutions in their workflows can help to better meet the needs of funders and institutions who wish to support open research practices, research released this week by PLOS concludes.

Policies can be an effective solution for changing research culture and practice. A growing number of research-performing organisations (institutions) and funding agencies have policies that support open research practices — sharing of research data, code and software — as do publishers. Seeking to deepen our understanding of funder and institution needs related to open research, we surveyed more than 100 funders and institutions in 2020. We wanted to know if they are evaluating how researchers share data and code, how they are doing it, why they are doing it, and how satisfied they are with their ability to get these tasks done. Our results are available as a preprint along with an anonymised dataset….

Simple solutions more publishers could provide include:

Mandatory Data Availability Statements (DAS) in all relevant publications.
Across the STM industry around 15% of papers include a DAS. Since we introduced our data availability policy in 2014, 100% of PLOS research articles include a DAS.
Supporting researchers to provide information on why research data (and code) are not publicly available with their publications.
Time and again “data available on request” has been shown to be ineffective at supporting new research — and is not permitted in PLOS journals. 
Enabling and encouraging the use of data repositories.
Recommending the use of data repositories is a useful step, but making them easily and freely accessible — integrated into the publishing process — can be even more effective. Rates of repository use are higher in journals that partner closely with repositories and remove cost barriers to their use.
Providing visible links to research data on publications. Many researchers also struggle to find data they can reuse, hence PLOS will soon be experimenting with improving this functionality in our articles, and integrating the Dryad repository with submission….”


Four key challenges in the open-data revolution – Salguero-Gómez – 2021 – Journal of Animal Ecology – Wiley Online Library

Abstract:  In Focus: Culina, A., Adriaensen, F., Bailey, L. D., et al. (2021) Connecting the data landscape of long-term ecological studies: The SPI-Birds data hub. Journal of Animal Ecology. Long-term, individual-based datasets have been at the core of many key discoveries in ecology, and calls for the collection, curation and release of these kinds of ecological data are contributing to a flourishing open-data revolution in ecology. Birds, in particular, have been the focus of international research for decades, resulting in a number of uniquely long-term studies, but accessing these datasets has been historically challenging. Culina et al. (2021) introduce an online repository of individual-level, long-term bird records with ancillary data (e.g. genetics), which will enable key ecological questions to be answered on a global scale. As well as these opportunities, however, we argue that the ongoing open-data revolution comes with four key challenges relating to the (1) harmonisation of, (2) biases in, (3) expertise in and (4) communication of, open ecological data. Here, we discuss these challenges and how key efforts such as those by Culina et al. are using FAIR (Findable, Accessible, Interoperable and Reusable) principles to overcome them. The open-data revolution will undoubtedly reshape our understanding of ecology, but with it the ecological community has a responsibility to ensure this revolution is ethical and effective.




FAIRsFAIR Repository Support Series Webinars | FAIRsFAIR

“The FAIRsFAIR webinar series aims to help repository managers become familiar with FAIR-enabling practices. Each webinar will provide an overview of a specific FAIR-enabling activity, share information on recent developments within FAIRsFAIR and other initiatives, and offer examples of good practice, practical tips and recommendations. Each webinar will last a maximum of 1.5 hours and include time for questions and discussion. Registration is free and open to all; however, the main audience is repository managers and service providers. Data stewards and developers may also find the sessions informative.”

IEEE – IEEE and Edge Announce Partnership to Enhance Research Data Management and Collaboration with IEEE DataPort

“Edge, a nonprofit research and education network and technology partner, has announced a partnership with IEEE, the world’s largest technical professional organization dedicated to advancing technology for humanity. The two organizations will collaborate to offer increased awareness of institutional subscriptions to IEEE DataPort — a web-based, cloud services platform supporting the data-related needs of the global technical community — making it available to academic, government, and not-for-profit institutions across the United States.

IEEE DataPort provides a unified data and collaboration platform which researchers can leverage to efficiently store, share, access, and manage research data, accelerating institutional research efforts. Researchers at subscribing institutions will gain access to the more than 2,500 research datasets available on the platform and the ability to collaborate with more than 1.25 million IEEE DataPort users worldwide. The platform also enables institutions to meet funding agency requirements for the use and sharing of data….”

Characteristics of available studies and dissemination of research using major clinical data sharing platforms – Enrique Vazquez, Henri Gouraud, Florian Naudet, Cary P Gross, Harlan M Krumholz, Joseph S Ross, Joshua D Wallach, 2021

Abstract:  Background/Aims:

Over the past decade, numerous data sharing platforms have been launched, providing access to de-identified individual patient-level data and supporting documentation. We evaluated the characteristics of prominent clinical data sharing platforms, including types of studies listed as available for request, data requests received, and rates of dissemination of research findings from data requests.


We reviewed publicly available information listed on the websites of six prominent clinical data sharing platforms: Biological Specimen and Data Repository Information Coordinating Center,, Project Data Sphere, Supporting Open Access to Researchers–Bristol Myers Squibb, Vivli, and the Yale Open Data Access Project. We recorded key platform characteristics, including listed studies and available supporting documentation, information on the number and status of data requests, and rates of dissemination of research findings from data requests (i.e. publications in peer-reviewed journals, preprints, conference abstracts, or results reported on the platform’s website).


The number of clinical studies listed as available for request varied among five data sharing platforms: Biological Specimen and Data Repository Information Coordinating Center (n = 219), (n = 2,897), Project Data Sphere (n = 154), Vivli (n = 5,426), and the Yale Open Data Access Project (n = 395); Supporting Open Access to Researchers did not provide a list of Bristol Myers Squibb studies available for request. Individual patient-level data were nearly always reported as being available for request, as opposed to only Clinical Study Reports (Biological Specimen and Data Repository Information Coordinating Center = 211/219 (96.3%); (99.6%); Project Data Sphere = 154/154 (100.0%); and the Yale Open Data Access Project = 355/395 (89.9%)); Vivli did not provide downloadable study metadata. Of 1,201 data requests listed on, Supporting Open Access to Researchers–Bristol Myers Squibb, Vivli, and the Yale Open Data Access Project platforms, 586 requests (48.8%) were approved (i.e. data access granted). The majority were for secondary analyses and/or developing/validating methods ( (83.7%); Supporting Open Access to Researchers–Bristol Myers Squibb = 22/30 (73.3%); Vivli = 63/84 (75.0%); the Yale Open Data Access Project = 111/159 (69.8%)); four were for re-analyses or corroborations of previous research findings ( (1.0%) and the Yale Open Data Access Project = 1/159 (0.6%)). Ninety-five (16.1%) approved data requests had results disseminated via peer-reviewed publications ( (19.5%); Supporting Open Access to Researchers–Bristol Myers Squibb = 3/30 (10.0%); Vivli = 4/84 (4.8%); the Yale Open Data Access Project = 27/159 (17.0%)). Forty-two (6.8%) additional requests reported results through preprints, conference abstracts, or on the platform’s website ( (3.8%); Supporting Open Access to Researchers–Bristol Myers Squibb = 3/30 (10.0%); Vivli = 2/84 (2.4%); Yale Open Data Access Project = 25/159 (15.7%)).


Across six prominent clinical data sharing platforms, information on studies and request metrics varied in availability and format. Most data requests focused on secondary analyses and approximately one-quarter of all approved requests publicly disseminated their results. To further promote the use of shared clinical data, platforms should increase transparency, consistently clarify the availability of the listed studies and supporting documentation, and ensure that research findings from data requests are disseminated.
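The headline request and dissemination figures in this abstract reduce to straightforward rates. The sketch below simply rechecks that arithmetic using the counts the abstract reports (1,201 requests listed, 586 approved, 95 with peer-reviewed publications, 42 with other outputs); it is a verification aid, not part of the study’s methods.

```python
# Rechecking the headline rates from the counts given in the abstract.
requests_listed = 1201   # data requests listed across the four platforms
requests_approved = 586  # requests granted data access
peer_reviewed = 95       # approved requests with peer-reviewed publications
other_outputs = 42       # preprints, conference abstracts, or platform-reported results

approval_rate = requests_approved / requests_listed                       # ~48.8%
dissemination_rate = (peer_reviewed + other_outputs) / requests_approved  # ~23.4%

print(f"approved: {approval_rate:.1%}, disseminated: {dissemination_rate:.1%}")
```

The combined dissemination rate of roughly 23% is what the abstract summarises as “approximately one-quarter of all approved requests”.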

Dryad appoints Jennifer Gibson as Executive Director | Dryad news and views

“Dryad, the open-access repository and curation service for international research data, has announced that Jennifer Gibson (née McLennan) will join as Executive Director this October. An accomplished non-profit executive and open science advocate, Gibson’s leadership will help Dryad navigate a time of ambitious growth and transformation. …”

Sharing data to fuel discovery | VTx | Virginia Tech

“The University Libraries provides expertise in data planning, management, and publishing to fuel discovery and future research. Recently, the library launched a new version of its research data repository platform, powered by Figshare. 

Accessible from anywhere, Figshare is a cloud-based platform for storing, sharing, and citing research data. Virginia Tech researchers can upload their research data and receive a digital object identifier (DOI) for citing the data in publications and to meet sponsor requirements for openly available data. Data uploaded to the Virginia Tech research data repository is discoverable in search engines, including Google Scholar and Google Dataset Search. Engagement and impact of the research can be tracked through views, downloads, citations, and Altmetric usage tracking.  …”

Promoting FAIR Data Through Community-driven Agile Design: the Open Data Commons for Spinal Cord Injury ( | SpringerLink

Abstract:  The past decade has seen accelerating movement from data protectionism in publishing toward open data sharing to improve reproducibility and translation of biomedical research. Developing data sharing infrastructures to meet these new demands remains a challenge. One model for data sharing involves simply attaching data, irrespective of its type, to publisher websites or general use repositories. However, some argue this creates a ‘data dump’ that does not promote the goals of making data Findable, Accessible, Interoperable and Reusable (FAIR). Specialized data sharing communities offer an alternative model where data are curated by domain experts to make it both open and FAIR. We report on our experiences developing one such data-sharing ecosystem focusing on ‘long-tail’ preclinical data, the Open Data Commons for Spinal Cord Injury ( ODC-SCI was developed with community-based agile design requirements directly pulled from a series of workshops with multiple stakeholders (researchers, consumers, non-profit funders, governmental agencies, journals, and industry members). ODC-SCI focuses on heterogeneous tabular data collected by preclinical researchers including bio-behaviour, histopathology findings and molecular endpoints. This has led to an example of a specialized neurocommons that is well-embraced by the community it aims to serve. In the present paper, we provide a review of the community-based design template and describe the adoption by the community including a high-level review of current data assets, publicly released datasets, and web analytics. Although is in its late beta stage of development, it represents a successful example of a specialized data commons that may serve as a model for other fields.