DataWorks! Prize – Incentives for building a culture of data sharing and reuse – NIH Extramural Nexus

“A $500,000 prize purse, rewarding data sharing and reuse in biomedical research, is a new, innovative strategy for supporting the research community. The DataWorks! Prize highlights the role of data sharing and reuse in scientific discovery while recognizing and rewarding researchers who engage in these practices. This prize, which launched on May 11, 2022, is a partnership between the NIH Office of Data Science Strategy and the Federation of American Societies for Experimental Biology (FASEB)….”

Gearing Up for 2023 Part II: Implementing the NIH Data Management and Sharing Policy – NIH Extramural Nexus

“NIH has a long history of developing consent language and, as such, our team worked across the agency – and with you! – to develop a new resource that shares best practices for developing informed consents to facilitate data/biospecimen storage and sharing for future use. It also provides modifiable sample language that investigators and IRBs can use to assist in the clear communication of potential risks and benefits associated with data/biospecimen storage and sharing. In developing this resource, we engaged with key federal partners, as well as scientific societies and associations. Importantly, we also considered the 102 comments from stakeholders in response to an RFI that we issued in 2021.

As for our second resource, we are requesting public comment on protecting the privacy of research participants when data is shared. I think I need to be upfront and acknowledge that we have issued many of these types of requests over the last several months and NIH understands the effort that folks take to thoughtfully respond. With that said, we think the research community will greatly benefit from this resource and we want to hear your thoughts on whether it hits the mark or needs adjustment….”

DataWorks! Challenge | HeroX

“Share your story of how you reused or shared data to further your biological and/or biomedical research effort and get recognized!…

The Federation of American Societies for Experimental Biology (FASEB) and the National Institutes of Health (NIH) are championing a bold vision of data sharing and reuse. The DataWorks! Prize fuels this vision with an annual challenge that showcases the benefits of research data management while recognizing and rewarding teams whose research demonstrates the power of data sharing or reuse practices to advance scientific discovery and human health. We are seeking new and innovative approaches to data sharing and reuse in the fields of biological and biomedical research. 

To incentivize effective practices and increase community engagement around data sharing and reuse, the 2022 DataWorks! Prize will distribute up to 12 monetary team awards. Submissions will undergo a two-stage review process, with final awards selected by a judging panel of NIH officials. The NIH will recognize winning teams with a cash prize, and winners will share their stories in a DataWorks! Prize symposium.”

Using the State of Open Data survey to put the NIH Policy on Data Management and Sharing into practice

“Join us for a webinar on how the State of Open Data survey — the annual survey on researchers’ attitudes toward open data and data sharing — can help your institution put the NIH Policy on Data Management and Sharing into practice. …”

NIH issues a seismic mandate: share data publicly

“In January 2023, the US National Institutes of Health (NIH) will begin requiring most of the 300,000 researchers and 2,500 institutions it funds annually to include a data-management plan in their grant applications — and to eventually make their data publicly available.

Researchers who spoke to Nature largely applaud the open-science principles underlying the policy — and the global example it sets. But some have concerns about the logistical challenges that researchers and their institutions will face in complying with it. Namely, they worry that the policy might exacerbate existing inequities in the science-funding landscape and could be a burden for early-career scientists, who do the lion’s share of data collection and are already stretched thin….

Such a seismic shift in practice has left some researchers worried about the amount of work that the mandate will require when it becomes effective….

Others worry that data-management activities will further sap funds from under-resourced labs. Although the policy outlines certain fees that researchers can add to their proposed budgets to offset the costs of compliance with the mandate, it doesn’t specify what criteria the NIH will use to grant these requests….

Despite its potential pitfalls, Ross thinks that the policy will have a ripple effect that will persuade smaller funding agencies and industry to adopt similar changes. “This policy establishes what people expect from clinical research,” he says. “It’s essentially saying the culture of research needs to change.””

NOT-OD-22-029: Request for Information on Proposed Updates and Long-Term Considerations for the NIH Genomic Data Sharing Policy

“Respect for and protection of the interests of research participants are central tenets of the NIH GDS Policy and are fundamental to NIH’s stewardship of large-scale genomic data. Data derived from human research participants under the GDS Policy must be de-identified and provided with a random, unique code, the key to which is held by the submitting institution. NIH acknowledges that the concept of “identifiability” is a matter of ongoing deliberation within the scientific and bioethics communities. NIH relies on robust protections beyond de-identification, such as Institutional Review Board (IRB) consideration of risks associated with data submission, designating controlled access for certain data types, use of Data Access Committees to review requests, data use agreements to prohibit data disclosure and participant re-identification, and Certificates of Confidentiality to prohibit disclosure. As outlined in the NIH GDS Policy, the criteria for establishing de-identification are:

Identities of research participants cannot be readily ascertained or otherwise associated with the data by the repository staff or secondary data users (45 CFR 46.102(e), Federal Policy for the Protection of Human Subjects); and
18 identifiers enumerated at 45 CFR 164.514(b)(2) (the HIPAA Privacy Rule) are removed.

The reliance on the 18 identifiers enumerated at 45 CFR 164.514(b)(2) (the HIPAA Privacy Rule) as the only acceptable method under the GDS Policy for de-identification has recently presented several challenges. Certain data elements considered potentially identifiable, such as date ranges shorter than a year, may have scientific utility, especially when studying disease progression (e.g., with COVID-19) or higher resolution location data than the regulatory standard (e.g., full ZIP codes or mobile location data), which may be valuable for studying the social determinants of health or environmental risk.

Challenges have also arisen recently around data linkage. It is difficult to know in advance which data sources may add scientific value when combined, so it is not always possible to tell participants about data linkage during their initial consent. Linking data refers to connecting two or more data sources (often multiple studies) to bring together information about a person, enabling researchers to learn more about a participant or small group of participants. For example, a participant might enroll in a study that uses their electronic health record as well as a separate study that uses a sample of their blood, and the data about them from those studies could later be linked in new research for more powerful analyses. This challenge in prospectively informing participants about data linkage raises questions about respecting individuals’ autonomy and what participants understand about how their data will be used. Furthermore, data from multiple sources may not have been obtained under the same consent and de-identification expectations as the GDS Policy….”
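As an illustration of the two ideas in the excerpt above (and not part of the NIH notice), the following is a minimal Python sketch of coded de-identification followed by linkage of two studies. It assumes hypothetical field names (`name`, `email`, `visit_date`, `zip`, `participant_code`) and a hypothetical set of direct identifiers far shorter than the 18 listed in the HIPAA Privacy Rule. It coarsens dates to the year and ZIP codes to three digits, which is the kind of resolution loss the excerpt describes, and it assumes both studies already share the same participant code, which in practice they often do not.

```python
import secrets
from datetime import date

# Hypothetical subset of direct identifiers; the HIPAA list has 18 categories.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def deidentify(record, key_table):
    """Return a coded, coarsened copy of `record`.

    The removed identifiers are stored in `key_table` under a random code,
    standing in for the key held by the submitting institution.
    """
    code = secrets.token_hex(8)  # random 16-character hex code
    key_table[code] = {k: record[k] for k in DIRECT_IDENTIFIERS if k in record}

    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["participant_code"] = code

    # Coarsen dates to the year (sub-year resolution is lost here).
    if "visit_date" in out:
        out["visit_year"] = out.pop("visit_date").year

    # Keep only the first three ZIP digits. A population-size check that a
    # real implementation would need is omitted from this sketch.
    if "zip" in out:
        out["zip3"] = out.pop("zip")[:3]
    return out


def link_records(rows_a, rows_b, key="participant_code"):
    """Join two lists of dicts on a shared participant code (inner join)."""
    b_by_code = {row[key]: row for row in rows_b}
    linked = []
    for row in rows_a:
        match = b_by_code.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


if __name__ == "__main__":
    key_table = {}
    ehr = deidentify(
        {"name": "A. Example", "email": "a@example.org",
         "visit_date": date(2021, 3, 14), "zip": "20892", "diagnosis": "X"},
        key_table,
    )
    # A second, hypothetical study that uses the same code for this person.
    blood = {"participant_code": ehr["participant_code"], "marker_level": 4.2}
    print(link_records([ehr], [blood]))
```

The linked row combines fields from both studies under one code, which is why linkage can make analyses more powerful and also why it can reveal more about a participant than either source alone.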

Home – NIH ODSS Search Workshop

“The goal of the Workshop is to explore current capabilities, gaps and opportunities for global data search across the data ecosystem. Workshop will explore selected science drivers across these main themes:

Using search to build cohorts: finding data across different platforms/repositories using patient attributes in order to create a cohort of patients for clinical analysis
Using search to find relevant data & repositories: finding data & repositories in order to access and analyze the data further, including its use for creating computational models.
Using search for (complex) information retrieval: answering specific questions without the additional burden of data download or analysis…”

NIH-Wide Strategic Plan: Fiscal Years 2021-2025

“NIH is committed to making findings from the research that it funds accessible and available in a timely manner, while also providing safeguards for privacy, intellectual property, security, and data management. For instance, NIH-funded investigators are expected to make the results and accomplishments of their activities freely available within 12 months of publication. NIH also encourages investigators to share results prior to peer review, such as through preprints, to speed the dissemination of their findings and enhance the rigor of their work through informal peer review. A robust culture of data sharing is critical to continued progress in science, maximizing NIH’s investment in research, and assurance of the highest levels of transparency and rigor. To this end, NIH will continue to promote opportunities for data management and sharing while allowing flexibility for various data types, sharing platforms, and strategies. Additionally, NIH is implementing a policy requiring that all applications include data sharing and management plans that consider input from stakeholders….”