“To make data sharing easier and to establish a clear baseline for what well-considered data-sharing policies should encompass, we recommend that funders:
1. Clearly specify which data grantees are required to share. Do you want grantees to share only data underlying published studies or all data generated during the funded project? Do you want raw or pre-processed data? If qualitative (not just quantitative) data are also covered by your policy, do you provide guidance for grantees on good practices for sharing qualitative data?
2. Consider incorporating code- and software-sharing requirements as a necessary extension of their data-sharing policies. To reproduce results accurately and build upon shared data, researchers must have access not only to the data files but also to the code and software used to open and analyze them. Only then are data truly findable, accessible, interoperable, and reusable. The ORFG and the Higher Education Leadership Initiative for Open Scholarship (HELIOS) have prepared a more detailed brief.
3. Clearly specify the required timing of data sharing. The timing will vary based on what data are to be shared and what constitutes the event that triggers the sharing requirement. If data underlie a published study, complying or aligning with new federal policies will require data to be shared immediately at the time of publication. If, however, the policy requires sharing of all data, then the timing may be tied to the award period (as the NIH requires).
4. Require grantees to deposit data in trusted public repositories that assign a persistent identifier (e.g., DOI), provide the necessary infrastructure to host and export quality metadata, implement strategies for long-term preservation, and otherwise meet the National Science and Technology Council’s Desirable Characteristics of Data Repositories. To make compliance easier for grantees, funders should provide a list of approved data repositories that meet these characteristics and are appropriate for the disciplines they fund.
5. Require grantees to share data under licenses that facilitate reuse. The recommended free culture license for data is the Creative Commons Public Domain Dedication (CC0). The reasoning behind this is twofold: first, data are not always subject to copyright, so reserving certain rights under other licenses may be inappropriate; second, we should avoid the attribution or license stacking that may occur as datasets are remixed and reused. Other options include the Creative Commons Attribution (CC BY) and ShareAlike (CC BY-SA) licenses.
6. Strongly encourage grantees to share data according to established best practices. These include, but are not limited to: a) the FAIR Principles, which outline how to share data so they are Findable, Accessible, Interoperable, and Reusable; b) the CARE Principles for Indigenous Data Governance, which emphasize the importance of Collective Benefit, Authority to Control, Responsibility, and Ethics in the context of Indigenous data, but could also inform the responsible management and sharing of data for other populations; and c) privacy rules, such as those provided under HIPAA. Funders should communicate that it is the responsibility of grantees to get the appropriate consent and ethical approval (e.g., from their institutional review board) that will allow them to collect and subsequently openly share de-identified data.
7. Allow grantees to include data sharing costs in their grant budgets. This could include costs associated with data management, curation, hosting, and long-term preservation. For many projects, data hosting costs will likely be minimal—several public repositories allow researchers to store significant amounts of data for free. For projects that will generate larger amounts of data, additional hosting costs can be budgeted. The most important cost may be the personnel time and expertise required to properly prepare data for sharing and reuse. Funders should consider increasing the allowable personnel costs to secure extra curation time for team