Proceedings of the Workshop Exploring National Infrastructure for Public Access Usage and Impact Reporting

“Invited international experts and leading scholarly cyberinfrastructure representatives joined workshop organizers Christina Drummond and Charles Watkinson for an eight-hour facilitated workshop on April 2, 2023. Together they aimed to: • identify the challenges preventing cross-platform public and open scholarship impact analytics at scale, • explore open infrastructure opportunities to improve the findability, accessibility, interoperability, and reuse i.e. “FAIRness” of usage data, and • identify what’s needed to scaffold America’s national infrastructure for scholarly output impact reporting in light of a) the August 2022 Office of Science and Technology Policy (OSTP) “Nelson Memo” regarding “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research,” and b) the European Open Science Cloud Core and Interoperability Framework. Participants were encouraged to consider the challenges related to impact reporting and storytelling for research outputs ranging from data, articles, and books to simulations, 3D models, and other multimedia. The workshop objectives shared in advance of the meeting with participants were: • identify what’s needed to scaffold America’s national infrastructure for scholarly output impact reporting, • develop recommendations for national infrastructure and investment, and • prioritize and begin to map out what activities we need to undertake next to support these recommendations. …”

Crowdsourced Science Pulls Off a Daring WWII Data Rescue – Eos

“In the early 1940s, sailors aboard U.S. Navy vessels recorded weather and sea conditions as they cruised the Pacific Ocean during World War II (WWII). After the war, most of these observations languished in classified logbooks for decades and were left out of data sets that formed the backbone of modern climate models. Recently, a crowdsourced science project recovered and digitized these now-declassified meteorological records, rescuing nearly 4 million observations that span a critical wartime data gap….”

Data sharing and reuse practices: disciplinary differences and improvements needed | Emerald Insight

Abstract: Purpose

This study investigates differences and commonalities in data production, sharing and reuse across the widest range of disciplines yet and identifies types of improvements needed to promote data sharing and reuse.

Design/methodology/approach

The first authors of randomly selected publications from 2018 to 2019 in 20 Scopus disciplines were surveyed for their beliefs and experiences about data sharing and reuse.

Findings

From the 3,257 survey responses, data sharing and reuse are still increasing but not ubiquitous in any subject area and are more common among experienced researchers. Researchers with previous data reuse experience were more likely to share data than others. Types of data produced and systematic online data sharing varied substantially between subject areas. Although the use of institutional and journal-supported repositories for sharing data is increasing, personal websites are still frequently used. Combining multiple existing datasets to answer new research questions was the most common use. Proper documentation, openness and information on the usability of data continue to be important when searching for existing datasets. However, researchers in most disciplines struggled to find datasets to reuse. Researchers’ feedback suggested 23 recommendations to promote data sharing and reuse, including improved data access and usability, formal data citations, new search features and cultural and policy-related disciplinary changes to increase awareness and acceptance.

Originality/value

This study is the first to explore data sharing and reuse practices across the full range of academic discipline types. It expands and updates previous data sharing surveys and suggests new areas of improvement in terms of policy, guidance and training programs.

Full article: Data sharing and re-use in the traumatic stress field: An international survey of trauma researchers

“Background:

The FAIR data principles aim to make scientific data more Findable, Accessible, Interoperable, and Reusable. In the field of traumatic stress research, FAIR data practices can help accelerate scientific advances to improve clinical practice and can reduce participant burden. Previous studies have identified factors that influence data sharing and re-use among scientists, such as normative pressure, perceived career benefit, scholarly altruism, and availability of data repositories. No prior study has examined researcher views and practices regarding data sharing and re-use in the traumatic stress field.

Objective:

To investigate the perspectives and practices of traumatic stress researchers around the world concerning data sharing, re-use, and the implementation of FAIR data principles in order to inform development of a FAIR Data Toolkit for traumatic stress researchers.

Method:

A total of 222 researchers from 28 countries participated in an online survey available in seven languages, assessing their views on data sharing and re-use, current practices, and potential facilitators and barriers to adopting FAIR data principles.

Results:

The majority of participants held a positive outlook towards data sharing and re-use, endorsing strong scholarly altruism, ethical considerations supporting data sharing, and perceiving data re-use as advantageous for improving research quality and advancing the field. Results were largely consistent with prior surveys of scientists across a wide range of disciplines. A significant proportion of respondents reported instances of data sharing and re-use, but gold standard practices such as formally depositing data in established repositories were reported as infrequent. The study identifies potential barriers such as time constraints, funding, and familiarity with FAIR principles.

Conclusions:

These results carry crucial implications for promoting change and devising a FAIR Data Toolkit tailored for traumatic stress researchers, emphasizing aspects such as study planning, data preservation, metadata standardization, endorsing data re-use, and establishing metrics to assess scientific and societal impact.

Tracing data: A survey investigating disciplinary differences in data citation | Quantitative Science Studies | MIT Press

Abstract: Data citations, or citations in reference lists to data, are increasingly seen as an important means to trace data reuse and incentivize data sharing. Although disciplinary differences in data citation practices have been well documented via scientometric approaches, we do not yet know how representative these practices are within disciplines. Nor do we yet have insight into researchers’ motivations for citing – or not citing – data in their academic work. Here, we present the results of the largest known survey (n = 2,492) to explicitly investigate data citation practices, preferences, and motivations, using a representative sample of academic authors by discipline, as represented in the Web of Science (WoS). We present findings about researchers’ current practices and motivations for reusing and citing data and also examine their preferences for how they would like their own data to be cited. We conclude by discussing disciplinary patterns in two broad clusters, focusing on patterns in the social sciences and humanities, and consider the implications of our results for tracing and rewarding data sharing and reuse.


Copyright Law in Academia (Urheberrecht in der Wissenschaft) | German Federal Ministry for Education and Research (BMBF)

Authors: Till Kreutzer and Georg Fischer, iRights.Law

English abstract (via deepl.com):

The updated and completely revised handout “Urheberrecht in der Wissenschaft” (Copyright in Science) provides practical and comprehensible answers to typical questions on copyright for teaching and research.

These include, for example, the use of third-party materials or the creation and publication of one’s own copyright-protected works.

German original abstract:

Die aktualisierte und vollständig überarbeitete Handreichung “Urheberrecht in der Wissenschaft” beantwortet praxisnah und verständlich typische Fragen zum Urheberrecht für Lehre und Forschung.

Diese umfassen etwa die Verwendung von Materialien Dritter oder die Erstellung und Veröffentlichung eigener urheberrechtlich geschützter Werke.

Draft Vancouver Statement on Collections as Data – Google Docs

“Since the Santa Barbara Statement on Collections as Data (2017) was published, engagement with collections as data has grown internationally. Institutions large and small, individually and collectively, have invested in developing, providing access to, and supporting responsible computational use of collections as data. An updated statement is needed in light of increased community implementation of collections as data in context of an ever more complex data landscape….”

Policy recommendations to ensure that research software is openly accessible and reusable | PLOS Biology

“To do this, we recommend:

As part of their updated policy plans submitted in response to the 2022 OSTP memo, US federal agencies should, at a minimum, articulate a pathway for developing guidance on research software sharing, and, at a maximum, incorporate research software sharing requirements as a necessary extension of any data sharing policy and a critical strategy to make data truly FAIR (as these principles have been adapted to apply to research software [12]).
As part of sharing requirements, federal agencies should specify that research software should be deposited in trusted, public repositories that maximize discovery, collaborative development, version control, long-term preservation, and other key elements of the National Science and Technology Council’s “Desirable Characteristics of Data Repositories for Federally Funded Research” [13], as adapted to fit the unique considerations of research software.
US federal agencies should encourage grantees to use non-proprietary software and file formats, whenever possible, to collect and store data. We realize that for some research areas and specialized techniques, viable non-proprietary software may not exist for data collection. However, in many cases, files can be exported and shared using non-proprietary formats or scripts can be provided to allow others to open files.
Consistent with the US Administration’s approach to cybersecurity [14], federal agencies should provide clear guidance on measures grantees are expected to undertake to ensure the security and integrity of research software. This guidance should encompass the design, development, dissemination, and documentation of research software. Examples include the National Institute of Standards and Technology’s secure software development framework and Linux Foundation’s open source security foundation.
As part of the allowable costs that grantees can request to help them meet research sharing requirements, US federal agencies should include reasonable costs associated with developing and maintaining research software needed to maximize data accessibility and reusability for as long as it is practical. Federal agencies should ensure that such costs are additive to proposal budgets, rather than consuming funds that would otherwise go to the research itself.
US federal agencies should encourage grantees to apply licenses to their research software that facilitate replication, reuse, and extensibility, while balancing individual and institutional intellectual property considerations. Agencies can point grantees to guidance on desirable criteria for distribution terms and approved licenses from the Open Source Initiative.
In parallel with the actions listed above that can be immediately incorporated into new public access plans, US federal agencies should also explore long-term strategies to elevate research software to co-equal research outputs and further incentivize its maintenance and sharing to improve research reproducibility, replicability, and integrity….”
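One of the recommendations above asks agencies to encourage non-proprietary software and file formats, noting that files can often be exported, or scripts provided, so that others can open them. As a minimal, hypothetical sketch of that practice (not drawn from the PLOS Biology article), a grantee might include a small Python conversion script alongside a deposited spreadsheet; the file names and the pandas/openpyxl dependency here are assumptions:

```python
# Illustrative sketch only (not from the PLOS Biology article): one way a grantee
# could ship a script so that data collected in a proprietary spreadsheet format
# can be opened by anyone. Assumes pandas (plus openpyxl for .xlsx reading) is
# installed; file names are hypothetical placeholders.
from pathlib import Path

import pandas as pd


def export_to_csv(xlsx_path: str, out_dir: str = "open_formats") -> list[Path]:
    """Re-save every sheet of an .xlsx workbook as a plain-text CSV file."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    written = []
    sheets = pd.read_excel(xlsx_path, sheet_name=None)  # dict of {sheet name: DataFrame}
    for name, frame in sheets.items():
        target = out / f"{Path(xlsx_path).stem}_{name}.csv"
        frame.to_csv(target, index=False)
        written.append(target)
    return written


if __name__ == "__main__":
    for path in export_to_csv("survey_responses.xlsx"):  # placeholder file name
        print(f"wrote {path}")
```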

Aligning data-sharing policies: Meeting the moment | Commentary and opinion | Features | PND

“To make data sharing easier and to establish a clear baseline for what well-considered data-sharing policies should encompass, we recommend that funders:

1. Clearly specify which data grantees are required to share. Do you want grantees to share only data underlying published studies or all data generated during the funded project? Do you want raw or pre-processed data? If qualitative (not just quantitative) data are also covered by your policy, do you provide guidance for grantees on good practices for sharing qualitative data?

2. Consider incorporating code- and software-sharing requirements as a necessary extension of their data-sharing policies. To be able to reproduce results accurately and build upon shared data, researchers must not only have access to the files but also the code and software used to open and analyze data. Only then are data truly findable, accessible, interoperable, and reusable. The ORFG and the Higher Education Leadership Initiative for Open Scholarship (HELIOS) have prepared a more detailed brief.

3. Clearly specify the required timing of data sharing. The timing will vary based on what data are to be shared and what constitutes the event that triggers the sharing requirement. If data underlie a published study, complying or aligning with new federal policies will require data to be shared immediately at the time of publication. If, however, the policy requires sharing of all data, then the timing may be tied to the award period (as the NIH requires).

4. Require grantees to deposit data in trusted public repositories that assign a persistent identifier (e.g., DOI), provide the necessary infrastructure to host and export quality metadata, implement strategies for long-term preservation, and otherwise meet the National Science and Technology Council’s Desirable Characteristics of Data Repositories. To make compliance easier for grantees, funders should provide a list of approved data repositories that meet these characteristics and are appropriate for the disciplines they fund.

5. Require grantees to share data under licenses that facilitate reuse. The recommended free culture license for data is the Creative Commons Public Domain Dedication (CC0). The reasoning behind this is two-fold: first, data do not always incur copyright and, therefore, reserving certain rights under other licenses may be inappropriate, and second, we should avoid attribution or license stacking that may occur as datasets are remixed and reused. Other options include the Creative Commons Attribution (CC BY) or ShareAlike (CC BY-SA) licenses.

6. Strongly encourage grantees to share data according to established best practices. These include, but are not limited to: a) the FAIR Principles, which outline how to share data so they are Findable, Accessible, Interoperable, and Reusable; b) the CARE Principles for Indigenous Data Governance, which emphasize the importance of Collective Benefit, Authority to Control, Responsibility, and Ethics in the context of Indigenous data, but could also inform the responsible management and sharing of data for other populations; and c) privacy rules, such as those provided under HIPAA. Funders should communicate that it is the responsibility of grantees to get the appropriate consent and ethical approval (e.g., from their institutional review board) that will allow them to collect and subsequently openly share de-identified data.

7. Allow grantees to include data sharing costs in their grant budgets. This could include costs associated with data management, curation, hosting, and long-term preservation. For many projects, data hosting costs will likely be minimal—several public repositories allow researchers to store significant amounts of data for free. For projects that will generate larger amounts of data, additional hosting costs can be budgeted. The most important cost may be the personnel time and expertise required to properly prepare data for sharing and reuse. Funders should consider increasing the allowable personnel costs to secure extra curation time for team
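Recommendation 4 above calls for depositing data in trusted public repositories that assign a persistent identifier such as a DOI. As a rough, hypothetical sketch of what that looks like in practice (not part of the PND commentary), the following uses Zenodo’s public deposition REST API; the access token, file name, and metadata are placeholders, and the endpoints reflect the API as publicly documented rather than any funder-specific workflow:

```python
# Hypothetical sketch: depositing a dataset in a repository that mints a DOI.
# Assumes Zenodo's deposition REST API and a personal access token; treat this
# as an outline rather than a drop-in client.
from pathlib import Path

import requests

API = "https://zenodo.org/api/deposit/depositions"
TOKEN = "YOUR-ZENODO-TOKEN"  # placeholder


def deposit_dataset(file_path: str, title: str, creator: str) -> str:
    params = {"access_token": TOKEN}

    # 1. Create an empty deposition.
    dep = requests.post(API, params=params, json={}).json()

    # 2. Upload the data file to the deposition's file bucket.
    bucket = dep["links"]["bucket"]
    with open(file_path, "rb") as fh:
        requests.put(f"{bucket}/{Path(file_path).name}", params=params, data=fh)

    # 3. Attach minimal descriptive metadata.
    metadata = {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": "Data underlying the funded project.",  # placeholder
            "creators": [{"name": creator}],
        }
    }
    requests.put(f"{API}/{dep['id']}", params=params, json=metadata)

    # 4. Publish, which mints the persistent identifier (DOI).
    published = requests.post(f"{API}/{dep['id']}/actions/publish", params=params).json()
    return published["doi"]


print(deposit_dataset("study_data.csv", "Example survey dataset", "Doe, Jane"))
```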

Wikipedia’s Moment of Truth – The New York Times, 18 July 2023

“…In late June, I began to experiment with a plug-in the Wikimedia Foundation had built for ChatGPT. At the time, this software tool was being tested by several dozen Wikipedia editors and foundation staff members, but it became available in mid-July on the OpenAI website for subscribers who want augmented answers to their ChatGPT queries. The effect is similar to the “retrieval” process that Jesse Dodge surmises might be required to produce accurate answers. GPT-4’s knowledge base is currently limited to data it ingested by the end of its training period, in September 2021. A Wikipedia plug-in helps the bot access information about events up to the present day. At least in theory, the tool — lines of code that direct a search for Wikipedia articles that answer a chatbot query — gives users an improved, combinatory experience: the fluency and linguistic capabilities of an A.I. chatbot, merged with the factuality and currency of Wikipedia….”

https://web.archive.org/web/20230718101549/https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html
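The article describes the plug-in only at a high level, as “lines of code that direct a search for Wikipedia articles that answer a chatbot query.” The sketch below is a hypothetical illustration of that retrieval pattern using Wikipedia’s public search API; it is not the Wikimedia Foundation’s plug-in, and the prompt format is invented:

```python
# Rough illustration of the "retrieval" pattern the article describes: look up
# Wikipedia articles relevant to a query and hand their text to a chatbot as
# grounding context. This is NOT the Wikimedia Foundation's plug-in; it is a
# hypothetical sketch built on Wikipedia's public search API.
import re

import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"


def wikipedia_context(query: str, limit: int = 3) -> str:
    """Return titles and plain-text snippets of the top matching articles."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    hits = requests.get(WIKI_API, params=params).json()["query"]["search"]
    lines = []
    for hit in hits:
        snippet = re.sub(r"<.*?>", "", hit["snippet"])  # strip the API's HTML markup
        lines.append(f"{hit['title']}: {snippet}")
    return "\n".join(lines)


def grounded_prompt(question: str) -> str:
    """Prepend retrieved Wikipedia snippets so the model can draw on current facts."""
    return (
        "Answer using the Wikipedia extracts below.\n\n"
        f"{wikipedia_context(question)}\n\nQuestion: {question}"
    )


print(grounded_prompt("Who won the 2023 Women's World Cup?"))
```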

Extending Accessible Data to more articles, repositories, and outputs – The Official PLOS Blog

“In March 2022, with support from the Wellcome Trust, we launched an experimental “Accessible Data” feature designed to increase research data sharing and reuse. Having observed some interesting preliminary results, we’re extending – and extending the scope of – our “Accessible Data” experiment….

The Accessible Data icon rewards sharing data (and code) in a repository via a weblink. Best practice is sharing via a link-able persistent identifier, such as a DOI, but many PLOS articles link to data in other ways, such as via URLs or private links that are intended to be used for peer review only (a common problem for publishers). There is clearly work to do to improve consistency and practice of how data links are shared, but we decided to be inclusive in how we deploy the Accessible Data icon. It displays as long as readers can access the data. We decided it was more important to help researchers as authors – who may be unaware of the nuances of DOIs and private links – and also help them as readers, by including imperfect but functional links to data in our articles.”
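PLOS does not detail how the icon logic is implemented in this post. Purely as a hypothetical illustration of the distinction it draws between DOI-based persistent identifiers and ordinary URLs, a publisher-side check might classify the links found in a data availability statement; the regex follows a commonly used DOI pattern, and the function name and example statement are invented:

```python
# Hypothetical sketch (not PLOS's implementation): classify the data links found
# in a Data Availability Statement as DOI-based persistent identifiers versus
# plain URLs, mirroring the distinction the post describes.
import re

# Pattern based on the DOI syntax commonly recommended for matching Crossref-style DOIs.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+", re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+")


def classify_data_links(statement: str) -> dict:
    urls = URL_RE.findall(statement)
    dois = DOI_RE.findall(statement)
    return {
        "doi_links": dois,
        "plain_urls": [u for u in urls if not DOI_RE.search(u)],
    }


# Invented example statement for illustration:
statement = (
    "Data are available at https://doi.org/10.5061/dryad.example "
    "and code at https://example.org/lab/files/code.zip"
)
print(classify_data_links(statement))
```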

Frontiers | Editorial: Opportunities and challenges in reusing public genomics data

“Genomics data is accumulating in public repositories at an ever-increasing rate. Large consortia and individual labs continue to probe animal and plant tissue and cell cultures, generating vast amounts of data using established and novel technologies. The human genome project kick started the era of systems biology (Lander et al., 2001; Gates et al., 2021). Ambitious projects followed to characterize non-coding regions, variations across species, and between populations (Feingold et al., 2004; Sabeti et al., 2007; Auton et al., 2015). The cost reduction allowed individual labs to generate numerous smaller high-throughput datasets (Edgar et al., 2002; Parkinson et al., 2007; Metzker, 2010; Leinonen et al., 2011). As a result, the scientific community should consider strategies to overcome the challenges and maximize the opportunities to use these resources for research and the public good. In this Research Topic, we have elicited opinions and perspectives from researchers in the field on the opportunities and challenges of reusing public genomics data. The articles in this Research Topic converge on the need for data sharing while acknowledging the challenges that come with it. Two articles defined and highlighted the distinction between data and metadata. The characteristic of each should be considered when designing optimal sharing strategies. One article focuses on the specific issues surrounding the sharing of genomics interval data, and another on balancing the need for protecting pediatric rights and the sharing benefits….”

Project Retain. Enabling the dissemination of knowledge. – SPARC Europe

“Europe has seen a significant growth in activity to establish and advance open access (OA) policies over the last decade. However, copyright has been the thorn in the side of many authors, funders, and their institutions who wish to publish OA, since many publisher policies and processes are no longer fit for purpose. 

Today, we require the rights to publish, share, adapt, and reuse material for research, educational, or multilingual needs….”

Opening Knowledge: Retaining Rights and Open Licensing in Europe | Zenodo

“This report investigates the current landscape of non-legislative policy practices affecting researchers and authors in the authors’ rights and licensing domain. It is an outcome of research conducted by Project Retain led by SPARC Europe, as part of the Knowledge Rights 21 programme. The report concludes with a set of recommendations for institutional policymakers, funders and legislators, and publishers. 

It is accompanied by the study dataset.

This project was funded by Arcadia – a charitable fund of Lisbet Rausing and Peter Baldwin.”

Crossref Research and Development: Releasing our Tools from the Ground Up – Crossref

“Why is this important? As many readers doubtless know, Crossref is committed to The Principles of Open Scholarly Infrastructure. For reasons of insurance, everything we do and newly develop is open source and we want our members to be able to re-use the software that we create. It’s also important because, if we centralize these low-level building blocks, we make it much easier to fix bugs when they occur, which would otherwise be distributed across all of our projects.

As a result, Crossref Labs has a series of small code libraries that we have released for various service interactions. We often find ourselves needing to interact with AWS services. Indeed, Crossref’s live systems are in the process of transitioning to running in the cloud, rather than our own data centre. It makes sense, therefore, for prototype Labs systems to run on this infrastructure, too. However, the boto3 library is not terribly Pythonic. As a result, many of our low-level tools interact with AWS. These include: …

I should also say that our openness is more than unidirectional. While we are putting a lot of effort into ensuring that everything new we put out is openly accessible, we are also open to contributions coming in. If we’ve built something and you make changes or improve it, please do get in touch or submit a pull request. Openness has to work both ways if projects are truly to be used by the community….”
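The excerpt above elides the specific libraries Crossref Labs has released. Purely as a hedged illustration of what a “more Pythonic” convenience layer over boto3 can look like (this is not Crossref’s code), a thin wrapper might hide client construction and raw response dictionaries behind a small class; the class, method, and bucket names below are invented:

```python
# Illustrative only: NOT one of Crossref Labs' released libraries. A minimal
# example of the kind of "more Pythonic" convenience layer the post alludes to,
# wrapping boto3's S3 client so callers deal in strings and iterators rather
# than raw response dictionaries. Assumes boto3 is installed and AWS credentials
# are configured in the environment.
from typing import Iterator

import boto3


class SimpleBucket:
    """Thin convenience wrapper over boto3 S3 calls for a single bucket."""

    def __init__(self, bucket: str) -> None:
        self._bucket = bucket
        self._s3 = boto3.client("s3")

    def put_text(self, key: str, text: str) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=text.encode("utf-8"))

    def get_text(self, key: str) -> str:
        response = self._s3.get_object(Bucket=self._bucket, Key=key)
        return response["Body"].read().decode("utf-8")

    def keys(self, prefix: str = "") -> Iterator[str]:
        paginator = self._s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=self._bucket, Prefix=prefix):
            for item in page.get("Contents", []):
                yield item["Key"]


# Hypothetical usage with a made-up bucket name:
# bucket = SimpleBucket("labs-prototype-data")
# bucket.put_text("reports/2023-07.json", '{"status": "ok"}')
# print(list(bucket.keys("reports/")))
```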