Abstract: Software and data citation are emerging best practices in scholarly communication. This article provides structured guidance to the academic publishing community on how to implement software and data citation in publishing workflows. These best practices support the verifiability and reproducibility of academic and scientific results, sharing and reuse of valuable data and software tools, and attribution to the creators of the software and data. While data citation is increasingly well-established, software citation is rapidly maturing. Software is now recognized as a key research result and resource, requiring the same level of transparency, accessibility, and disclosure as data. Software and data that support academic or scientific results should be preserved and shared in scientific repositories that support these digital object types for discovery, transparency, and use by other researchers. These goals can be supported by citing these products in the Reference Section of articles and effectively associating them to the software and data preserved in scientific repositories. Publishers need to mark up these references in a specific way to enable downstream processes.
Category Archives: oa.data
How Open Science Unlocks Scientific Details: A Journey of Discovery – Open and Universal Science (OPUS) Project
“By fostering transparency, collaboration, and accessibility, Open Science has unleashed the potential for researchers to delve deeper into the heart of scientific phenomena. In this article, we will explore how Open Science is instrumental in unraveling scientific intricacies and ushering in a new era of discovery….”
Responsible data sharing: Identifying and remedying possible re-identification of human participants
Abstract: Open data collected from humans creates a tension between scholarly values of transparency and sharing on the one hand, and privacy and security on the other. A common solution is to make datasets anonymous by removing personally identifying information before sharing. However, ostensibly anonymized datasets may be at risk of re-identification if they include demographic information. In the present article, we (a) review current privacy standards; (b) describe computer science data protection frameworks and their adaptability to the social sciences; (c) provide practical guidance for assessing and addressing re-identification risk; (d) introduce two open-source algorithms – MinBlur and MinBlurLite – to increase privacy while maintaining the integrity of open data; and (e) highlight aspects of ethical data sharing that require further attention. Technical innovations can support competing values so that science can be as open as possible to promote transparency and sharing, and as closed as necessary to maintain privacy and security.
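As a rough illustration of the re-identification risk the abstract describes, the sketch below counts how many records share each combination of demographic values. This is a generic k-anonymity-style check, chosen here as an assumption; it is not MinBlur or MinBlurLite, and the field names and toy records are hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group (the dataset's k); 0 if there are no records."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values(), default=0)


def at_risk_records(records, quasi_identifiers, k=2):
    """Return the records whose quasi-identifier combination is shared by fewer than k rows."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        record
        for record in records
        if sizes[tuple(record[field] for field in quasi_identifiers)] < k
    ]


# Hypothetical toy dataset: direct identifiers are already removed,
# but demographic fields remain.
participants = [
    {"age": 34, "gender": "F", "zip": "48104", "score": 12},
    {"age": 34, "gender": "F", "zip": "48104", "score": 15},
    {"age": 51, "gender": "M", "zip": "48104", "score": 9},
]

demographics = ["age", "gender", "zip"]
k = smallest_group_size(participants, demographics)
flagged = at_risk_records(participants, demographics, k=2)
```

In this toy data the third participant is the only person with that age, gender, and zip combination, so the record is flagged even though no name is attached. Coarsening a field (for example, reporting age bands instead of exact ages) is one common way to raise the smallest group size before sharing.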
FACT SHEET: As Part of President Biden’s Unity Agenda, White House Cancer Moonshot Announces New Actions and Commitments to End Cancer as We Know It | The White House
“Today’s announcements from the Biden Cancer Moonshot include: …A new “biomedical data fabric toolbox” to advance cancer research progress. ARPA-H is partnering with the National Institutes of Health, the National Cancer Institute (NCI), and other agencies to develop a new Biomedical Data Fabric Toolbox for Cancer. Starting with cancer datasets, this program represents the first step toward transforming data accessibility across all medical domains…”
NOAA Open Data Dissemination: Petabyte-scale Earth system data in the cloud | Science Advances
Abstract: NOAA Open Data Dissemination (NODD) makes NOAA environmental data publicly and freely available on Amazon Web Services (AWS), Microsoft Azure (Azure), and Google Cloud Platform (GCP). These data can be accessed by anyone with an internet connection and span key datasets across the Earth system including satellite imagery, radar, weather models and observations, ocean databases, and climate data records. Since its inception, NODD has grown to provide public access to more than 24 PB of NOAA data and can support billions of requests and petabytes of access daily. Stakeholders routinely access more than 5 PB of NODD data every month. NODD continues to grow to support open petabyte-scale Earth system data science in the cloud by onboarding additional NOAA data and exploring performant data formats. Here, we document how this program works with a focus on provenance, key datasets, and use. We also highlight how to access these data with the goal of accelerating use of NOAA resources in the cloud.
Umbrella Data Management Plans to Integrate FAIR Data: Lessons From the ISIDORe and BY-COVID Consortia for Pandemic Preparedness – Data Science Journal
Abstract: The Horizon Europe project ISIDORe is dedicated to pandemic preparedness and responsiveness research. It brings together 17 research infrastructures (RIs) and networks to provide a broad range of services to infectious disease researchers. An efficient and structured treatment of data is central to ISIDORe’s aim to furnish seamless access to its multidisciplinary catalogue of services, and to ensure that users’ results are treated FAIRly. ISIDORe therefore requires a data management plan (DMP) covering both access management and research outputs, applicable over a broad range of disciplines, and compatible with the constraints and existing practices of its diverse partners. Here, we describe how, to achieve that aim, we undertook an iterative, step-by-step, process to build a community-approved living document, identifying good practices and processes, on the basis of use cases, presented as proof of concepts. International fora such as the RDA and EOSC, and primarily the BY-COVID project, furnished registries, tools and online data platforms, as well as standards, and the support of data scientists. Together, these elements provide a path for building an umbrella, FAIR-compliant DMP, aligned as fully as possible with FAIR principles, which could also be applied as a framework for data management harmonisation in other large-scale, challenge-driven projects. Finally, we discuss how data management and reuse can be further improved through the use of knowledge models when writing DMPs and, how, in the future, an inter-RI network of data stewards could contribute to the establishment of a community of practice, to be integrated subsequently into planned trans-RI competence centres.
OA Book Usage Data Trust | Data Quality Community Consultation
Provide your feedback by the 15th of October via this form: https://forms.gle/3yRb7QfcodndHiTW9
About our work: This effort enables the community-governed sharing of quality, interoperable, Open Access (e)Book Usage (OAEBU) data. Our governance abides by Guiding Principles for OA Book Usage Data Services as we develop a data space for public and private organizations that create, combine, and innovate with OA book usage data. Our Board of Trustees is a working board composed of elected Trustees led by an executive committee of board officers. As noted in the data trust’s governance documentation, trustees share responsibility for: strategy and direction setting, fiduciary oversight of project financials and fiscal sponsorship arrangements, and supervision of the effort’s executive director.
Community Consultation Background: Multiple challenges exist for those who would like to share generated OA book usage data with others (data creators), and for those who rely on such data to provide reporting or services analytics (data users). Currently, individual organizations encounter challenges when aggregating OA book usage data to make strategic decisions about their OA publishing and OA programs. Each organization manages, compiles, and links usage data metrics on its own, which is time-consuming. Additionally, they may face resource challenges in adopting COUNTER, instead relying on tools such as Google Analytics that further complicate usage data interoperability. While many deliver audited, trusted usage metrics through COUNTER-compliant reports, aggregating usage metrics across platforms is not common. Finally, some organizations are often unable to provide their raw usage data to competitors due to dynamics that cannot be resolved by trust alone.
Such challenges extend beyond scholarly communications. To address such issues across industries in an interoperable way, European agencies have fostered International Data Spaces (IDS) infrastructure for data sharing through a neutral intermediary that is as open as possible, but as controlled as necessary. The IDS aim is to provide a digital infrastructure to foster the exchange and computation of data among public and private competitors, to generate value through: 1) increased interoperability, 2) trust in secure and transparent exchange, and 3) multiparty data governance through usage controls and community-based accountability measures.
The Open Access Book Usage Data Trust (OAEBUDT) is piloting the International Data Spaces (IDS) model in scholarly communications, building upon emerging design principles, technical architectures, and standards. Supported by the Mellon Foundation, the OAEBUDT is developing ‘Governance Building Blocks’ for its IDS, to guide the rules and accountability measures for OAEBUDT participation. To inform the development of model standard contractual clauses, this project is developing an OA Book Usage Data Exchange and Stewardship Rulebook to specify the principles that generate trust through OAEBUDT participation, usage data management, processing, and provision.
You can preview this consultation in full prior to submitting your comments.
Community Consultation | Participant Notice
In this community consultation, we invite feedback on principles drafted to ensure trust in the provided usage data and its quality notification process via the OAEBUDT.
Submitted comments will be discussed among the OAEBUDT project team, including advisors and the OAEBUDT community governance. Unattributed submissions will also be published on the project’s Zenodo community. By submitting comments through this form you agree to the publication, sharing and reuse of your comments under a CC0 license or CC BY license.
Contact: For any questions please contact Ursula Rabar, OAEBUDT Community Manager, at ursula.rabar@operas-eu.org.
Key Exploitable Results of Skills4EOSC
“The project will directly support EOSC Partnership Specific Objective 1.2. Professional data stewards are available in research-performing organisations in Europe to support Open Science, measured through two specific KPIs:
KPI (2025) European curricula for data stewards are defined
KPI (2027) All research done by EOSC Association members is supported by professional data stewards.
Skills4EOSC actions address the three gaps identified in the EOSC SRIA concerning skills and training: a lack of Open Science and data expertise, a lack of clearly defined data professional profiles and career paths for these roles, and fragmentation in training resources….”
University of Sussex connects Figshare to Symplectic Elements to create a joined-up research data management solution – Symplectic
“Digital Science, a technology company serving stakeholders across the research ecosystem, is pleased to announce that the University of Sussex has successfully integrated Figshare and Symplectic Elements from Digital Science’s flagship products to create a seamless, interoperable research information and data management solution….
Sussex has been using Symplectic Elements as its Current Research Information System (CRIS) since 2020, initially integrated with EPrints as its institutional repository (called Sussex Research Online, or SRO). In 2022, Sussex took the decision to migrate SRO from EPrints to Figshare in order to create a more joined-up solution to support its Open Access needs. Moving to a full Figshare institutional repository supports the streamlining of IT Services and also enables repository staff teams to be more flexible as they work with Figshare alone, as opposed to two varying systems for papers and data….”
Open Science and Policy Interface: The Tanzania Perspective | East African Journal of Science, Technology and Innovation
Abstract: The 21st century has seen a paradigm shift in scholarly communication, with digital technology changing the entire process of the scholarly communication lifecycle. As the cost of online reference materials for research continues to rise and restrictive conditions persist, global academic and research communities are pursuing countermeasures to make knowledge equitable and accessible. This is made possible through the Open Science (OS) movement that aims to make knowledge accessible to researchers and citizens irrespective of their technical or financial capability. This paper explores open science to ascertain the status of open science practices in Tanzania. The paper highlights the policy interfaces and frameworks that favor open science practices in research endeavors. Also, it provides a baseline for understanding the situation to inform scientific research and education communities about the status of open science and possible areas of intervention. Open science is still in its infancy, although certain steps have been taken in adopting it, for example the adoption of open access practices, including the creation of institutional repositories and the adoption of policies that direct its implementation. Additionally, the implementation of open data practices has been quite slow. Also, researchers and organizations in Tanzania are gradually adopting open data practices. Currently, some academic institutions, particularly public universities, have adopted and used open journal publishing systems, particularly the online journal system (OJS). The published journal articles through journal systems are freely accessible online like other open-access content; however, the journals are not yet registered in the Directory of Open Access Journals (DOAJ) despite the fact that some are already indexed in different abstracting services such as Africa Journal Online (AJOL) and they have Digital Object Identifiers (DOI).
The policy interface of open science needs to be harmonized and COSTECH is strategically positioned to take the lead.
Social Media Archive (SOMAR)
“The Social Media Archive (SOMAR) is a revolutionary initiative and data resource that makes social media data accessible and useful to researchers like never before. Housed in the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan Institute for Social Research (ISR), SOMAR will democratize access to some of the most consequential information in contemporary society. From the emotional well-being of local youth to the outcomes of global political processes, social media play a critical, but poorly understood, role in socio-political life. Social media platforms contain a treasure trove of information that can help us better understand human behavior, social systems, and psychological processes. At ISR and ICPSR, it is our imperative to shed light on these processes.”
Full article: Data sharing and re-use in the traumatic stress field: An international survey of trauma researchers
“Background:
The FAIR data principles aim to make scientific data more Findable, Accessible, Interoperable, and Reusable. In the field of traumatic stress research, FAIR data practices can help accelerate scientific advances to improve clinical practice and can reduce participant burden. Previous studies have identified factors that influence data sharing and re-use among scientists, such as normative pressure, perceived career benefit, scholarly altruism, and availability of data repositories. No prior study has examined researcher views and practices regarding data sharing and re-use in the traumatic stress field.
Objective:
To investigate the perspectives and practices of traumatic stress researchers around the world concerning data sharing, re-use, and the implementation of FAIR data principles in order to inform development of a FAIR Data Toolkit for traumatic stress researchers.
Method:
A total of 222 researchers from 28 countries participated in an online survey available in seven languages, assessing their views on data sharing and re-use, current practices, and potential facilitators and barriers to adopting FAIR data principles.
Results:
The majority of participants held a positive outlook towards data sharing and re-use, endorsing strong scholarly altruism, ethical considerations supporting data sharing, and perceiving data re-use as advantageous for improving research quality and advancing the field. Results were largely consistent with prior surveys of scientists across a wide range of disciplines. A significant proportion of respondents reported instances of data sharing and re-use, but gold standard practices such as formally depositing data in established repositories were reported as infrequent. The study identifies potential barriers such as time constraints, funding, and familiarity with FAIR principles.
Conclusions:
These results carry crucial implications for promoting change and devising a FAIR Data Toolkit tailored for traumatic stress researchers, emphasizing aspects such as study planning, data preservation, metadata standardization, endorsing data re-use, and establishing metrics to assess scientific and societal impact.”
Open access dataset integrating EEG and fNIRS during Stroop tasks | Scientific Data
Abstract: Conflict monitoring and processing are crucial components of the human cognitive system, with significant implications for daily life and the diagnosis of cognitive disorders. The Stroop task, combined with brain function detection technology, has been widely employed as a classical paradigm for investigating conflict processing. However, there remains a lack of public datasets that integrate Electroencephalogram (EEG) and functional Near-infrared Spectroscopy (fNIRS) to simultaneously record brain activity during a Stroop task. We introduce a dual-modality Stroop task dataset incorporating 34-channel EEG (sampling frequency is 1000 Hz) and 20-channel high temporal resolution fNIRS (sampling frequency is 100 Hz) measurements covering the whole frontal cerebral cortex from 21 participants (9 females/12 males, aged 23.0 ± 2.3 years). Event-related potential analysis of EEG recordings and activation analysis of fNIRS recordings were performed to show the significant Stroop effect. We expected that the data provided would be utilized to investigate multimodal data processing algorithms during cognitive processing.
CARE Principles — Global Indigenous Data Alliance
“The current movement toward open data and open science does not fully engage with Indigenous Peoples rights and interests. Existing principles within the open data movement (e.g. FAIR: findable, accessible, interoperable, reusable) primarily focus on characteristics of data that will facilitate increased data sharing among entities while ignoring power differentials and historical contexts. The emphasis on greater data sharing alone creates a tension for Indigenous Peoples who are also asserting greater control over the application and use of Indigenous data and Indigenous Knowledge for collective benefit.
This includes the right to create value from Indigenous data in ways that are grounded in Indigenous worldviews and realise opportunities within the knowledge economy. The CARE Principles for Indigenous Data Governance are people and purpose-oriented, reflecting the crucial role of data in advancing Indigenous innovation and self-determination. These principles complement the existing FAIR principles encouraging open and other data movements to consider both people and purpose in their advocacy and pursuits….”
Information sciences professor developing tool to make data visualizations accessible to blind researchers, students | Illinois
“JooYoung Seo, a professor of information sciences at the University of Illinois Urbana-Champaign, is developing a data visualization tool that will help make visual representations of statistical data accessible to researchers and students who are blind or visually impaired.
The multimodal representation tool is aimed at the accessibility of statistical graphs, such as bar plots, box plots, scatter plots and heat maps….
The tool, called Multimodal Access and Interactive Data Representation, presents data through sonification, text and Braille….
His accessibility module will be added to the Teach Access repository, and Seo plans to share it on GitHub as an open-source project. He’ll also introduce it to his data science students during this academic year….”