#ASAPpdb: Structural biologists commit to releasing data with preprints – ASAPbio

“The Protein Data Bank (PDB) was established as the first open access repository for biological data, and the datasets it hosts have been invaluable to research in fundamental biology and the understanding of health and disease. Just this month, we witnessed the announcement of the AlphaFold2 results toward structure prediction, made possible thanks to the more than 170,000 freely accessible structures in the PDB which provided “training data” for the structure prediction software.

It was not always the case that such structural biology data were freely available, even upon journal publication. From the founding of the PDB in 1971 until the late 1980s, most journals did not require deposition of structures in a public database. A key moment was a petition, circulated in 1987 by a group of leading structural biologists, demanding that the data created be made openly available upon journal publication. This petition led to major journals adopting data deposition standards. In the early 1990s, the National Institute of General Medical Sciences (NIGMS) imposed similar requirements on all grantees. 

The revolution in publishing made possible by preprints calls for a re-evaluation of data disclosure practices in structural biology. While journal review processes take weeks, months, or even years, preprints allow researchers to rapidly communicate their findings to the community. However, withholding access to PDB files that accompany preprints inhibits the progress towards scientific discovery which preprints can enable. 

Commitment

We pledge to publicly release our PDB files (and associated structure factor, restraint, and map files) with deposition of our preprints.

We encourage all structural biologists to also deposit raw data in appropriate resources (e.g. EMPIAR, proteindiffraction.org, https://data.sbgrid.org/, etc). …”

Scientific Data recommended repositories

“Spreadsheet listing data repositories that are recommended by Scientific Data (Springer Nature) as being suitable for hosting data associated with peer-reviewed articles. Please see the repository list on Scientific Data’s website for the most up to date list….”

What Data Repository Should I use? – a help article for using figshare

“There is a home for every dataset.

Ideally each subject would have a subject specific data repository with custom metadata capture and a subject specialist to curate the data. If there is a subject specific repository that is suitable for your datasets, use that one.

A list of subject specific repositories that are recommended by Scientific Data (Springer Nature) as being suitable for hosting data associated with peer-reviewed articles can be found here.

If there is no subject specific repository, you should check with your Institution’s Library as to whether they have a data repository. If they do, they will most likely have a team of data librarians to guide you through your dataset publication.

If there is no subject specific repository, you should upload and publish your data on Figshare.com, or another suitable generalist repository. How to upload and publish your data on Figshare.com

For more information on which repository to use, head to Re3Data ….”

Open Context: Web-based research data publishing

“Open Context reviews, edits, annotates, publishes and archives research data and digital documentation. We publish your data and preserve it with leading digital libraries. We take steps beyond archiving to richly annotate and integrate your analyses, maps and media. This links your data to the wider world and broadens the impact of your ideas….”

Input to “Data Repository Selection: Criteria that Matter” – COAR

“There has been significant concern expressed in the repository community about the requirements contained in the Data Repository Selection: Criteria that Matter, which sets out a number of criteria for the identification and selection of data repositories that will be used by publishers to guide authors in terms of where they should deposit their data.

COAR agrees that it is important to encourage and support the adoption of best practices in repositories. And there are a number of initiatives looking at requirements for repositories, based on different objectives such as the FAIR Principles, CoreTrustSeal, the TRUST Principles, and the CARE Principles of Indigenous Data Governance. Recently COAR brought together many of these requirements – assessed and validated them with a range of repository types and across regions – resulting in the publication of the COAR Community Framework for Best Practices in Repositories.

However, there is a risk that if repository requirements are set very high or applied strictly, then only a few well-resourced repositories will be able to fully comply. The criteria set out in Data Repository Selection: Criteria that Matter are not currently supported by most domain or generalist data repositories, in particular the dataset-level requirements. If implemented by publishers, this will have a very detrimental effect on the open science ecosystem by concentrating repository services within a few organizations, further exacerbating inequalities in access to services. Additionally, it will introduce bias against some researchers, for example, researchers who prefer to share their data locally; researchers in the global south; or researchers who want to share their data in a relevant domain repository, so it can be visible to their peers and integrated with other similar datasets….”

Internship Opportunity: (Dis)Trust in Public-Sector Data Infrastructures – Social Media Collective

“Microsoft Research NYC is looking for an advanced PhD student to conduct an original research project on a topic under the rubric of “(dis)trust in public-sector data infrastructures.” MSR internships provide PhD students with an opportunity to work on an independent research project that advances their intellectual development while collaborating with a multi-disciplinary group of scholars. Interns typically relish the networks that they build through this program. This internship will be mentored by danah boyd; the intern will be part of both the NYC lab’s cohort and a member of the Social Media Collective. Applicants for this internship should be interested in conducting original research related to how trust in public-sector data infrastructures is formed and/or destroyed….”

Data sharing policies in scholarly publications: interdisciplinary comparisons on JSTOR

Abstract:  Digital sharing of research data is becoming an important research integrity norm. Data sharing is promoted in different avenues, one being the scholarly publication process: journals serve as gatekeepers, recommending or mandating data sharing as a condition for publication. While there is now a sizeable corpus of research assessing the pervasiveness and efficacy of journal data sharing policies in various disciplines, available research is largely piecemeal and mitigates against meaningful comparisons across disciplines. A major contribution of the present research is that it makes direct across-discipline comparisons employing a common methodology. The paper opens with a discussion of the arguments aired in favour and against data sharing (with an emphasis on ethical issues, which stand behind these policies). The websites of 150 journals, drawn from 15 disciplines, were examined for information on data sharing. The results consolidate the notion of the primacy of biomedical sciences in the implementation of data sharing norms and the lagging implementation in the arts and humanities. More surprisingly, they attest to similar levels of norms adoption in the physical and social sciences. The results point to the overlooked status of the formal sciences, which demonstrate low levels of data sharing implementation. The study also examines the policies of the major journal publishers. The paper concludes with a presentation of the current preferences for different data sharing solutions in different fields, in specialized repositories, general repositories, or publishers’ hosting area.

re3data – Advancing Services for Open Science

Abstract: re3data is the global registry for research data repositories [2]. With January 2019 the service lists over 2250 digital repositories and provides an extensive description based on a detailed metadata schema [3]. A variety of funders, publishers and scientific organizations around the world refer to re3data within their guidelines and policies, recommending the service to researchers looking for appropriate repositories for storage and search of research data. Starting with an introduction and overview to re3data and its current status under the auspices of DataCite, the talk will outline the recent and upcoming development in a heterogeneous and highly dynamic research data infrastructure landscape. The diverse requirements of the institutional stakeholders as well as the scientific communities impose demanding challenges on the architecture, networking with other services and technical implementation. The presentation will illustrate that with recent examples, like the integration and reuse of re3data in the American Geophysical Union’s (AGU) ‘Repository Finder’ [1], landscape analysis of data repositories for the Swiss National Science Foundation (SNSF) and a planned cooperation with B2FIND, RADAR and GeRDI on the subject classification. Fostering Open Science and FAIR data, the talk will close with a prospect on the planned next steps towards an open and linked data service matching the demands of researchers and organization.

ICPSR

“An international consortium of more than 750 academic institutions and research organizations, Inter-university Consortium for Political and Social Research (ICPSR) provides leadership and training in data access, curation, and methods of analysis for the social science research community.

ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.

ICPSR collaborates with a number of funders, including U.S. statistical agencies and foundations, to create thematic data collections and data stewardship and research projects….”

Making data open, accessible for researchers and scholars | University of Arizona Libraries

“A new service created by the University of Arizona Libraries is helping researchers and students amplify their individual or cross-departmental work, while taking our commitment to open to the next level.

ReDATA—a free research data repository that stores and shares datasets produced by University of Arizona researchers—was recently launched by the Libraries’ Office of Digital Innovation & Stewardship.

In addition to addressing the growing number of funding agencies and journal publishers that require open access to underlying research data, the team that developed ReDATA identified an opportunity to tackle a strategic gap on campus. …

The service, which aligns with the Libraries’ mission to reduce barriers to accessing and sharing information, also allows researchers to receive credit and track the impact of their work. The platform looks at embedded download and citation counts, as well as altmetrics, which counts all of the mentions tracked for an individual research output. 

Traditional scholarly outputs include journal articles, books, conference proceedings, and monographs. Over the last decade, there has been an increase in expectations from the research community to provide supporting data and software alongside the original publication.

ReDATA accepts and archives all types of data, including spreadsheets, binary files, software and scripts, audiovisual content, and presentations….”

RDM-Services – Events – GÉANT federated confluence

“This collaborative workshop will explore different service delivery models that research institutions can adopt when supporting data management. These could apply to research information management systems (CRIS), data repositories, e-Lab notebooks and many other platforms.

Delivery models typically include open source software that is supported in-house, outsourced hosting of OSS, vendor-supported commercial services, and bespoke institutional services. Various partnership models supported by institutional groups, national consortia and NRENs will also be explored.

The workshop will run adjacent to the 16th Research Data Alliance plenary in Costa Rica. In order to support international participation, all sessions will take place daily at 20:00-22:00 UTC. Attendees can sign up for individual sessions.

Monday 2nd November: Opening panel and workshop introduction
Tuesday 3rd November: Procurement pain points
Wednesday 4th November: Open Source business models
Thursday 5th November: Partnerships
Friday 6th November: Closing discussion …”

Hahnel Argues for Making Data as Open as Possible | NIH Record

“Speaking virtually from London to a group of more than 120 NIH employees at a recent NIH Data Science Town Hall sponsored by the Office of Data Science Strategy, Dr. Mark Hahnel said, “To get the most out of science, research data needs to be as open as possible, as closed as necessary.”

For Hahnel, “open as possible” means data that is published openly and well-described. It also means educating researchers on the importance of data-sharing and the tools available to them….”

The Case for Making Data as Open as Possible | Data Science at NIH

“In July 2020 the Office of Data Science Strategy (ODSS) at the National Institutes of Health (NIH) completed the NIH Figshare Instance project, a one-year pilot with existing generalist repository Figshare to determine how biomedical researchers may use a generalist repository for sharing and reusing NIH-funded data.

To mark the conclusion of this project, ODSS invited Figshare founder and CEO Mark Hahnel, Ph.D., to share some of the pilot outcomes, his perspective on lessons learned from the project, and his thoughts on the future of data sharing at the NIH Data Science Town Hall, a monthly meeting for NIH employees interested in data science activities across the agency. The recording of his presentation is now available. …”