Post by the PLOS ONE Editors on behalf of the PLOS Data Team
Since 2015, the PLOS journals have maintained a list of repositories that we have determined to be suitable for authors depositing datasets that
Since its inception, PLOS has encouraged data sharing; our original data policy (2003 – March 2014) required authors to share data upon request after publication. In line with PLOS’ ethos of open science and accelerating
In line with our updated Data Policy, we are pleased to announce a PLOS Data Repository Recommendation Guide. To support the selection of data repositories for authors, PLOS has identified a set of established repositories, which are recognized and trusted within their respective communities. To …
Access to research results, immediately and without restriction, has always been at the heart of PLOS’ mission and the wider Open Access movement. However, without similar access to the data underlying the findings, the article can be of limited use. For this reason, PLOS has always required that authors make their data available to other academic researchers who wish to replicate, reanalyze, or build upon the findings published in our journals.
In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.
What do we mean by data?
“Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances.” Examples could include spreadsheets of original measurements (of cells, of fluorescent intensity, of respiratory volume), large datasets such as next-generation sequence reads, verbatim responses from qualitative studies, software code, or even image files used to create figures. Data should be in the form in which it was originally collected, before summarizing, analyzing or reporting.
What do we mean by publicly available?
All data must be in one of three places:
- in the body of the manuscript; this may be appropriate for studies where the dataset is small enough to be presented in a table
- in the supporting information; this may be appropriate for moderately-sized datasets that can be reported in large tables or as compressed files, which can then be downloaded
- in a stable, public repository that provides an accession number or digital object identifier (DOI) for each dataset; there are many repositories that specialize in specific data types, and these are particularly suitable for very large datasets
Do we allow any exceptions?
Yes, but only in specific cases. We are aware that it is not ethical to make some datasets fully public, such as private patient data or specific information relating to endangered species. Some authors also obtain data from third parties and therefore do not have the right to make that dataset publicly available. In such cases, authors must state that “Data is available upon request”, and identify the person, group or committee to whom requests should be submitted. The authors themselves should not be the only point of contact for requesting data.
Where can I go for more information?
The revised data sharing policy, along with more information about the issues associated with public availability of data, can be reviewed in full at:
Image: Open Data stickers by Jonathan Gray
Last month PLOS ONE attended the ISMB/ECCB 2013 conference in Berlin on Intelligent Systems for Molecular Biology. More than 1,500 delegates attended what is the largest conference on computational biology in the world to discuss the latest developments in computational methods that address biological questions.
The opening keynote from PLOS ONE Academic Editor Gil Ast focused on alternative splicing, a mechanism by which several mRNA transcripts are generated from the same mRNA precursor, thus enhancing transcriptome and proteome diversity. He mentioned a paper his group published earlier this year in PLOS ONE, in which they showed that pre-mRNA splicing influences nucleosome organization, suggesting that there is a bi-directional interplay between chromatin organization and splicing. While it is widely accepted that chromatin organization and DNA modification regulate transcription, it is intriguing that splicing can in turn affect chromatin organization, and this may constitute an additional layer of regulation of gene expression. He also presented exciting recent findings showing how pre-mRNA splicing and the creation of new exons in the human genome may be linked to certain genetic disorders and types of cancers.
Understanding the biology of complex human disease is also an objective of Goncalo Abecasis, winner of the ISCB 2013 Overton Prize. Specifically, he is interested in better understanding genetic variation and its connections to human diseases using computational methods and statistical tools. In his talk, he emphasized that the genetic variants that affect human traits may be identified and characterized by examining the link between these traits and the complete genome sequences of thousands of individuals. To collect DNA from as many people as possible, he wondered whether we should make use of social media to call for volunteers to send their DNA samples. Are Facebook and Twitter the key to understanding human genetics?
One topic that generated much discussion at the meeting was data sharing. In her talk, Carole Goble called for all scientists to share their data widely so as to enable reproducibility, a principle underpinning the scientific method. Several journals, including PLOS ONE, require that all data (including all relevant raw data) described in the manuscript be made freely available to any scientist wishing to use them for the purpose of academic, non-commercial research. Well-established and widely supported public repositories already exist for certain types of data such as nucleic acid sequences, and in cases where an appropriate repository does not exist, there are also general data repositories such as Dryad. Assigned accession numbers or digital object identifiers (DOIs) facilitate data citation and ensure accountability. An increasing number of research funding agencies also now support data sharing in the life sciences. Although there is growing discussion about making primary data from published research publicly available, Goble mentioned a paper by Ioannidis and colleagues showing that a substantial proportion of articles published in high-impact journals do not comply (or only weakly comply) with data availability requirements. According to Goble, a lack of data sharing, and thus of reproducibility, could lead to an increase in retracted scientific papers.
She also urged the computational biology community to release their “dark data”, i.e. data that are not published and remain hidden on various USB drives and computers. Her point was that if these data are shared, more people will be able to use the results, increasing visibility, accountability and reproducibility. As highlighted by a recent study, data sharing is not an end in itself, but rather a crucial form of scientific knowledge dissemination.
Keren-Shaul H, Lev-Maor G, Ast G (2013) Pre-mRNA Splicing Is a Determinant of Nucleosome Organization. PLoS ONE 8(1): e53506. doi:10.1371/journal.pone.0053506
Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JPA (2011) Public Availability of Published Research Data in High-Impact Journals. PLoS ONE 6(9): e24357. doi:10.1371/journal.pone.0024357
Wallis JC, Rolando E, Borgman CL (2013) If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology. PLoS ONE 8(7): e67332. doi:10.1371/journal.pone.0067332
Wikimedia by Angelineri
Modified from Schwartz S, Oren R, Ast G (2011) Detection and Removal of Biases in the Analysis of Next-Generation Sequencing Reads. PLoS ONE 6(1): e16685. doi:10.1371/journal.pone.0016685
Ethics is a cornerstone of science, informing everything from how we design our experiments to what we do with the resulting data. Given the diverse nature of the research results published in PLOS ONE, no short and simple set of ethical guidelines can cover every situation. Thus, it is important for the journal to adapt and expand its ethical standards as the journal itself expands.
The field of paleontology, by its very nature, presents some special situations in ethics. Although the fossil subjects are long dead, so that matters of patient consent or laboratory animal care do not arise, other complicated concerns ranging from legalities to reproducibility must be taken into account. Any journal that hopes to be a major player in the study of fossils must confront these issues head-on.
As the volunteer section editor for paleontology at PLOS ONE, I am thrilled by the growth in the number of high-quality publications related to my field. I also want to make sure that all of these papers are held to the highest ethical standards, and many of my colleagues and I felt it was important to provide explicit ethical guidelines focused on paleontology. After extensive and thoughtful discussion with the journal’s internal editors and other interested parties, I am happy to announce that a specific set of editorial standards for paleontology submissions is now in place.
Critically, reproducible research in paleontology requires a long-term guarantee of accessibility and safety for fossils. This means that all fossils should be deposited in a permanent repository, such as a museum or university collection. Unfortunately, there is no guarantee that collections owned by private individuals, no matter how noble their intentions, will be accessible in the long term. In one notable recent case, the family of a fossil enthusiast sold off the bulk of his scientifically important collection after he died. Some of the specimens had even been published in the peer-reviewed literature, but there is now little guarantee that any of those fossils will be accessible in fifty years, or even five. Moreover, not everything that calls itself a museum is a permanent collection; some are little more than showroom floors for a commercial fossil business. Some fossils held by such businesses do end up in permanent museum collections, but until this happens, it is extremely hazardous to publish on the specimens. Reproducibility and accessibility are key, as reflected in the new policies.
Ethical consideration is also critical for fossil collection in the field. Stories abound of skeletons in the Gobi Desert being looted for the most marketable parts (such as skulls or claws), which then end up at auction in Europe or North America. In fact, one recent PLOS ONE paper discussed dinosaur skin impressions salvaged from the mess left by fossil poachers who carted off more enticing pieces. Legal loopholes often mean that the specimens can then be traded or sold elsewhere, often accompanied by official-looking paperwork that purports to legitimize the original export. This horrible practice drains the world of its historical heritage and destroys scientific information. Thus, the new ethics policy explicitly prohibits publication of specimens that were obtained without permission or legal export.
This is a great day for paleontology at PLOS ONE, helping to ensure the journal’s future as a trustworthy publication with the highest ethical standards. I challenge everyone—authors, editors, readers, and reviewers—to carry the torch forward into a better world.
About the Author: Dr. Andrew Farke is a vertebrate paleontologist and an academic editor at PLOS ONE. Andy also has his own blog, The Open Source Paleontologist, and can be found on Twitter @andyfarke.
Image: The fossil reptile Captorhinus (collection of Raymond M. Alf Museum of Paleontology)