“On March 31, 2021, PLOS Computational Biology introduced a new journal requirement: mandated code sharing. If the research process included the creation of custom code, the authors were required to make it available during the peer review assessment, and to make it public upon the publication of their research article—similar to the longstanding data sharing requirement for all PLOS journals. The aim, of course, is to improve reproducibility and increase understanding of research.
At the end of the year-long trial period, code sharing had risen from 53% in 2019 to 87% for 2021 articles submitted after the policy went into effect. Evidence in hand, the journal Editors-in-Chief decided to make code sharing a permanent feature of the journal. Today, the sharing rate is 96%….”
“Traditionally, research articles comprise the majority of publications across all disciplines, with journals prioritizing these above all else. However, in Open Research Europe, they represent 51% of all articles – so what are the remaining 49%?
In this blog, we highlight why broadening the focus of publishable work is important, and the benefits that different article types on Open Research Europe can bring to your research….”
Abstract: Early in the pandemic, pre-print servers enabled rapid evidence sharing. A collaborative of major medical journals supported their use to ensure equitable access to scientific advancements. In the intervening three years, we have made major advancements in the prevention and treatment of COVID-19 and learned about the benefits and limitations of pre-prints as a mechanism for sharing and disseminating scientific knowledge.
Pre-prints increase attention, citations, and ultimately impact policy, often before findings are verified. Evidence suggests that pre-prints have more spin relative to peer-reviewed publications. Clinical trial findings posted on pre-print servers do not change substantially following peer-review, but other study types (e.g., modeling and observational studies) often undergo substantial revision or are never published.
Nuanced policies about sharing results are needed to balance rapid implementation of true and important advancements with accuracy. Policies recommending immediate posting of COVID-19-related research should be re-evaluated, and standards for evaluation and sharing of unverified studies should be developed. These may include specifications about what information is included in pre-prints and requirements for certain data quality standards (e.g., automated review of images and tables); requirements for code release and sharing; and limiting early postings to methods, results, and limitations sections.
Academic publishing needs to innovate and improve, but assessment of evidence quality remains a critical part of the scientific discovery and dissemination process.
Abstract: Open science practices such as posting data or code and pre-registering analyses are increasingly prescribed and debated in the applied sciences, but the actual popularity and lifetime usage of these practices remain unknown. This study provides an assessment of attitudes toward, use of, and perceived norms regarding open science practices from a sample of authors published in top-10 (most-cited) journals and PhD students in top-20 ranked North American departments from four major social science disciplines: economics, political science, psychology, and sociology. We observe largely favorable private attitudes toward widespread lifetime usage (meaning that a researcher has used a particular practice at least once) of open science practices. As of 2020, nearly 90% of scholars had ever used at least one such practice. Support for posting data or code online is higher (88% overall support and nearly at the ceiling in some fields) than support for pre-registration (58% overall). With respect to norms, there is evidence that the scholars in our sample appear to underestimate the use of open science practices in their field. We also document that the reported lifetime prevalence of open science practices increased from 49% in 2010 to 87% a decade later.
“Are you interested in reproducible code and Open Science? Then we have the perfect opportunity for you!
As part of a pilot project between TU Delft and CODECHECK, we are organising a codechecking hackathon on 18th September 2023! During this hackathon, you will learn the concept behind codechecking, and practise related skills to check whether available code and data can reproduce the results in a paper, preprint or project. More information about the codechecking process can be found here.
Would you like to participate as a codechecker, and help promote reproducible code and Open Science? Register via this page, and save the date! The hackathon will take place over two sessions, in the morning and afternoon. Details of the programme will be released in early September.
PhD candidates at TU Delft are eligible for 0.5 Graduate School credits, provided they attend the entire session (morning and afternoon) and write a short reflection (300–350 words) on the skills they learned during the codechecking session, to be uploaded to their DMA profiles. To confirm their eligibility for GS credits, PhD candidates must seek approval from their supervisors and their Faculty Graduate Schools in advance of the session. If confirmation of attendance is required from the organisers, please let us know beforehand….”
Abstract: Data-driven computational analysis is becoming increasingly important in biomedical research as the amount of data being generated continues to grow. However, the lack of sharing of research outputs, such as data, source code and methods, undermines the transparency and reproducibility of studies, which are critical to the advancement of science. Many published studies are not reproducible because insufficient documentation, code, and data are shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them fail to share the analytical code. Even among those that did disclose their code, the vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten papers organized their code in a structured and reproducible manner. We discovered a significant association between the presence of code availability statements and increased code availability (p = 2.71×10⁻⁹). Additionally, a greater proportion of studies conducting secondary analyses were inclined to share their code compared to those conducting primary analyses (p = 1.15×10⁻⁷). In light of our findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability to improve reproducibility in biomedical research. By increasing transparency and reproducibility, we can promote scientific rigor, encourage collaboration, and accelerate scientific discoveries. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.
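The association reported above, between the presence of a code availability statement and actual code availability, is the kind of result a test of independence on a 2×2 contingency table produces. A minimal stdlib sketch, assuming a Pearson chi-square test and hypothetical counts (the paper's contingency table and exact test are not given here):

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test of association for a 2x2 table.

    Table layout (hypothetical labels):
                           code shared   code not shared
    statement present          a               b
    statement absent           c               d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Sum (observed - expected)^2 / expected over the four cells,
    # with expected counts computed under independence.
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs, exp in [
            (a, row1 * col1 / n),
            (b, row1 * col2 / n),
            (c, row2 * col1 / n),
            (d, row2 * col2 / n),
        ]
    )
    # For 1 degree of freedom, the chi-square survival function
    # reduces exactly to erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Illustrative counts only, not the paper's actual data.
chi2, p = chi2_2x2(150, 60, 77, 166)
```

A table this lopsided yields a large statistic and a vanishingly small p-value, which is how associations like the one above are quantified.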
“On March 28, 2023, the US National Aeronautics and Space Administration (NASA) released a request for information on “NASA’s Public Access Plan: Increasing Access to the Results of Scientific Research.” The Association of Research Libraries (ARL) is pleased to offer the following comments in response to this request….”
Abstract: Biologists increasingly rely on computer code, reinforcing the importance of published code for transparency, reproducibility, training, and a basis for further work. Here we conduct a literature review examining temporal trends in code sharing in ecology and evolution publications since 2010, and test for an influence of code sharing on citation rate. We find that scientists are overwhelmingly (95%) failing to publish their code and that there has been no significant improvement over time, but we also find evidence that code sharing can considerably improve citations, particularly when combined with open access publication.
“UCL Office for Open Science and Scholarship and the local chapter of the UK Reproducibility Network are excited to announce the first Open Science and Scholarship Awards at UCL. UCL has been a pioneer in promoting open science practices, which include Open Access Publishing, Open Data and Software, Transparency, Reproducibility and other Open Methodologies, as well as the creation and use of Open Educational Resources, Citizen Science, Public Involvement, Co-production and Communication.
With these awards, we want to recognise and celebrate all UCL students and staff who embrace, advance, and promote open science….”
Abstract: The definition of scholarly content has expanded to include the data and source code that contribute to a publication. While major archiving efforts to preserve conventional scholarly content, typically in PDFs (e.g., LOCKSS, CLOCKSS, Portico), are underway, no analogous effort has yet emerged to preserve the data and code referenced in those PDFs, particularly the scholarly code hosted online on Git Hosting Platforms (GHPs). Similarly, the Software Heritage Foundation is working to archive public source code, but there is value in archiving the issue threads, pull requests, and wikis that provide important context to the code while maintaining their original URLs. In current implementations, source code and its ephemera are not preserved, which presents a problem for scholarly projects where reproducibility matters. To understand and quantify the scope of this issue, we analyzed the use of GHP URIs in the arXiv and PMC corpora from January 2007 to December 2021. In total, there were 253,590 URIs to GitHub, SourceForge, Bitbucket, and GitLab repositories across the 2.66 million publications in the corpora. We found that GitHub, GitLab, SourceForge, and Bitbucket were collectively linked 160 times in 2007 and 76,746 times in 2021. In 2021, one out of five publications in the arXiv corpus included a URI to GitHub. The complexity of GHPs like GitHub is not amenable to conventional Web archiving techniques. Therefore, the growing use of GHPs in scholarly publications points to an urgent and growing need for dedicated efforts to archive their holdings in order to preserve research code and its scholarly ephemera.
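Counting GHP URIs across a full-text corpus, as the study above does, can be sketched with a regular expression. The URL patterns below are simplified assumptions for illustration, not the authors' actual extraction rules:

```python
import re

# Simplified host patterns for the four Git Hosting Platforms named in
# the study; real-world extraction would need more robust URL handling.
GHP_PATTERN = re.compile(
    r"https?://(?:www\.)?"
    r"(github\.com|gitlab\.com|bitbucket\.org|sourceforge\.net)"
    r"/[^\s\"'<>\)\]]+"
)

def count_ghp_uris(text):
    """Count URIs to each Git Hosting Platform in a block of text."""
    counts = {}
    for match in GHP_PATTERN.finditer(text):
        host = match.group(1)
        counts[host] = counts.get(host, 0) + 1
    return counts

# A toy stand-in for one publication's full text.
sample = (
    "Code at https://github.com/example/repo, analysis scripts at "
    "https://gitlab.com/example/tools and an archive at "
    "https://sourceforge.net/projects/example/."
)
counts = count_ghp_uris(sample)
```

Run over every document in a corpus and aggregated by year, per-host tallies like these are what produce figures such as the 160 links in 2007 versus 76,746 in 2021.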
We aimed to assess adherence to five transparency practices (data availability, code availability, protocol registration, conflicts of interest (COI) disclosure, and funding disclosure) in open access Coronavirus disease 2019 (COVID-19)-related articles.
We searched and exported all open access COVID-19-related articles from PubMed-indexed journals in the Europe PubMed Central database published from January 2020 to June 9, 2022. With a validated and automated tool, we detected transparency practices in three paper types: research articles, randomized controlled trials (RCTs), and reviews. Basic journal- and article-related information was retrieved from the database. We used R for the descriptive analyses.
The total number of articles was 258,678, of which we were able to retrieve full texts of 186,157 (72%) from the database. Over half of the papers (55.7%, n = 103,732) were research articles, 10.9% (n = 20,229) were review articles, and less than one percent (n = 1,202) were RCTs. Approximately nine-tenths of articles (in all three paper types) had a statement disclosing COI. Funding disclosure (83.9%, 95% CI: 81.7–85.8) and protocol registration (53.5%, 95% CI: 50.7–56.3) were more frequent in RCTs than in reviews or research articles. Reviews shared data (2.5%, 95% CI: 2.3–2.8) and code (0.4%, 95% CI: 0.4–0.5) less frequently than RCTs or research articles. Articles published in 2022 had the highest adherence to all five transparency practices. Most of the reviews (62%) and research articles (58%) adhered to two transparency practices, whereas almost half of the RCTs (47%) adhered to three. There were journal- and publisher-related differences in all five practices, and articles that did not adhere to transparency practices were more likely to be published in the lowest-impact journals and were less likely to be cited.
While most articles were freely available and had a COI disclosure, adherence to the other transparency practices was far from acceptable. A much stronger commitment to open science practices, particularly to protocol registration and data and code sharing, is needed from all stakeholders.
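The adherence rates above are reported with 95% confidence intervals. A minimal stdlib sketch of one common way to compute such an interval, the Wilson score interval, on hypothetical counts (whether the study used this exact method is an assumption):

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score confidence interval for a proportion.

    The Wilson interval behaves better than the normal approximation
    for proportions near 0 or 1, such as low code-sharing rates.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Hypothetical counts: 643 of 1,202 RCTs registering a protocol gives
# roughly the 53.5% (95% CI: 50.7-56.3) figure reported above.
lo, hi = wilson_ci(643, 1202)
```

The width of the interval shrinks with sample size, which is why the very large research-article group yields much tighter CIs than the 1,202 RCTs.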
“A cohort of HELIOS member representatives have joined with other open source experts to author a PLOS Biology perspective, “Policy recommendations to ensure that research software is openly accessible and reusable”. The piece provides policymaking guidance to federal agencies on leveraging research software to maximize research equity, transparency, and reproducibility. It makes the affirmative case that to accurately be able to replicate and reproduce results and build on shared data, we must not only have access to the data themselves, but also understand exactly how they were used and analyzed. To this end, federal agencies in the midst of developing their responses to the White House Office of Science and Technology Policy (OSTP) memorandum on “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research” can and should ensure that research software is elevated as a core component of the scientific endeavor….”
Abstract: Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce. This is also the case in machine learning (ML) and artificial intelligence (AI) research, often because data and/or source code go unpublished and because results are sensitive to ML training conditions. Although different solutions to address this issue are discussed in the research community, such as using ML platforms, the level of reproducibility in ML-driven research is not increasing substantially. Therefore, in this mini survey, we review the literature on reproducibility in ML-driven research with three main aims: (i) reflect on the current situation of ML reproducibility in various research fields, (ii) identify reproducibility issues and barriers that exist in these research fields applying ML, and (iii) identify potential drivers such as tools, practices, and interventions that support ML reproducibility. With this, we hope to contribute to decisions on the viability of different solutions for supporting ML reproducibility.
As part of their updated policy plans submitted in response to the 2022 OSTP memo, US federal agencies should, at a minimum, articulate a pathway for developing guidance on research software sharing, and, at a maximum, incorporate research software sharing requirements as a necessary extension of any data sharing policy and a critical strategy to make data truly FAIR (as these principles have been adapted to apply to research software).
As part of sharing requirements, federal agencies should specify that research software should be deposited in trusted, public repositories that maximize discovery, collaborative development, version control, long-term preservation, and other key elements of the National Science and Technology Council’s “Desirable Characteristics of Data Repositories for Federally Funded Research”, as adapted to fit the unique considerations of research software.
US federal agencies should encourage grantees to use non-proprietary software and file formats, whenever possible, to collect and store data. We realize that for some research areas and specialized techniques, viable non-proprietary software may not exist for data collection. However, in many cases, files can be exported and shared using non-proprietary formats or scripts can be provided to allow others to open files.
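The recommendation above notes that a script can make data locked in a proprietary format accessible to others. A minimal sketch of that idea, exporting tabular measurements to plain CSV using only the standard library (the field names and rows are illustrative, not from any real dataset):

```python
import csv

# Illustrative measurements; in practice these would be read from the
# proprietary source file via whatever library can open it.
rows = [
    {"sample_id": "S1", "concentration_uM": 12.5},
    {"sample_id": "S2", "concentration_uM": 9.8},
]

def export_csv(rows, path):
    """Write a list of dicts to CSV, a non-proprietary format anyone can open."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

export_csv(rows, "measurements.csv")
```

Sharing such a script alongside the original files lets readers without the proprietary software still work with the data.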
Consistent with the US Administration’s approach to cybersecurity, federal agencies should provide clear guidance on the measures grantees are expected to undertake to ensure the security and integrity of research software. This guidance should encompass the design, development, dissemination, and documentation of research software. Examples include the National Institute of Standards and Technology’s Secure Software Development Framework and the Linux Foundation’s Open Source Security Foundation.
As part of the allowable costs that grantees can request to help them meet research sharing requirements, US federal agencies should include reasonable costs associated with developing and maintaining research software needed to maximize data accessibility and reusability for as long as it is practical. Federal agencies should ensure that such costs are additive to proposal budgets, rather than consuming funds that would otherwise go to the research itself.
US federal agencies should encourage grantees to apply licenses to their research software that facilitate replication, reuse, and extensibility, while balancing individual and institutional intellectual property considerations. Agencies can point grantees to guidance on desirable criteria for distribution terms and approved licenses from the Open Source Initiative.
In parallel with the actions listed above that can be immediately incorporated into new public access plans, US federal agencies should also explore long-term strategies to elevate research software to co-equal research outputs and further incentivize its maintenance and sharing to improve research reproducibility, replicability, and integrity….”
“The role of research software in science is more important now than ever. Researchers often develop their own software and use it to generate, process, or analyze results. Plus, software can be a valuable research output on its own. Other researchers can improve or adapt the software for new purposes, or apply it in their own projects whilst giving credit to the creators. Yet, when it comes to publishing research software, researchers in the software community can face several challenges.
In this webinar, Demitra Ellina (Associate Publisher, F1000) and Joseph Dunn (Assistant Editor, F1000) share how researchers can publish their research software in ways that maximize the benefits for themselves, the wider research community, and other users. Register now to join the session, where you’ll uncover: how to share your research software through non-traditional article types; how to boost the visibility, reach, and reproducibility of your work; guidance on software data sharing; and relevant case studies from F1000…”