TU Delft and CODECHECK Hackathon

“Are you interested in reproducible code and Open Science? Then we have the perfect opportunity for you!

As part of a pilot project between TU Delft and CODECHECK, we are organising a codechecking hackathon on 18th September 2023! During this hackathon, you will learn the concept behind codechecking, and practise related skills to check whether available code and data can reproduce the results in a paper, preprint or project. More information about the codechecking process can be found here.

Would you like to participate as a codechecker, and help promote reproducible code and Open Science? Register via this page, and save the date! The hackathon will take place over two sessions, in the morning and afternoon. Details of the programme will be released in early September.

Familiarity with a current programming or data analysis language (e.g., R, Python, JavaScript, Julia) is beneficial.

PhD candidates at TU Delft are eligible for 0.5 Graduate School credits, provided they attend the entire session (morning and afternoon) and write a short reflection (300–350 words) on the skills they learned during the codechecking session, to be uploaded to their DMA profiles. To confirm their eligibility for GS credits, PhD candidates must seek approval from their supervisors and their Faculty Graduate Schools in advance of the session. If confirmation of attendance is required from the organisers, please let us know beforehand….”

PsyArXiv Preprints | ReproduceMe: lessons from a pilot project on computational reproducibility

Abstract:  If a scientific paper is computationally reproducible, the analyses it reports can be repeated independently by others. At present, most papers are not reproducible. However, the tools to enable computational reproducibility are now widely available, using free and open source software. We conducted a pilot study in which we offered ‘reproducibility as a service’ within a UK psychology department for a period of 6 months. Our rationale was that most researchers lack either the time or expertise to make their own work reproducible, but might be willing to allow this to be done by an independent team. Ten papers were converted into reproducible format using R Markdown, such that all analyses were conducted by a single script that could download raw data from online platforms as required, generate figures, and produce a PDF of the final manuscript. For some studies this involved reproducing analyses originally conducted using commercial software. The project was an overall success, with strong support from the contributing authors who saw clear benefit from this work, including greater transparency and openness, and ease of use for the reader. Here we describe our framework for reproducibility, summarise the specific lessons learned during the project, and discuss the future of computational reproducibility. Our view is that computationally reproducible manuscripts embody many of the core principles of open science, and should become the default format for scientific communication.
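The framework described in the abstract rests on one idea: a single, self-contained script takes a paper from raw data to finished figures. The project itself used R Markdown; the sketch below illustrates the same "one script" principle in Python and is not the authors' pipeline. The data URL and column names are hypothetical placeholders.

```python
# A minimal sketch of the "one script" idea, in Python rather than the
# R Markdown used in the project. DATA_URL and the column names
# ("condition", "score") are hypothetical placeholders, not the paper's data.
import urllib.request

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd

DATA_URL = "https://osf.io/download/EXAMPLE/"  # hypothetical placeholder
RAW_FILE = "raw_data.csv"

# Step 1: fetch the raw data from the online platform
urllib.request.urlretrieve(DATA_URL, RAW_FILE)

# Step 2: run the full analysis from the raw data
df = pd.read_csv(RAW_FILE)
summary = df.groupby("condition")["score"].agg(["mean", "sem"])

# Step 3: regenerate the figure reported in the manuscript
ax = summary["mean"].plot.bar(yerr=summary["sem"])
ax.set_ylabel("Mean score")
plt.tight_layout()
plt.savefig("figure1.pdf")
```

Running the script once, on a clean machine, should regenerate every figure; that is the property the "reproducibility as a service" team was aiming for.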

PsyArXiv Preprints | Concerns about Replicability, Theorizing, Applicability, Generalizability, and Methodology across Two Crises in Social Psychology

Abstract:  Twice in the history of social psychology has there been a crisis of confidence. The first started in the 1960s and lasted until the end of the 1970s, and the second crisis dominated the 2010s. In both these crises, researchers discussed fundamental concerns about the replicability of findings, the strength of theories in the field, the societal relevance of research, the generalizability of effects, and problematic methodological and statistical practices. On the basis of extensive quotes drawn from articles published during both crises, I explore the similarities and differences in discussions across both crises in social psychology.

Analytical code sharing practices in biomedical research | bioRxiv

Abstract:  Data-driven computational analysis is becoming increasingly important in biomedical research, as the amount of data being generated continues to grow. However, the lack of sharing of research outputs, such as data, source code and methods, affects the transparency and reproducibility of studies, which are critical to the advancement of science. Many published studies are not reproducible due to insufficient documentation, code, and data being shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them failed to share the analytical code. Even among those that did disclose their code, a vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten papers organized their code in a structured and reproducible manner. We discovered a significant association between the presence of code availability statements and increased code availability (p = 2.71×10⁻⁹). Additionally, a greater proportion of studies conducting secondary analyses were inclined to share their code compared to those conducting primary analyses (p = 1.15×10⁻⁷). In light of our findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability to improve reproducibility in biomedical research. By increasing transparency and reproducibility, we can promote scientific rigor, encourage collaboration, and accelerate scientific discoveries. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.
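The abstract does not say which statistical test produced the reported p-values. As an illustration only, an association between a binary "has a code availability statement" flag and a binary "code actually shared" flag could be assessed with a contingency-table test such as chi-squared; the counts below are invented placeholders, not data from the study.

```python
# Illustrative only: a chi-squared test of association between having a
# code availability statement and actually sharing code. The counts are
# hypothetical placeholders, not taken from the paper.
from scipy.stats import chi2_contingency

#                  code shared, code not shared
contingency = [
    [150, 60],   # statement present (hypothetical counts)
    [76, 167],   # statement absent  (hypothetical counts)
]

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```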

 

[2308.07333] Computational reproducibility of Jupyter notebooks from biomedical publications

Abstract:  Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications. We address computational reproducibility at two levels: First, using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks related to publications indexed in PubMed Central. We identified such notebooks by mining the articles' full text, locating them on GitHub and re-running them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. Second, this study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over two years. Out of 27271 notebooks from 2660 GitHub repositories associated with 3467 articles, 22578 notebooks were written in Python, including 15817 that had their dependencies declared in standard requirement files and that we attempted to re-run automatically. For 10388 of these, all declared dependencies could be installed successfully, and we re-ran them to assess reproducibility. Of these, 1203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions. We zoom in on common problems, highlight trends and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
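The re-execution step the abstract describes (install the declared dependencies, re-run the notebook, record success or the exception) can be approximated with standard tooling. The sketch below is not the study's pipeline; the repository and notebook names are hypothetical, and it assumes `pip` and `jupyter nbconvert` are available on the machine.

```python
# A rough sketch of re-running one notebook whose dependencies are declared
# in requirements.txt, in the spirit of the workflow described above.
# REPO_DIR and NOTEBOOK are hypothetical; this is not the study's code.
import subprocess
import sys

REPO_DIR = "example-repo"      # hypothetical local clone of a GitHub repo
NOTEBOOK = "analysis.ipynb"    # hypothetical notebook inside that repo

# Step 1: install the declared dependencies
install = subprocess.run(
    [sys.executable, "-m", "pip", "install", "-r",
     f"{REPO_DIR}/requirements.txt"],
    capture_output=True, text=True,
)
if install.returncode != 0:
    sys.exit("dependency installation failed:\n" + install.stderr[-500:])

# Step 2: execute the notebook and record success or the error output
run = subprocess.run(
    ["jupyter", "nbconvert", "--to", "notebook", "--execute",
     "--output", "rerun.ipynb", f"{REPO_DIR}/{NOTEBOOK}"],
    capture_output=True, text=True,
)
print("ran without errors" if run.returncode == 0 else run.stderr[-500:])
```

Scaling this loop over thousands of repositories, and then comparing the re-executed outputs against the committed ones, is essentially what distinguishes the study's "ran without errors" count from its "identical results" count.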

AIMOS Tip Talk – YouTube

“We often hear that practising open and reproducible research is hard; it takes too much time to learn the new ways of working.

But we know there are tips and hacks that our colleagues have accumulated over years of practice. Simple things that can make big improvements to research quality. It could be some new software that makes reproducible research easier, a great method that you’ve found to be robust, or just a different way of thinking about research, like “randomise everything!”. In this one-hour online session we gathered these great tips from our meta-research community….”

Code sharing increases citations, but remains uncommon | Research Square

Abstract:  Biologists increasingly rely on computer code, reinforcing the importance of published code for transparency, reproducibility, training, and a basis for further work. Here we conduct a literature review examining temporal trends in code sharing in ecology and evolution publications since 2010, and test for an influence of code sharing on citation rate. We find that scientists are overwhelmingly (95%) failing to publish their code and that there has been no significant improvement over time, but we also find evidence that code sharing can considerably improve citations, particularly when combined with open access publication.

 

Announcing: the inaugural UCL Open Science & Scholarship Awards! | UCL Open@UCL Blog

“UCL Office for Open Science and Scholarship and the local chapter of the UK Reproducibility Network are excited to announce the first Open Science and Scholarship Awards at UCL. UCL has been a pioneer in promoting open science practices, which include Open Access Publishing, Open Data and Software, Transparency, Reproducibility and other Open Methodologies, as well as the creation and use of Open Educational Resources, Citizen Science, Public Involvement, Co-production and Communication.

With these awards, we want to recognise and celebrate all UCL students and staff who embrace, advance, and promote open science….”

AI and Publishing: Moving forward requires looking backward – TL;DR – Digital Science

“One area where the use of generative AI to write research papers poses a serious challenge is open research. The proprietary nature of the most widely used tools means that the underlying model is not available for independent inspection or verification. This lack of disclosure of the material on which the model has been trained “threatens hard-won progress on research ethics and the reproducibility of results”.

In addition, the current inability of AI to document the provenance of its data sources through citation, and the lack of identifiers for those data sources, mean there is no way to replicate the ‘findings’ that have been generated by AI. This has raised calls for the development of a formal specification or standard for AI documentation that is backed by a robust data model. Our current publishing environment does not prioritise reproducibility, with code sharing optional and a slow uptake of requirements to share data. In this environment, the generation of fake data is of particular concern. However, ChatGPT “is not the creator of these issues; it instead enables this problem to exist at a much larger scale”.

And that leads me to my provocation – In the same way that a decade ago, open access was a scapegoat for scholarly communication*, now generative AI is a scapegoat for the research assessment system. Let me explain….”

Metascience Since 2012: A Personal History – by Stuart Buck

“This essay is a personal history of the $60+ million I allocated to metascience starting in 2012 while working for the Arnold Foundation (now Arnold Ventures).

Click and keep reading if you want to know:

How the Center for Open Science started

How I accidentally ended up working with the John Oliver show

What kept PubPeer from going under in 2014

How a new set of data standards in neuroimaging arose

How a future-Nobel economist got started with a new education research organization

How the most widely-adopted set of journal standards came about

Why so many journals are offering registered reports

How writing about ideas on Twitter could fortuitously lead to a multi-million-dollar grant

Why we should reform graduate education in quantitative disciplines so as to include published replications

When meetings are useful (or not)

Why we need a new federal data infrastructure

I included lots of pointed commentary throughout, on issues like how to identify talent, how government funding should work, and how private philanthropy can be more effective. The conclusion is particularly critical of current grantmaking practices, so keep reading (or else skip ahead)….”

Data sharing: putting Nature’s policy to the test

“Policies for sharing research data promote reproducibility of published results by supporting independent verification of raw data, methods and conclusions (see, for example, go.nature.com/3oinwy4). Confirmation validates the efforts of the original researchers, reassures the scientific community and encourages others to build on the findings (see go.nature.com/3om9ken). Here we recount our experience of accessing data provided by the authors of two prominent Nature papers.

Our investigations, which took 12 people roughly a year, upheld the conclusions of both papers (V. L. Li et al. Nature 606, 785–790 (2022); T. Iram et al. Nature 605, 509–515 (2022)). In each case, we found most of the data online and successfully reproduced most findings after discussion with the authors. When we had difficulty reproducing analyses on the basis of publicly available data and materials alone, the authors provided clarification about data and methods, which resolved most discrepancies.

This positive experience prompted us to generate a checklist to help researchers to facilitate reproducibility of their published findings through sharing of data and statistical methods (see https://osf.io/ps3y9).”

 

Reply to: Recognizing and marshalling the pre-publication error correction potential of open data for more reproducible science | Nature Ecology & Evolution

“In response to our paper, Chen et al.2 highlighted that mandatory open data policies also increase opportunities for detecting and correcting errors pre-publication. We welcome Chen et al.’s comment and acknowledge that we omitted discussing the important, positive impact that mandatory open data policies can have on various pre-publication processes. Our study design and the interpretation of our results were probably influenced by our prior experience of reporting data anomalies and research misconduct to journals, and witnessing first-hand the challenges of post-publication error correction3,4,5. As long-standing advocates of transparency and reproducibility in research, we would celebrate empirical evidence that data sharing mandates increase pre-publication error detection….”

FORRT and the Center for Open Science Join Forces to Foster Open and Reproducible Research Training

“The Framework for Open and Reproducible Research Training (FORRT) and the Center for Open Science (COS) are thrilled to announce an exciting partnership, representing a significant step forward in our shared mission to use education to promote open and reproducible research practices. The partnership enables the scientific community to build upon each other’s work and advance knowledge collaboratively.”

The replication crisis has led to positive structural, procedural, and community changes | Communications Psychology

Abstract:  The emergence of large-scale replication projects yielding successful replication rates substantially lower than expected caused the behavioural, cognitive, and social sciences to experience a so-called ‘replication crisis’. In this Perspective, we reframe this ‘crisis’ through the lens of a credibility revolution, focusing on positive structural, procedural and community-driven changes. We then outline a path to expand ongoing advances and improvements. The credibility revolution has been an impetus to several substantive changes which will have a positive, long-term impact on our research environment.

From the body of the article: “An academic movement collectively known as open scholarship (incorporating Open Science and Open Research) has driven constructive change by accelerating the uptake of robust research practices while concomitantly championing a more diverse, equitable, inclusive, and accessible psychological science….”

Recognizing and marshalling the pre-publication error correction potential of open data for more reproducible science | Nature Ecology & Evolution

“We enthusiastically applaud Berberi and Roche’s1 effort to evaluate the effects of journals’ mandatory open data policies on the error correction potential of science. Berberi and Roche conclude that at present there is “no evidence that mandatory open data policies increase error correction”. This may come as a surprise and a disappointment to advocates of open science. However, we suggest that by only addressing effects on post-publication error correction, Berberi and Roche miss the crucial dimension of pre-publication error correction potential in their assessment and may therefore substantially underestimate the true merits of mandatory open data policies….”