Abstract: If a scientific paper is computationally reproducible, the analyses it reports can be repeated independently by others. At the present time most papers are not reproducible. However, the tools to enable computational reproducibility, based on free and open source software, are now widely available. We conducted a pilot study in which we offered ‘reproducibility as a service’ within a UK psychology department for a period of 6 months. Our rationale was that most researchers lack either the time or expertise to make their own work reproducible, but might be willing to allow this to be done by an independent team. Ten papers were converted into reproducible format using R Markdown, such that all analyses were conducted by a single script that could download raw data from online platforms as required, generate figures, and produce a pdf of the final manuscript. For some studies this involved reproducing analyses originally conducted using commercial software. The project was an overall success, with strong support from the contributing authors who saw clear benefit from this work, including greater transparency and openness, and ease of use for the reader. Here we describe our framework for reproducibility, summarise the specific lessons learned during the project, and discuss the future of computational reproducibility. Our view is that computationally reproducible manuscripts embody many of the core principles of open science, and should become the default format for scientific communication.
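The single-script workflow the abstract describes can be sketched in miniature. The original project used R Markdown; the Python sketch below is illustrative only, with a hypothetical data field (`score`) and a checksum guard standing in for re-downloading the archived raw data from an online platform:

```python
import csv
import hashlib
import io


def verify_checksum(data: bytes, expected_sha256: str) -> None:
    # Reproducibility guard: fail loudly if the downloaded data
    # does not match the archived version used in the paper.
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"data mismatch: got sha256 {digest}")


def analyse(raw_csv: str) -> dict:
    # The analysis proper: every number reported in the manuscript
    # is computed here, never typed in by hand.
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    scores = [float(r["score"]) for r in rows]
    return {"n": len(scores), "mean": sum(scores) / len(scores)}


def render(results: dict) -> str:
    # Inline computed results into the manuscript text, mimicking
    # R Markdown's inline code chunks.
    return (f"We tested {results['n']} participants "
            f"(mean score = {results['mean']:.2f}).")
```

The point of the pattern, as in the paper, is that a single entry point takes the pipeline from verified raw data to rendered prose, so a reader can re-run everything and obtain the same manuscript.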
Abstract: Research software plays a crucial role in advancing scientific knowledge, but ensuring its sustainability, maintainability, and long-term viability is an ongoing challenge. To address these concerns, the Sustainable Research Software Institute (SRSI) Model presents a comprehensive framework designed to promote sustainable practices in the research software community. This white paper provides an in-depth overview of the SRSI Model, outlining its objectives, services, funding mechanisms, collaborations, and the significant potential impact it could have on the research software community. It explores the wide range of services offered, diverse funding sources, extensive collaboration opportunities, and the transformative influence of the SRSI Model on the research software landscape.
“We are a network of collaborators trying to keep track of and curate interesting open source projects related to neurosciences. If you have a project that you’d like to see listed here or if you know of a project that should be listed, drop us a line, via E-mail, or Twitter.”
Abstract: Data-driven computational analysis is becoming increasingly important in biomedical research, as the amount of data being generated continues to grow. However, the lack of practices of sharing research outputs, such as data, source code and methods, affects transparency and reproducibility of studies, which are critical to the advancement of science. Many published studies are not reproducible due to insufficient documentation, code, and data being shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016-2021 and found that 50.1% of them fail to share the analytical code. Even among those that did disclose their code, a vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten papers organized their code in a structured and reproducible manner. We discovered a significant association between the presence of code availability statements and increased code availability (p=2.71×10⁻⁹). Additionally, a greater proportion of studies conducting secondary analyses were inclined to share their code compared to those conducting primary analyses (p=1.15×10⁻⁷). In light of our findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability to improve reproducibility in biomedical research. By increasing transparency and reproducibility, we can promote scientific rigor, encourage collaboration, and accelerate scientific discoveries. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.
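The association the abstract reports is the kind of result a two-by-two contingency test produces. As a hedged illustration only (the paper's exact test statistic is not given here, and the counts below are invented, not the study's data), a two-sided Fisher's exact test can be computed with the Python standard library alone:

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows might be 'has a code availability statement' vs 'does not';
    columns 'code shared' vs 'code not shared'.
    """
    row1, row2 = a + b, c + d
    col1, n = a + c, a + b + c + d
    total = comb(n, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table at least as extreme as the
    # observed one (the common two-sided convention).
    return sum(p for p in (table_prob(x) for x in range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-9))


# Hypothetical counts, NOT the study's data: 8 of 10 papers with a
# statement shared code; 1 of 6 papers without a statement did.
p = fisher_exact_two_sided(8, 2, 1, 5)
```

With these made-up counts p ≈ 0.035, below the conventional 0.05 threshold; the study's far smaller p-values reflect its much larger sample of 453 manuscripts.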
“Digital infrastructure is the code, policies, and standards powering the technology that permeates every aspect of life, such as hospitals, banking, and social media.
This infrastructure is under-maintained and undermined in ways that often favor corporate and government interests over the needs of the public….
We’re creating a community of researchers and practitioners to better understand the problem and to work together toward our common goal: a commons of technology, sustainably developed and maintained, for the benefit of everyone.
Our partners fund work in this space regularly. If you’d like to propose a project or join our funding partners, contact us using the form below, and we’ll be in touch.
“The D//F (Digital Infrastructure Insights Fund) is a multi-funder initiative by Ford Foundation, Alfred P. Sloan Foundation, Omidyar Network, Schmidt Futures and Open Collective sustaining a platform for researchers and practitioners to better understand how open digital infrastructure is built and deployed.
We’re creating a body of research and implementation insights that advance our goal to ensure a public commons of technology, sustainably developed and maintained, for the benefit of everyone….
More insights are needed to distinguish how this digital public good (=open code, policies and standards) and its creators can be supported best….
we are looking for analyses on how underlying free and open-source software (FOSS) interacts with politics, sovereign responsibilities, diverse economic sectors, and the advancement of knowledge in the sciences and beyond.
we aim to back the development of pertinent work that examines the convergence of open-source software and digital infrastructure with social movements focused on democracy, rights, justice, the environment and scientific research.
we seek to investigate the issue of under-maintenance and occasional undermining of FOSS, as well as explore any geographical or other disparities within the communities responsible for providing and sustaining these software components amid evolving regulatory and socio-technical circumstances….”
“The Open Library Foundation (OLF) is introducing the Open Resource Sharing Coalition (OpenRS), a resource sharing initiative created in partnership with library consortia, open source developers, and vendors. OpenRS is a heterogeneous resource sharing system that is ILS and Discovery agnostic and accommodates the full spectrum of mediated and unmediated resource sharing.
OpenRS acts upon a “consortia first” mentality, striving to provide libraries with the tools needed for robust and extended functionality for resource sharing. The project will focus on developing and implementing software systems, protocols, and best practices that foster collaboration and support various library services, including seamless unmediated intra-consortial borrowing functionality and expanded sharing across multiple consortia. The software will provide a containerized code base configured for ease of deployment, maintenance, and upgrades. Libraries and consortia can choose to host the service locally or with a third party.
Project governance will be centralized in a governing board elected by contributing partners and will also rely on feedback from a wide community of project adopters and investors. The coalition recognizes that the project will only succeed if all stakeholders’ needs – whether libraries, consortia, developers, or vendors – are heard, validated, and addressed. Coalition governance will be based on open source principles and rely on trust, transparency, agility, and a welcoming community.
OpenRS will be an Open Library Foundation (OLF) project which operates with an open, transparent approach, emphasizing the best practices for open source governance and DevSecOps. The OpenRS software is built and maintained by Knowledge Integration, with support from EBSCO Information Services (EBSCO). Additional OpenRS Community members include representatives from the MOBIUS consortium, GALILEO/University System of Georgia (USG), Marmot Library Network, Boston Library Consortium, Colorado Alliance of Research Libraries, and others.”
Abstract: Biologists increasingly rely on computer code, reinforcing the importance of published code for transparency, reproducibility, training, and a basis for further work. Here we conduct a literature review examining temporal trends in code sharing in ecology and evolution publications since 2010, and test for an influence of code sharing on citation rate. We find that scientists are overwhelmingly (95%) failing to publish their code and that there has been no significant improvement over time, but we also find evidence that code sharing can considerably improve citations, particularly when combined with open access publication.
“PUBLISSO offers a range of publishing platforms for publishing work and research data Open Access and permanently – following the spirit of Open Science. All publications receive a Digital Object Identifier (DOI) and are archived for the long term.
PUBLISSO provides several services for this purpose….”
Abstract: This paper examines ‘open’ AI in the context of recent attention to open and open source AI systems. We find that the terms ‘open’ and ‘open source’ are used in confusing and diverse ways, often constituting more aspiration or marketing than technical descriptor, and frequently blending concepts from both open source software and open science. This complicates an already complex landscape, in which there is currently no agreed-upon definition of ‘open’ in the context of AI, and as such the term is being applied to widely divergent offerings with little reference to a stable descriptor.
So, what exactly is ‘open’ about ‘open’ AI, and what does ‘open’ AI enable? To better answer these questions we begin this paper by looking at the various resources required to create and deploy AI systems, alongside the components that comprise these systems. We do this with an eye to which of these can, or cannot, be made open to scrutiny, reuse, and extension. What does ‘open’ mean in practice, and what are its limits in the context of AI? We find that while a handful of maximally open AI systems exist, which offer intentional and extensive transparency, reusability, and extensibility – the resources needed to build AI from scratch, and to deploy large AI systems at scale, remain ‘closed’ – available only to those with significant (almost always corporate) resources. From here, we zoom out and examine the history of open source, its cleave from free software in the mid 1990s, and the contested processes by which open source has been incorporated into, and instrumented by, large tech corporations. As a current day example of the overbroad and ill-defined use of the term by tech companies, we look at ‘open’ in the context of OpenAI the company. We trace its moves from a humanity-focused nonprofit to a for-profit partnered with Microsoft, and its shifting position on ‘open’ AI. Finally, we examine the current discourse around ‘open’ AI – looking at how the term and the (mis)understandings about what ‘open’ enables are being deployed to shape the public’s and policymakers’ understanding about AI, its capabilities, and the power of the AI industry. In particular, we examine the arguments being made for and against ‘open’ and open source AI, who is making them, and how they are being deployed in the debate over AI regulation.
Taken together, we find that ‘open’ AI can, in its more maximal instantiations, provide transparency, reusability, and extensibility that can enable third parties to deploy and build on top of powerful off-the-shelf AI models. These maximalist forms of ‘open’ AI can also allow some forms of auditing and oversight. But even the most open of ‘open’ AI systems do not, on their own, ensure democratic access to or meaningful competition in AI, nor does openness alone solve the problem of oversight and scrutiny. While we recognize that there is a vibrant community of earnest contributors building and contributing to ‘open’ AI efforts in the name of expanding access and insight, we also find that marketing around openness and investment in (somewhat) open AI systems is being leveraged by powerful companies to bolster their positions in the face of growing interest in AI regulation. We also find that some companies have moved to embrace ‘open’ AI as a mechanism to entrench dominance, using the rhetoric of ‘open’ AI to expand market power while investing in ‘open’ AI efforts in ways that allow them to set standards of development while benefiting from the free labor of open source contributors.
by Simon Bowie
Looking back at how the COPIM project protected the privacy of our website users and rethought the typical technical model for gathering web analytics
Location: Birkbeck, University of London, & streaming online
Dates: 07/09/2023 – 08/09/2023
Registration: https://thelowerdecks.janeway.systems/signup (closes on 22 August)
Conference programme: https://thelowerdecks.janeway.systems/programme
In celebration of the tenth anniversary of the launch of the Open Library of Humanities (OLH), an award-winning, academic-led, diamond open access journal publisher, and the fifth anniversary of Janeway, its ground-breaking open-source scholarly publishing platform, we are holding a symposium to explore future directions for Janeway engineering and open access publishing.
Since its launch, the Janeway and OLH team has built an international, award-winning, and critically acclaimed platform and is widely recognised to be one of the foremost academic-led publishers of open access scholarship in the humanities. As we look forward to the next five years, we aspire to consolidate our position as a leading open source scholarly publishing platform, innovate our software in line with user needs, and bring together our community to both increase visibility and make Janeway the very best platform of its kind. Accordingly, Janeway and OLH staff are hosting a symposium which will include presentations on best practice, future developments, and breakout sessions to hear from our community as we work together to make these a reality. You can check the conference programme here.
In the course of the open access transformation, scholar-led journals face numerous challenges: beyond financial and infrastructural support, these journals need “capacity building”, that is, help towards self-help, particularly to close knowledge gaps in the field of scholarly publishing. The present guides are a contribution to this capacity building: designed as a practical resource, they are intended to guide journals and publishing institutions in a needs-oriented way and to support them in further developing, professionalising, and consolidating their publishing activities. The set of six guides is the central outcome of the project “Scholar-led Plus” at the Alexander von Humboldt Institut für Internet und Gesellschaft, funded by the Bundesministerium für Bildung und Forschung. In addition to the practical resources, the project, building on a multi-stage Delphi survey, developed strategic recommendations that prospectively outline the field of scholar-led publishing. To ensure the greatest possible usability for scholar-led journals and projects, the guides were conceived and written in collaboration with experts from publishing practice.
The full document, as the sum of its parts, conveys fundamental knowledge about technical workflows, tools, and infrastructures, and links this with guidance on copyright matters and data protection requirements. It emphasises the relevance of editorial work and offers recommendations for optimising processes, giving separate attention to the hitherto often under-represented areas of scholarly communication and dissemination of content. Finally, administrative matters are addressed: alongside journal costs and options for financing and funding, strategies for good journal governance are also described.
The set of guides is edited by Marcel Wrzesinski (project lead, “Scholar-led Plus”).
Technik und Infrastrukturen: Eichler, Frederik, Eppelin, Anita, Kampkaspar, Dario, Schrader, Antonia C., Söllner, Konstanze, Vierkant, Paul, & Withanage, Dulip. (2023). Handreichung Technik und Infrastrukturen. In Wissenschaftsgeleitetes Publizieren. Sechs Handreichungen mit Praxistipps und Perspektiven (pp. 7–18). Alexander von Humboldt Institut für Internet und Gesellschaft. https://doi.org/10.5281/zenodo.8208578
Urheberrecht und Datenschutz: Blumtritt, Ute, Euler, Ellen, Fadeeva, Yuliya, Pohle, Jörg, & Rack, Fabian. (2023). Handreichung Urheberrecht und Datenschutz. In Wissenschaftsgeleitetes Publizieren. Sechs Handreichungen mit Praxistipps und Perspektiven (pp. 19–34). Alexander von Humboldt Institut für Internet und Gesellschaft. https://doi.org/10.5281/zenodo.8208582
Arbeitsabläufe und Workflows: Bergmann, Max, Dalkilic, Evin, Ganz, Kathrin, Heinig, Julia, Kaden, Ben, Kalte, Isabella, & Junker, Judith. (2023). Handreichung Arbeitsabläufe und Workflows. In Wissenschaftsgeleitetes Publizieren. Sechs Handreichungen mit Praxistipps und Perspektiven (pp. 35–54). Alexander von Humboldt Institut für Internet und Gesellschaft. https://doi.org/10.5281/zenodo.8208678
Kommunikation und Distribution: Efferenn, Frederik, Ferguson, Lea Maria, Herb, Ulrich, Neufend, Maike, Schmitz, Jasmin, Siegfried, Doreen, & Taubert, Niels. (2023). Handreichung Kommunikation und Distribution. In Wissenschaftsgeleitetes Publizieren. Sechs Handreichungen mit Praxistipps und Perspektiven (pp. 55–68). Alexander von Humboldt Institut für Internet und Gesellschaft. https://doi.org/10.5281/zenodo.8208711
Kostenstrukturen und Geschäftsmodelle: Arning, Ursula, Barbers, Irene, Benz, Martina, Dellatorre, Margit, Finger, Juliane, Gast, Konstantin, Gebert, Agathe, Geuenich, Michael, Hahn, Daniela, Rieck, Katharina, & Sänger, Astrid. (2023). Handreichung Kostenstrukturen und Geschäftsmodelle. In Wissenschaftsgeleitetes Publizieren. Sechs Handreichungen mit Praxistipps und Perspektiven (pp. 69–82). Alexander von Humboldt Institut für Internet und Gesellschaft. https://doi.org/10.5281/zenodo.8210924
Governance und Rechtsform: Dalkilic, Evin, Hacker, Andrea, Hesse, Cindy, Jobmann, Alexandra, Kirchner, Andreas, Pampel, Heinz, Siegert, Olaf, & Steiner, Tobias. (2023). Handreichung Governance und Rechtsform. In Wissenschaftsgeleitetes Publizieren. Sechs Handreichungen mit Praxistipps und Perspektiven (pp. 83–96).
“The objective of New Alexandria is to develop and nurture an open environment for learning, teaching, and research about premodern civilizations that is inclusive, collaborative, and restlessly innovative. Instead of flattening the differences between ancient and current ways of representing the world, we seek to see more clearly the lively unfamiliarity of the ancient way. Instead of selectively extracting elements of ancient life from their historical contexts, we seek a holistic approach that is interdisciplinary and that integrates anthropological and other socially-informed methodologies. The basic rationale is that cultural and human differences are “good to think with,” vital for humanism and even for humanity….
Our aim is to take the best that we know and think about premodern civilizations and to make it available to anyone with access to the internet by way of a phone, a tablet, or a computer, at home, in a library, in a park, or on a bus. As our data will be free and open for all to use and engage with, so also must our interpretations of it be free and open, and so also the software that we create for accessing it and analyzing it must be free and open….”
“NASA estimates that its Earth science missions will generate around a quarter million terabytes of data in 2024 alone. In order for climate scientists and the research community to efficiently dig through these reams of raw satellite data, IBM, HuggingFace and NASA have collaborated to build an open-source geospatial foundation model that will serve as the basis for a new class of climate and Earth science AIs that can track deforestation, predict crop yields and track greenhouse gas emissions.
For this project, IBM leveraged its recently released Watsonx.ai to build the foundation model, using a year’s worth of NASA’s Harmonized Landsat Sentinel-2 (HLS) satellite data. That data is collected by the ESA’s pair of Sentinel-2 satellites, which are built to acquire high-resolution optical imagery over land and coastal regions in 13 spectral bands.
For its part, HuggingFace is hosting the model on its open-source AI platform. According to IBM, by fine-tuning the model on “labeled data for flood and burn scar mapping,” the team was able to improve the model’s performance 15 percent over the current state of the art using half as much data….”