Twenty-Fifth Year Reflections on PKP – Public Knowledge Project

“In 1998, I initiated a project that set out to make research a greater part of what constituted public knowledge. I called it a Public Knowledge project. That is, before PKP was PKP, it was PKp. The initial project arose out of a modest gift to the University of British Columbia from Pacific Press, the company that owned Vancouver’s two newspapers, the Vancouver Sun and Province. On learning of this gift to UBC, where I served as a Faculty of Education professor, I proposed that this new Pacific Press Professorship explore how the internet, with all its early promise as an “information highway,” could increase public access to research and scholarship. This would complement the Pacific Press’ journalism, I suggested, as well as advance educational goals, by expanding the storehouse of public knowledge….”



European Commission provides funding to improve Open Access publishing landscape

“From January 2023, the University of Coimbra will be involved in another Open Science project: the CRAFT-OA project (“Creating a Robust Accessible Federated Technology for Open Access”) involves 23 partners in 14 European countries and will last for 36 months. The project is funded under the Horizon Europe framework programme, aiming to evolve and strengthen the institutional publishing landscape of Diamond Open Access (Diamond OA): no fees for authors or readers.

By offering tangible services and tools for the entire journal publishing lifecycle, CRAFT-OA will empower local and regional platforms and service providers to extend, professionalise and achieve greater interoperability with other scientific information systems for content and platforms. These developments will help researchers and publishers involved in publishing.

The project focuses on four action strands to improve the Diamond OA model:

(1) Providing technical improvements for journal platforms and journal software.

(2) Building communities of practice to promote overall infrastructure improvement

(3) Increase the visibility, discoverability and recognition of Diamond OA publishing

(4) Integrate Diamond OA publishing with the European Open Science Cloud (EOSC) and other large-scale data aggregators….”

bookdown: Authoring Books and Technical Documents with R Markdown | 2023-01-09 | Yihui Xie

“This short book introduces an R package, bookdown, to change your workflow of writing books. It should be technically easy to write a book, visually pleasant to view the book, fun to interact with the book, convenient to navigate through the book, straightforward for readers to contribute or leave feedback to the book author(s), and more importantly, authors should not always be distracted by typesetting details. The bookdown package is built on top of R Markdown, and inherits the simplicity of the Markdown syntax (you can learn the basics in five minutes; see Section 2.1), as well as the possibility of multiple types of output formats (PDF/HTML/Word/…). It has also added features like multi-page HTML output, numbering and cross-referencing figures/tables/sections/equations, inserting parts/appendices, and imported the GitBook style to create elegant and appealing HTML book pages. This book itself is an example of how you can produce a book from a series of R Markdown documents, and both the printed version and the online version can look professional. You can find more examples at….”

GitHub is Sued, and We May Learn Something About Creative Commons Licensing – The Scholarly Kitchen

“I have had people tell me with doctrinal certainty that Creative Commons licenses allow text and data mining, and insofar as license terms are observed, I agree. The making of copies to perform text and data mining, machine learning, and AI training (collectively “TDM”) without additional licensing is authorized for commercial and non-commercial purposes under CC BY, and for non-commercial purposes under CC BY-NC. (Full disclosure: CCC offers RightFind XML, a service that supports licensed commercial access to full-text articles for TDM with value-added capabilities.)

I have long wondered, however, about the interplay between the attribution requirement (i.e., the “BY” in CC BY) and TDM. After all, the bargain with those licenses is that the author allows reuse, typically at no cost, but requires attribution. Attribution under the CC licenses may be the author’s primary benefit and motivation, as few authors would agree to offer the licenses without credit.

In the TDM context, this raises interesting questions:

Does the attribution requirement mean that the author’s information may not be removed as a data element from the content, even if inclusion might frustrate the TDM exercise or introduce noise into the system?
Does the attribution need to be included in the data set at every stage?
Does the result of the mining need to include attribution, even if hundreds of thousands of CC BY works were mined and the output does not include content from individual works?

While these questions may have once seemed theoretical, that is no longer the case. An analogous situation involving open software licenses (GNU and the like) is now being litigated….”

Fermilab/CERN recommendation for Linux distribution

“CERN and Fermilab jointly plan to provide AlmaLinux as the standard distribution for experiments at our facilities, reflecting recent experience and discussions with experiments and other stakeholders. AlmaLinux has recently been gaining traction among the community due to its long life cycle for each major version, extended architecture support, rapid release cycle, upstream community contributions, and support for security advisory metadata. In testing, it has demonstrated to be perfectly compatible with the other rebuilds and Red Hat Enterprise Linux.

CERN and, to a lesser extent, Fermilab, will also use Red Hat Enterprise Linux (RHEL) for some services and applications within the respective laboratories. Scientific Linux 7, at Fermilab, and CERN CentOS 7, at CERN, will continue to be supported for their remaining life, until June 2024….”

CERN and Fermilab Opt for AlmaLinux as Standard for Big Science

“CERN and Fermilab will make AlmaLinux the standard distribution for experiments at their facilities based on feedback from stakeholders.

Following CentOS’s withdrawal from the enterprise server distribution market, AlmaLinux and Rocky Linux have emerged as the two best RHEL-based derivatives in this segment. As a result, it is not surprising that when looking for a free alternative to Red Hat Enterprise Linux, the choice frequently comes down to one of the two.

Probably two of the world’s leading scientific laboratories, the Swiss-based CERN and the US-based Fermilab, faced a similar dilemma….


Unfortunately, CERN and Fermilab do not disclose any additional details about the nature of the tests or the alternatives that led to the final choice to adopt AlmaLinux exclusively….”

Research Software vs. Research Data II: Protocols… | F1000Research

Abstract: Background: Open Science seeks to render research outputs visible, accessible and reusable. In this context, Research Data and Research Software sharing and dissemination issues provide real challenges to the scientific community, as a consequence of recent progress in political, legal and funding requirements.

Methods: We take advantage of the approach we developed in a previous publication, in which we highlighted the similarities between the Research Data and Research Software definitions.

Results: The similarities between Research Data and Research Software definitions can be extended to propose protocols for Research Data dissemination and evaluation derived from those already proposed for Research Software dissemination and evaluation. We also analyze FAIR principles for these outputs.

Conclusions: Our proposals here provide concrete instructions for Research Data and Research Software producers to make them more findable and accessible, as well as arguments to choose suitable dissemination platforms to complete the FAIR framework. Future work could analyze the potential extension of this parallelism to other kinds of research outputs that are disseminated under similar conditions to those of Research Data and Research Software, that is, without widely accepted publication procedures involving editors or other external actors and where the dissemination is usually restricted through the hands of the production team.

NASA Releases Updated Scientific Information Policy for Science Mission Directorate

“On December 8, NASA’s Science Mission Directorate (SMD) released an important update to its comprehensive Scientific Information Policy (SPD-41a), which represents a significant percentage of NASA’s research expenditures. While this policy does not serve as NASA’s official response to the OSTP Nelson Memorandum, it is a good indication of what we are likely to ultimately see in NASA’s agency-wide public access plan, which is due out in February 2023….

Requires that peer-reviewed publications be made openly available with no embargo period via deposit in an agency-approved repository….

Requires that research data be shared at the time of publication or the end of the funding award….

Requires mission software to be developed and shared openly….

Requires that the proceedings of SMD-sponsored meetings and workshops be held openly to enable broad participation….”

The Collection Management System Collection · BLOG Progress Process

“It seems like every couple of months, I get asked for advice on picking a Collection Management System (or maybe referred to as a digital repository, or something else) for use in an archive, special collection library, museum, or another small “GLAMorous” institution. The acronym is CMS, which is not to be confused with Content Management System (which is for your blog). This can be for collection management, digital asset management, collection description, digital preservation, public access and request support, or combinations of all of the above. And these things have to fit into an existing workflow/system, or maybe replace an old system and require a data migration component. And on top of that, there are so many options out there! This can be overwhelming!

What factors do you use in making a decision? I tried to put together some crucial components to consider, while keeping it as simple as possible (if 19 columns can be considered simple). I also want to be able to answer questions with a strong yes/no, to avoid getting bogged down in “well, kinda…” For example, I had a “Price” category and a “Handles complex media?” category but I took them away because it was too subjective of an issue to be able to give an easy answer. A lot of these are still going to be “well, kinda” and in that case, we should make a generalization. (Ah, this is where the “simple” part comes in!)

In the end, though, it is really going to depend on the unique needs of your institution, so the answer is always going to be “well, kinda?” But I hope this spreadsheet can be used as a starting point for those preparing to make a decision, or those who need to jog their memory with “Can this thing do that?”…”

Introducing the FAIR Principles for research software | Scientific Data

Abstract:  Research software is a fundamental and vital part of research, yet significant challenges to discoverability, productivity, quality, reproducibility, and sustainability exist. Improving the practice of scholarship is a common goal of the open science, open source, and FAIR (Findable, Accessible, Interoperable and Reusable) communities and research software is now being understood as a type of digital object to which FAIR should be applied. This emergence reflects a maturation of the research community to better understand the crucial role of FAIR research software in maximising research value. The FAIR for Research Software (FAIR4RS) Working Group has adapted the FAIR Guiding Principles to create the FAIR Principles for Research Software (FAIR4RS Principles). The contents and context of the FAIR4RS Principles are summarised here to provide the basis for discussion of their adoption. Examples of implementation by organisations are provided to share information on how to maximise the value of research outputs, and to encourage others to amplify the importance and impact of this work.


Nine best practices for research software registries and repositories [PeerJ]

Abstract:  Scientific software registries and repositories improve software findability and research transparency, provide information for software citations, and foster preservation of computational methods in a wide range of disciplines. Registries and repositories play a critical role by supporting research reproducibility and replicability, but developing them takes effort and few guidelines are available to help prospective creators of these resources. To address this need, the FORCE11 Software Citation Implementation Working Group convened a Task Force to distill the experiences of the managers of existing resources in setting expectations for all stakeholders. In this article, we describe the resultant best practices which include defining the scope, policies, and rules that govern individual registries and repositories, along with the background, examples, and collaborative work that went into their development. We believe that establishing specific policies such as those presented here will help other scientific software registries and repositories better serve their users and their disciplines.


Author interview: Nine best practices for software repositories and registries

PeerJ talks to Daniel Garijo about the recently published PeerJ Computer Science article Nine best practices for research software registries and repositories. The article is featured in the PeerJ Software Citation, Indexing, and Discoverability Special Issue.


Can you tell us a bit about yourself?

This work would not have been possible without the SciCodes community, the participants of the 2019 Scientific Software Registry Collaboration Workshop and the FORCE11 Software Citation Implementation Working Group. It all started when a task force of that working group undertook the initial work that is detailed in the paper, and then formed SciCodes to continue working together. We are a group of software enthusiasts who maintain and curate research software repositories and registries from different disciplines, including geosciences, neuroscience, biology, and astronomy (currently more than 20 resources and 30 worldwide participants are members of the initiative).


Can you briefly explain the research you published in PeerJ?

In examining the literature, we found best practices and policy suggestions for many different aspects of science, software, and data, but none that specifically addressed software repositories and registries. Our goal was to examine our own and other similar resources, share practices, discuss common challenges, and develop a set of basic best practices for these resources.  


What did you find, and how do these practices have such an impact?

We were surprised to find a lot of diversity between our resources. We expected that our domains, missions, and types of software in our collections would be different but we expected more commonality in the software metadata our different resources collect! We had far fewer fields in common than expected. For example, some resources might collect information on what operating system a software package runs on, other resources may not. In retrospect, this makes sense, since disciplines have different goals and expectations for sharing and reusability of research software and different heterogeneities (or not) in technology used.


The practices outlined in our work aim to strengthen registries and repositories by enacting policies that make our resources more transparent to our users and encourage us to think more about the long-term availability of software entries. They also provide a way for us to work cooperatively to establish a way for our metadata to be searched, as software that is useful in one field may have application in another.

Our proposed practices are already having an impact. They have helped member registries audit their practices and start enacting policies and procedures to strengthen their practices. By doing so, they encourage long-term success for their communities. Through this paper, we hope that other registries find these useful in improving their practices and just maybe, contribute to the conversation by joining SciCodes.


What kinds of lessons do you hope your readers take away from the research?

We hope the proposed practices will help new and existing resources consider key aspects of their maintainability, metadata and future availability. We expected that the process of converging in common practices would be easy but developing policies and practices that cover a wide range of disciplines and missions was challenging. We are grateful to our funders that we could convene such a great group of experts together and of course, to the experts for contributing their time in helping make our initial draft better.


How did you first hear about PeerJ, and what persuaded you to submit to us?

An editor of this special issue on software citation, indexing and discoverability mentioned that this would be an interesting paper for the community. While not fitting neatly into this category, we felt that workshop discussions and resulting best practices contribute substantially to the software citation ecosystem as repositories and registries are a mechanism to promote discovery, reuse, and credit for software.



How does research software fit with the open-source software community? – Daniel S. Katz’s blog

“While at the Chan-Zuckerberg Initiative’s Open Science 2022 Annual Meeting a couple of weeks ago, I was struck by a comment from Demetris Cheatham about how she hadn’t known about the scientific open-source community until she was introduced to it fairly recently, even though she has a huge amount of experience with the larger open-source community. This was especially confounding when she shared that she realized upon learning about us that the voice of our community was missing from her work to create a more inclusive environment within open source….

While open-source is the dominant type of license, and the one I generally prefer, there is also important research software, particularly at the disciplinary level, that is not open source….”

CERN publishes comprehensive open science policy | CERN

CERN’s core values include making research open and accessible for everyone. A new policy now brings together existing open science initiatives to ensure a bright future based on transparency and collaboration at CERN.