Datasheets for Digital Cultural Heritage Datasets – Journal of Open Humanities Data

Abstract:  Sparked by issues of quality and lack of proper documentation for datasets, the machine learning community has begun developing standardised processes for establishing datasheets for machine learning datasets, with the intent to provide context and information on provenance, purposes, composition, the collection process, recommended uses or societal biases reflected in training datasets. This approach fits well with practices and procedures established in GLAM institutions, such as establishing collections’ descriptions. However, digital cultural heritage datasets are marked by specific characteristics. They are often the product of multiple layers of selection; they may have been created for different purposes than establishing a statistical sample according to a specific research question; they change over time and are heterogeneous. Punctuated by a series of recommendations to create datasheets for digital cultural heritage, the paper addresses the scope and characteristics of digital cultural heritage datasets; possible metrics and measures; lessons from concepts similar to datasheets and/or established workflows in the cultural heritage sector. This paper includes a proposal for a datasheet template that has been adapted for use in cultural heritage institutions, and which proposes to incorporate information on the motivation and selection criteria, digitisation pipeline, data provenance, the use of linked open data, and version information.

The Redemption of Plan S | Jeff Pooley

“On Tuesday—Halloween here in the US—cOAlition S released a new open access blueprint, one that, in effect, proposes to dismantle the prevailing journal system. Under an anodyne title (“Toward Responsible Publishing”), the group of (mostly) European state funders and foundations endorsed a future for scholarly communication in which publishers are recast as competing service providers. It’s also in basic alignment with the movement to shift peer review to a post-publication phase—with curation and discoverability detached from the per-title, periodic-release journal system. The third major pillar of the plan is to de-throne the version-of-record article (and, implicitly, the monograph), by granting other outputs (like datasets and reviews) equal footing in the realm of recognition.

The plan, to borrow a phrase from Joe Biden, is a BFD.

In this post, I want to make three quick points, which I hope to expand on soon. The first is that the Plan S initiative represents an uneasy convergence between two strands of the nonprofit, mission-driven OA world: between (1) those who’ve championed scholar-submitted preprints and post-prints to open repositories, coupled with an emergent post-release review ecosystem; and (2) advocates of nonprofit, fee-free OA publishing, who tend to employ the traditional version-of-record journal and book formats. The distinction, in the bizarre lingo we’ve inherited, is green versus diamond.

I don’t want to exaggerate the differences between these two approaches. There’s a shared belief, most crucially, that the academic community should restore custody over the scholarly publishing system—wrench it back, that is, from the oligopolists. A second shared tenet is that an OA system based on APCs (or their read-and-publish equivalent) is arguably worse than the tolled system it seeks to replace. APC-based OA trades barriers to readers for barriers to authors, with the right to publish meted out according to institutional wealth or national origin. So that’s a lot of agreement: a nonprofit, community-led system that doesn’t exclude authors.

Still, the differences are important. The green route—sometimes termed Publish, Review, Curate (PRC), in that order—aims to replace the journal system altogether. The diamond route, by contrast, seeks to fix that system.

The rethought Plan S leans green….”

Open-access reformers launch next bold publishing plan

“The group behind the radical open-access initiative Plan S has announced its next big plan to shake up research publishing — and this one could be bolder than the first. It wants all versions of an article and its associated peer-review reports to be published openly from the outset, without authors paying any fees, and for authors, rather than publishers, to decide when and where to first publish their work….”

Introducing the “Towards Responsible Publishing” proposal from cOAlition S

“Driven by the same “duty of care for the good functioning of the science system” that inspired Plan S, the funders forming cOAlition S are now exploring a new vision for scholarly communication; a vision that holds the promise of being more effective, affordable, and equitable, ultimately benefiting society as a whole.

Our vision is a community-based scholarly communication system fit for open science in the 21st century, one that empowers scholars to share the full range of their research outputs and to participate in new quality control mechanisms and evaluation standards for these outputs….

To address these and other shortcomings, the new proposal is anchored in two key concepts that extend Plan S:

1. Authors, not third-party suppliers, decide when and what to publish.

In such a ‘scholar-led’ publishing system, third-party suppliers can still offer and charge for services that facilitate peer review, publication and preservation. However, they will not block scholars from sharing their work at any stage during the research and dissemination process.

2. The scholarly record includes the full range of outputs created during the research cycle, and not just the final journal-accepted version.

By making early article versions and peer review feedback critical elements of the scholarly record, a future scholarly communication system can capture research ‘in the act’. Shining a light on how research progresses towards increasingly trustworthy knowledge creation offers opportunities for reviewing and filtering scholarly outputs for the purposes of curation and research assessment….”

ResearchGate Newsroom | ResearchGate and Taylor & Francis partner to help researchers discover journals and access articles more easily

“ResearchGate, the professional network for researchers, and Taylor & Francis, a world-renowned academic publisher, today announced a new partnership, with 200 Taylor & Francis journals now available for researchers to discover on ResearchGate. 

All 200 titles will benefit from enhanced visibility and engagement through ResearchGate’s innovative Journal Home offering. Each journal will have a dedicated profile, accessible throughout the ResearchGate platform, and will be prominently represented on associated article pages and relevant touch points across the network.

Researchers will also be able to read more than 60,000 version-of-record open access articles directly on the ResearchGate platform. Additional articles from 80 fully open access Taylor & Francis journals will continue to be added to this number as they are published in the future….”

What Is A Repository For? – Building the Commons

“If you haven’t heard, in 2024 Humanities Commons will be launching a completely reimagined open-access repository. It’s currently under heavy construction. So we’ve been asking ourselves: Why does the Commons have a repository in the first place? At our heart we are a social network, a hub for scholarly exchange. Most of us don’t think “repository” when we think about social networks like Mastodon or Instagram or Facebook. So what exactly is a repository? And why will the new repository be so vital to the life of the Commons?…

How will the new Commons repository broadcast researchers’ work? Reaching an audience is partly about open access. This is not just a matter of letting visitors view the works on the repository site free of charge. It is also about letting other open access services and sites “re-broadcast” works from the Commons collection. So we will offer free access to the Commons repository in the formats that other tools and aggregators can use: a REST API, OAI-PMH streams, and (later on) the COAR Notify protocol. And we will embed data about each work in its repository page so that it is catalogued by services like Google Scholar. This extends the audience for members’ work far beyond the circle of people who visit the Commons….”
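The harvesting interfaces the Commons mentions are open standards, so any aggregator can consume them with only a few lines of standard-library code. The sketch below builds an OAI-PMH `ListRecords` request and extracts Dublin Core titles from a response; the endpoint URL and the sample XML are illustrative, not real Commons data.

```python
# Minimal sketch of consuming an OAI-PMH feed like the one the Commons
# repository plans to expose. The endpoint URL is hypothetical and the
# XML below is a hand-made sample response, not actual Commons output.
import urllib.parse
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL per the OAI-PMH 2.0 spec."""
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    )
    return f"{base_url}?{query}"

def parse_titles(xml_text):
    """Extract Dublin Core titles from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.iter(f"{DC}title")]

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Deposit</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://commons.example.org/oai"))
print(parse_titles(SAMPLE))
```

In a real harvester the URL would be fetched over HTTP and the `resumptionToken` element followed to page through the full collection; both are defined by the protocol, not by the Commons.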

OSF Preprints | Identification and categorization of iterative preprints in the life sciences

Abstract:  Preprints, manuscripts posted online prior to journal-organized peer-review, are an alternative to the traditional, slow, expensive, and inequitable journal publication system. They enable earlier sharing of research outcomes and can even be used to obtain early feedback on work in progress. We aim to identify such alternative uses of preprints, including works-in-progress that are deposited and updated as preprints (iterative preprints). We rely first on a computational approach to identify alternative preprints that we then qualitatively assess. We aim to communicate our approach and results to the community as an iterative preprint itself. In this version, we present our computational approach and initial exploratory results as we seek feedback on our methodology.

ACS Publications provides a new option to support zero-embargo green open access – American Chemical Society

“Beginning Oct. 1, 2023, the Publications Division of the American Chemical Society (ACS) will provide authors with a new option to satisfy funder requirements for zero-embargo green open access. Through this pathway, authors will be able to post accepted manuscripts with a CC BY license in open access repositories immediately upon acceptance.

To ensure a sustainable model of delivering services from submission to final editorial decision, ACS Publications is introducing an article development charge (ADC) as part of this new zero-embargo green open access option. The ADC covers the cost of ACS’ publishing services through the final editorial decision….”

Zero-Embargo Green Open Access – ACS Open Science

“A number of funders and institutions require authors to retain the right to post their accepted manuscripts immediately upon acceptance for publication in a journal, sometimes referred to as zero-embargo green open access (OA). More than 90% of ACS authors under these mandates have a simple and funded pathway to publish gold OA in ACS journals.

For those not covered by an institutional read and publish agreement or through other types of funding, ACS offers the option to post their accepted manuscripts with a CC BY license in open access repositories immediately upon acceptance. For this small subset of authors, the option expands their choices beyond the existing route of waiting 12 months to post at no cost.

An article development charge (ADC) will be applied if the zero-embargo green OA route is requested and the manuscript is recommended to be sent out for peer review. The ADC covers the cost of ACS’ publishing services through the final editorial decision….”

PreprintResolver: Improving Citation Quality by Resolving Published Versions of ArXiv Preprints using Literature Databases

Abstract:  The growing impact of preprint servers enables the rapid sharing of time-sensitive research. At the same time, it is becoming increasingly difficult to distinguish high-quality, peer-reviewed research from preprints. Although preprints are often later published in peer-reviewed journals, this information is often missing from preprint servers. To overcome this problem, the PreprintResolver was developed, which uses four literature databases (DBLP, SemanticScholar, OpenAlex, and CrossRef / CrossCite) to identify preprint-publication pairs for the arXiv preprint server. The target audience focuses on, but is not limited to, inexperienced researchers and students, especially from the field of computer science. The tool is based on a fuzzy matching of author surnames, titles, and DOIs. Experiments were performed on a sample of 1,000 arXiv preprints from the research field of computer science that lacked any publication information. At 77.94%, computer science is highly affected by missing publication information in arXiv. The results show that the PreprintResolver was able to resolve 603 out of 1,000 (60.3%) of these preprints. All four literature databases contributed to the final result. In a manual validation, a random sample of 100 resolved preprints was checked. For all preprints, at least one result is plausible. For nine preprints, more than one result was identified, three of which are partially invalid. In conclusion, the PreprintResolver is suitable for individual, manually reviewed requests, but less suitable for bulk requests. The PreprintResolver tool (this https URL, Available from 2023-08-01) and source code (this https URL, Accessed: 2023-07-19) are available online.
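The fuzzy-matching idea the abstract describes can be illustrated in a few lines. This is not the PreprintResolver's actual code: the similarity threshold, record fields, and matching rule below are assumptions, sketched with the standard library's `difflib`.

```python
# Illustrative sketch of preprint-to-publication matching: near-identical
# normalised titles plus overlapping author surnames. The threshold and
# field names are assumptions, not PreprintResolver internals.
from difflib import SequenceMatcher

def normalise(text):
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(text.lower().split())

def title_similarity(a, b):
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def is_match(preprint, candidate, threshold=0.9):
    """Treat a database record as the published version of a preprint
    if titles are near-identical and at least one surname is shared."""
    if title_similarity(preprint["title"], candidate["title"]) < threshold:
        return False
    return bool(set(preprint["surnames"]) & set(candidate["surnames"]))

preprint = {"title": "A Study of Preprints", "surnames": {"mueller", "kim"}}
candidate = {"title": "A  study of preprints", "surnames": {"kim", "lopez"}}
print(is_match(preprint, candidate))  # close titles and a shared surname
```

A production matcher would also compare DOIs when present, as the paper notes, and treat a DOI agreement as decisive over any string similarity score.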

Detecting duplicate records and manuscript versions in your repository – CORE

“To assist the community with this challenge, we have developed the new CORE Dashboard Versions and Duplicates module. This provides a simple interface for identifying versions and duplicates in your repository. Our system pinpoints different versions of your articles allowing you to easily review them side-by-side and mark them using the widely used NISO Journal Article Versions (JAV) taxonomy. Exact duplicates can also be reviewed and marked for removal from your repository. The marking can then be exported from the Dashboard into .csv format enabling automation in your repository. The duplicates check runs periodically every time CORE indexes content from your repository….”
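The exact-duplicate side of such a workflow (though not CORE's actual algorithm; the field names and normalisation rules here are assumptions) can be sketched by fingerprinting each record's normalised metadata and grouping collisions:

```python
# Illustrative sketch of exact-duplicate detection in a repository:
# hash normalised DOI + title, then report ids that share a fingerprint.
# Not CORE's implementation; fields and normalisation are assumptions.
import hashlib

def fingerprint(record):
    """Stable key built from normalised DOI and title."""
    key = (record.get("doi", "").lower().strip()
           + "|" + " ".join(record.get("title", "").lower().split()))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def find_exact_duplicates(records):
    """Return groups of record ids whose fingerprints collide."""
    seen = {}
    for rec in records:
        seen.setdefault(fingerprint(rec), []).append(rec["id"])
    return [ids for ids in seen.values() if len(ids) > 1]

records = [
    {"id": 1, "doi": "10.1/x", "title": "Open Data"},
    {"id": 2, "doi": "10.1/X", "title": "Open  data"},
    {"id": 3, "doi": "10.2/y", "title": "Something Else"},
]
print(find_exact_duplicates(records))  # records 1 and 2 collide
```

Distinguishing *versions* of the same manuscript, as opposed to exact duplicates, is the harder problem, which is where a curated labelling step against the NISO JAV taxonomy comes in.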

STM Trends 2026 : The Beauty of Open at Scale

“Over the next three to five years, there will be a sharp rise of Open Access within scholarly communications. In an ecosystem that is Open-at-Scale, there will be many new opportunities for scalable tools for knowledge discovery on massively available content. We expect that this will have a significant impact on the ecosystem of scholarly communications, most likely in a very positive and beautiful way – and, as the motto says: it will change things At Scale….

One of our future forecasts is also that in an Open Access world the competition for the best authors and peer reviewers will intensify….

A world of Open Access needs a new locus of trust. Information will appear in many places and in many versions. We need to secure the Version of Record that was peer-reviewed….”