Community Call: Introducing the 2022 Project TARA tools to support responsible research assessment | DORA

“Join DORA for a community call to introduce two new responsible research evaluation tools and provide feedback on future tool development. The toolkit is part of Project TARA, which aims to identify, understand, and make visible the criteria and standards universities use to make hiring, promotion, and tenure decisions. This interactive call will explore these new tools, which were each created to help community members who are seeking:

Strategies on how to debias committees and deliberative processes: It is increasingly recognized that more diverse decision-making panels make better decisions. Learn how to debias your committees and decision-making processes with this one-page brief.
Ideas on how to incorporate a wider range of contributions in their evaluation policies and practices: Capturing scholarly “impact” often relies on familiar suspects like h-index, JIF, and citations, despite evidence that these indicators are narrow, often misleading, and generally insufficient to capture the full richness of scholarly work. Learn how to consider a wider breadth of contributions in assessing the value of academic activities with this one-page brief….”

Similarity search on millions of books, in-browser / Benjamin Schmidt / Observable

“Keyword search remains dominant for books, but at some point, whether they know it or not, everyone will probably be searching vectorized representations. This notebook tries out some methods for textual similarity search across a large corpus of books based on vectorized representations.

Back in 2018, it took me a lot of effort to set up an approximate nearest-neighbors search on a server. Now in 2022, new technologies and new tricks make it possible to search across 2 million+ books in dozens of languages without even having a server. In this demo notebook, I load exactly 2 million books; it would be quite easy to scale up significantly higher, although it might take a minute to download representations of ten to twenty million books….”
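Stripped of the in-browser machinery, vector similarity search reduces to comparing embeddings. A minimal sketch (not Schmidt's notebook code; the toy corpus, dimensions, and brute-force scoring are illustrative, and systems at his scale use quantized vectors and approximate nearest-neighbor indexes rather than exact search):

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    # Normalize rows so a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def top_k(index: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    q = query / np.linalg.norm(query)
    scores = index @ q              # similarity of the query to every book
    return np.argsort(-scores)[:k]  # indices of the k most similar books

# Toy corpus: 1,000 "books" embedded in 64 dimensions.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64)).astype(np.float32)
index = build_index(corpus)
print(top_k(index, corpus[42]))  # the query book itself ranks first
```

Approximate methods trade a little recall for avoiding this full scan, which is what makes millions of books feasible client-side.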

Scholia for Software

Abstract:  Scholia for Software is a project to add software profiling features to Scholia, which is a scholarly profiling service from the Wikimedia ecosystem and integrated with Wikipedia and Wikidata. This document is an adaptation of the funded grant proposal. We are sharing it for several reasons, including research transparency, our wish to encourage the sharing of research proposals for reuse and remixing in general, to assist others specifically in making proposals that would complement our activities, and because sharing this proposal helps us to tell the story of the project to community stakeholders.

A “scholarly profiling service” is a tool which assists the user in accessing data on some aspect of scholarship, usually in relation to research. Typical features of such services include returning the bibliography of academic publications for any given researcher, or providing a list of publications by topic. Scholia already exists as a Wikimedia platform tool built upon Wikidata and capable of serving these functions. This project will add software-related data to Wikidata, develop Scholia’s own code, and address some ethical issues in diversity and representation around these activities. The end result will be that Scholia will be able to report what software a given researcher has described using in their publications, what software is most used among authors publishing on a given topic or in a given journal, what papers describe projects which use a given software, and what software is most often co-used in projects which use a given software.
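Scholia's reports are driven by SPARQL queries against Wikidata, so the "software used by a researcher" feature would boil down to a query of roughly this shape. A sketch, assuming works link to authors via wdt:P50 and to software via wdt:P2283 "uses" (the property choices are illustrative, not copied from Scholia's query library, and "Q42" is just a placeholder QID):

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

def software_used_by(author_qid: str) -> str:
    # Works authored by the researcher (wdt:P50) that are linked to
    # software items via "uses" (wdt:P2283); illustrative properties.
    return f"""
SELECT ?software ?softwareLabel (COUNT(?work) AS ?works) WHERE {{
  ?work wdt:P50 wd:{author_qid} ;
        wdt:P2283 ?software .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?software ?softwareLabel
ORDER BY DESC(?works)
"""

def run(query: str) -> dict:
    # Wikidata's endpoint asks clients to send a descriptive User-Agent.
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    req = urllib.request.Request(
        url, headers={"User-Agent": "scholia-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(software_used_by("Q42"))  # "Q42" is a placeholder QID
```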



Hello World, From Wikimedia Enterprise | 21 Jun 2022

“We launched Wikimedia Enterprise last year with a goal of making it easy to programmatically access data from across the Wikimedia Foundation projects. Since then, we have been busy building a product that can serve the needs of commercial users of any size. Today, we are thrilled to share some of the first customers using this product, in addition to new features that make it easy for anyone to start using Wikimedia Enterprise. Today, we are excited to announce that:

Google has become the very first customer of Wikimedia Enterprise.
The Internet Archive will receive full access to Enterprise’s feature set, at no cost, for use in furthering their mission of archiving the Web.
Self-service trial accounts are available to anyone to try out Wikimedia Enterprise for their own use. Trial accounts include unlimited free access to a monthly snapshot of the entire Wikimedia Enterprise project archive and 10,000 free requests from our On-Demand API.
New product and pricing details are now available, including a pricing calculator to estimate usage cost after a trial, as well as comprehensive product documentation and a customer service portal with detailed FAQs.
We have also added a news page (you are reading it!) to better communicate updates and announcements to current and potential customers….”

Sci-Hub: The Largest Scientific Papers Library and Alternatives

“Sci-Hub is a library of scientific papers and journals that anyone can access for free. The site contains over 64 million papers from over 24,000 journals, making it one of the largest scientific libraries in the world. Anyone can search for and download papers from Sci-Hub, without needing a subscription or login. This makes it an invaluable resource for students and researchers who would otherwise have difficulty accessing this information. While some publishers have raised concerns about copyright infringement, Sci-Hub provides a valuable service by making knowledge more accessible to everyone.

The best alternative is Library Genesis, which is free. Other great sites and apps similar to Sci-Hub are Z-Library, Project Gutenberg, and Ebook3000.

Sci-Hub alternatives are mainly eBook Libraries but may also be Torrent Search Engines or Paywall Remover Tools….”

COPIM’s toolkit for running an Opening the Future programme at an academic press · Community-led Open Publication Infrastructures for Monographs (COPIM)

“In spring 2020, COPIM Work Package 3 started work on devising a new revenue model for university presses and open access books. Through a series of fact-finding meetings, workshops and reports the team gathered lots of information on the business models of scholarly presses with the aim of creating a sustainable revenue stream that would allow presses to publish their books openly, without using unaffordable book processing charges.

That research led to us devising and launching an innovative revenue model called Opening the Future in October 2020 with our first partner publisher, Central European University (CEU) Press. In essence, it is a library subscription membership programme whereby the press provides term access to portions of their (closed) backlist books at a special price, and then uses the revenue from members’ subscriptions to allow the frontlist to be OA from the date of publication. This model presents a potential route for the mass and sustainable transition to OA of many small-to-mid-sized university presses. Liverpool University Press (LUP) joined as our second project partner with their own Opening the Future initiative in June 2021. The programme is proving to be a success and, to date, the two presses have together accrued enough library funding to produce 10+ new OA monographs. Opening the Future continues to grow with both publishers. …”

Design and development of an open-source framework for citizen-centric environmental monitoring and data analysis | Scientific Reports

Abstract:  Cities around the world are struggling with environmental pollution. The conventional monitoring approaches are not effective for undertaking large-scale environmental monitoring due to logistical and cost-related issues. Low-cost and low-power Internet of Things (IoT) devices have proved to be an effective alternative for monitoring the environment. Such systems have opened up environment monitoring opportunities to citizens while simultaneously confronting them with challenges related to sensor accuracy and the accumulation of large data sets. Analyzing and interpreting sensor data is itself a formidable task that requires extensive computational resources and expertise. To address this challenge, a social, open-source, and citizen-centric IoT (Soc-IoT) framework is presented, which combines a real-time environmental sensing device with an intuitive data analysis and visualization application. Soc-IoT has two main components: (1) CoSense Unit—a resource-efficient, portable and modular device designed and evaluated for indoor and outdoor environmental monitoring, and (2) exploreR—an intuitive cross-platform data analysis and visualization application that offers a comprehensive set of tools for systematic analysis of sensor data without the need for coding. Developed as a proof-of-concept framework to monitor the environment at scale, Soc-IoT aims to promote environmental resilience and open innovation by lowering technological barriers.
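As a rough illustration of the kind of first-pass cleaning a no-code tool like exploreR might apply to raw sensor streams before visualization (this is not the project's code; the readings and window size are made up):

```python
from statistics import mean

def rolling_mean(readings, window=5):
    # Smooth raw sensor values (e.g. PM2.5 in µg/m³) with a simple
    # moving average to damp single-sample noise from low-cost sensors.
    out = []
    for i in range(len(readings) - window + 1):
        out.append(mean(readings[i:i + window]))
    return out

pm25 = [12.0, 14.5, 13.2, 40.1, 13.8, 12.9, 13.1]  # one spiky outlier
print(rolling_mean(pm25, window=3))
```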


OpenAlex: An open and comprehensive index of scholarly works, citations, authors, and institutions

“OpenAlex is a free and open Scientific Knowledge Graph (SKG).  It contains information describing approximately 230M scholarly works, drawn from both structured (eg: Crossref) and unstructured (eg: institutional repositories, publisher websites) sources, clustered/merged into distinct records, and linked by citations. By parsing work metadata and enriching it with external PID sources (ROR, ORCID, ISSN Network, PubMed, Wikidata, etc), OpenAlex describes and links (approximately) 200M author clusters, 100k institutions, and 100k venues (journals and repositories). Using a neural-net classifier, we assign one or more of 50k Wikidata concepts to each work. All source code and ML models are available openly, and data is freely available via a high-performance API, a complete database dump, and a search-engine-style web interface. This talk will describe the construction of OpenAlex, compare it to other SKGs (eg Scopus, MAG), and discuss plans for the future.”
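The high-performance API mentioned above is documented at api.openalex.org; its /works endpoint takes comma-joined filter expressions. A small sketch (the filter expression is just an example):

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.openalex.org"

def works_url(filter_expr: str, per_page: int = 5) -> str:
    # /works accepts comma-joined filters, e.g.
    # "open_access.is_oa:true,from_publication_date:2022-01-01".
    params = {"filter": filter_expr, "per-page": per_page}
    return f"{BASE}/works?{urllib.parse.urlencode(params)}"

def fetch(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

url = works_url("open_access.is_oa:true,from_publication_date:2022-01-01")
print(url)
# results = fetch(url)
# titles = [w["display_name"] for w in results["results"]]
```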

Analyzing repositories of OER using web analytics and accessibility tools | SpringerLink

Abstract:  Open Educational Resources (OER) provide learning opportunities for all. Usually, OER and links to OER are curated in Repositories of OER (ROER) for open access and use by anyone, including people with disabilities, at any place at any time. This study analyzes the reputation/authoritativeness, usage, and accessibility of thirteen popular ROER for teaching and learning using three Web Analytics and five Web Accessibility tools. Large differences among the ROER were observed in almost every metric. Millions of users visit some of these ROER every month and on average stay 2–26 min per visit and view 1.1–8.5 pages per visit. Although in many ROER most visitors come from the country where the ROER hosting institute operates, other ROER (such as DOER, MIT OCW, and OpenLearn) have managed to attract visitors from all over the world. In some ROER, visitors come directly to the website, while in a few other ROER visitors arrive via a search engine. Although most ROER are accessible by users with disabilities, the Web Accessibility tools revealed several errors in a few ROER. In most ROER, less than one third of the traffic comes from mobile devices although almost everyone has a mobile phone nowadays. Finally, the study makes suggestions to ROER administrators, such as interconnecting their ROER, collaborating, exchanging good practices (such as Commons and MIT OCW), improving their website accessibility and mobile-optimized design, and promoting their ROER to libraries, educational institutes, and organizations.


Facilitating open science without sacrificing IP rights: A novel tool for improving replicability of published research: EMBO reports: Vol 0, No 0

“Various factors contribute to the restricted access to materials: avoiding criticism, fear of falsification and retraction, or a desire to stay ahead of peers. Commercial and proprietary concerns also play a significant role in the decision of scientists and organizations to conceal replication materials (Campbell & Bendavid, 2002; Hong & Walsh, 2009). Such motivations are more prominent as the line between academic and commercially oriented research becomes blurred. Nowadays, commercial firms commonly publish in scientific journals, whereas scientists, universities, and research institutions benefit from the commercialization of research findings and often seek patent protection. All of this cultivates an environment of secrecy, in contrast with the scientific tradition of openness and sharing (Merton, 1942)….

Instead of choosing between IP rights and replicability, we suggest an inclusive approach that facilitates replications without depriving scientists of IP rights. Our proposal is to implement a new policy tool: the Conditional Access Agreement (CAA). Recall that it is public access to replication materials that jeopardizes both the prospect of securing patent protection (as novelty and non-obviousness are examined vis-à-vis the public prior art) and trade secret protection (since the pertinent information must be kept out of the public domain). Access, however, does not have to be public. This is precisely the gist of the CAA mechanism—establishing a private, controlled channel of communication for the transfer of replication materials between authors and replicators….

The CAA mechanism would work as follows (Fig 1): When submitting a paper for publication, an author would execute an agreement vis-à-vis the journal, pledging to provide full access to replication materials upon demand. The agreement would specify that anyone requesting access to the materials can only obtain it upon signing a non-disclosure agreement (NDA). Under an NDA, the receiving party commits to use the information disclosed by the other party only for a limited purpose while keeping it confidential. …”

How the OA Switchboard fits into the ecosystem with collaboration and transparency built-in (PART 1)

“As research funders and institutions are expanding open access requirements, and business models become increasingly complex and diverse, how do stakeholders navigate the open access research and publishing maze?

How do publishers support a smooth and compliant author journey and report on open access publication output?

How do funders demonstrate the extent and impact of their research funding and deliver on their commitment to open access?

How do research institutions, and their libraries, connect with their research and simplify their workflows?

The OA Switchboard is a mission-driven, community-led initiative designed to simplify the sharing of information between stakeholders about open access publications throughout the whole publication journey. It provides a standardised messaging protocol and shared infrastructure designed to operate and integrate with all stakeholder systems, and it can help with the challenges above. It is built by and for the people who use it, and it leverages existing PIDs.

Who are these stakeholders, systems and identifiers, and how do we fit into the overall open access (OA) publishing ecosystem?


This post, the first in a series to answer that question, is about the ‘intermediary’ concept….”
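As a rough sketch of the "standardised messaging protocol" idea, a notification routed from a publisher to an institution could look like the JSON below. Every field name here is illustrative only and is not the Switchboard's actual schema; note how the PIDs (DOI, ROR, ORCID, all placeholders) do the work of identifying the parties and the publication:

```python
import json

# Hypothetical publication-event message between a publisher and an
# institution; field names are illustrative, identifiers are placeholders.
message = {
    "type": "publication-notification",
    "from": {"role": "publisher", "id": "example-press"},
    "to": {"role": "institution", "ror": "https://ror.org/00x0x0x00"},
    "article": {
        "doi": "10.1234/example.5678",
        "title": "An example article",
        "license": "CC-BY-4.0",
        "authors": [{"orcid": "https://orcid.org/0000-0000-0000-0000"}],
    },
}
print(json.dumps(message, indent=2))
```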

Journal transparency – the new Journal Comparison Service from Plan S | Maverick Publishing Specialists

“At a recent STM Association webinar, Robert Kiley, Head of Open Research at the Wellcome Trust, presented an informative overview of the new Journal Comparison Service from Plan S. He stated that the goal of this new tool is to meet the needs of the research community who “have called for greater transparency regarding the services publishers provide and the fees they charge. Many publishers are willing to be responsive to this need, but until now there was no standardised or secure way for publishers to share this information with their customers.” Publishers of scholarly journals are invited to upload data on their journals – one data set for each journal. The cOAlition S Publisher’s Guide points out that the data is all information that publishers already have in some form, and it will need to be uploaded every year for the previous year.

There are two versions of data that can be supplied and I took a look at the version developed by Information Power (see for the details and an FAQ). There are 34 fields, including basic journal identifiers plus additional information in three broad categories: prices (APC data; subscription prices plus discount policies); editorial data (acceptance rates, peer review times, Counter 5 data); and costs (price and service information)….

As a previous publisher of a portfolio of journals, I know that allocating these kinds of costs back to a specific journal is at best a guesstimate and very unlikely to be accurate and comparable.

The webinar included a contribution from Rod Cookson, CEO of International Water Association (IWA) Publishing. Rod has been an advocate for transparency and helped to create the toolkit for publishers who want to negotiate transformative agreements. Rod reported that it had taken six people 2–3 months to gather the data to complete the 34 fields in the comparison tool. IWA Publishing publishes 14 journals….”


Frontiers | neuPrint: An open access tool for EM connectomics

Abstract:  Due to advances in electron microscopy and deep learning, it is now practical to reconstruct a connectome, a description of neurons and the chemical synapses between them, for significant volumes of neural tissue. Smaller past reconstructions were primarily used by domain experts, could be handled by downloading data, and performance was not a serious problem. But new and much larger reconstructions upend these assumptions. These networks now contain tens of thousands of neurons and tens of millions of connections, with yet larger reconstructions pending, and are of interest to a large community of non-specialists. Allowing other scientists to make use of this data needs more than publication—it requires new tools that are publicly available, easy to use, and efficiently handle large data. We introduce neuPrint to address these data analysis challenges. neuPrint contains two major components—a web interface and programmer APIs. The web interface is designed to allow any scientist worldwide, using only a browser, to quickly ask and answer typical biological queries about a connectome. The neuPrint APIs allow more computer-savvy scientists to make more complex or higher-volume queries. neuPrint also provides features for assessing reconstruction quality. Internally, neuPrint organizes connectome data as a graph stored in a neo4j database. This gives high performance for typical queries, provides access through a public and well-documented query language, Cypher, and will extend well to future larger connectomics databases. Our experience is also an experiment in open science. We find a significant fraction of the readers of the article proceed to examine the data directly. In our case preprints worked exactly as intended, with data inquiries and PDF downloads starting immediately after preprint publication, and little affected by formal publication later.
From this we deduce that many readers are more interested in our data than in our analysis of our data, suggesting that data-only papers can be well appreciated and that public data release can speed up the propagation of scientific results by many months. We also find that providing, and keeping, the data available for online access imposes substantial additional costs on connectomics research.
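Because neuPrint stores the connectome as a neo4j graph, a "typical biological query" such as "what are this neuron's strongest inputs?" boils down to a few lines of Cypher. A sketch built in Python, assuming the data model's :Neuron nodes and :ConnectsTo edges with a synapse-count weight (treat the exact labels and property names as assumptions; the body id is a placeholder):

```python
def strongest_inputs(body_id: int, limit: int = 10) -> str:
    # :Neuron nodes joined by :ConnectsTo edges whose "weight" is the
    # synapse count; label and property names are assumptions here.
    return f"""
MATCH (up:Neuron)-[c:ConnectsTo]->(down:Neuron {{bodyId: {body_id}}})
RETURN up.bodyId, up.type, c.weight
ORDER BY c.weight DESC
LIMIT {limit}
"""

print(strongest_inputs(12345))  # 12345 is a placeholder body id
```

A query string like this would be sent to a neuPrint server through its HTTP API or a neo4j driver.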


The State of Unpaywall: Analyzing the Consistency of Open Access Data | Zenodo

Abstract:  These results highlight the difficulties of identifying the open access status of a publication. Especially for less rigid OA subgroups like Hybrid and Bronze OA, the classification task is a process of iterating over improved algorithms. Generally, it can be assumed that these iterations lead towards a more accurate reflection of the true OA status. This process, however, has implications for the academic users of Unpaywall data. Studies that use these data to analyze OA status, and especially OA subgroups, should be aware that the reliability of the data and the reproducibility of the results depend on time and infrastructural design choices. This observation is essential background for OA studies that rely on Unpaywall data at a single point in time.

For the OA transformation, the results also highlight the importance of author-choice-based contributions. Publisher-choice-based contributions appear to be harder to identify and are also volatile in their status over time. For Open Access studies, these findings provide empirical reasons for caution when including data on Bronze OA in their analysis. For the OA transformation in general, the findings highlight authors as the key contributors to a successful transformation.
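Studies that snapshot Unpaywall data typically pull records through its REST API, one GET per DOI. A minimal sketch (the DOI and email are placeholders; "oa_status" is the field carrying the gold/green/hybrid/bronze/closed classification whose drift over time the abstract discusses):

```python
import json
import urllib.parse
import urllib.request

def unpaywall_url(doi: str, email: str) -> str:
    # One record per DOI; Unpaywall asks for an email in the query string.
    return f"https://api.unpaywall.org/v2/{urllib.parse.quote(doi)}?email={email}"

def oa_status(doi: str, email: str) -> str:
    # The status is computed at request time, so two snapshots taken
    # months apart may classify the same DOI differently.
    with urllib.request.urlopen(unpaywall_url(doi, email)) as resp:
        return json.load(resp)["oa_status"]

print(unpaywall_url("10.1234/example.5678", "you@example.org"))
```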

OA Publishing Platform | Scholarly Community | ReachOA

“Amnet, through its collaboration with Coko, has developed ReachOA, an OA publishing platform aimed at addressing present scholarly publishing challenges and at helping the scholarly community do more with less. Our focus is to offer the best open access tools for scholarly publishers the world over and to help members explore the potential of open access journals and take advantage of them.

ReachOA, powered by Kotahi, is a full-featured modern publishing platform designed to digitize content publishing, including science journals, micropublications, preprints, and more. Its range of rich features and optimized workflow simplify the end-to-end process, making it easy for article publishers to create, review, revise, and publish their works….”