Starting fresh: the Wikibase REST API – Wikimedia Tech News

“Wikidata has 10 years under its big belt; Wikibase, the free knowledge graph software that powers Wikidata, has been a product in its own right for almost half that time. But despite Wikidata’s footprint and Wikibase’s powerful functionality, the various API calls for both products have been shoehorned for almost all that time into the venerable Action API. 

That state of affairs has recently changed—for the better. Starting in 2022, Wikimedia Deutschland began work on a Wikibase-specific REST API. Why? We saw a growing need to expose Wikibase’s functionality through a modern, RESTful, OpenAPI-compliant interface that’s fully dedicated to Wikibase. As of this writing, we’ve made a promising start and implemented a lot of basic calls already, as you can see in our OpenAPI documentation. If you’re running your own Wikibase, you can easily enable the new REST API for your Wikibase instance….”
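
For orientation, the contrast between the two interfaces shows up in the request URLs themselves. The sketch below only builds URLs and makes no network call. The REST route prefix (/w/rest.php/wikibase/v0), the host, and the item ID Q42 are assumptions chosen for illustration and are not taken from the post; the version segment may differ on a given installation, so the OpenAPI documentation of the target Wikibase remains the authority.

```python
from urllib.parse import urlencode


def action_api_url(base: str, item_id: str) -> str:
    """Action API style: a single endpoint, with the operation named in the query string."""
    query = urlencode({"action": "wbgetentities", "ids": item_id, "format": "json"})
    return f"{base}/w/api.php?{query}"


def rest_api_url(base: str, item_id: str, version: str = "v0") -> str:
    """REST style: the resource is named in the path.

    The route prefix and the default version are assumptions; check the
    OpenAPI document of the Wikibase you are talking to.
    """
    return f"{base}/w/rest.php/wikibase/{version}/entities/items/{item_id}"


if __name__ == "__main__":
    base = "https://www.wikidata.org"  # illustrative host
    print(action_api_url(base, "Q42"))
    print(rest_api_url(base, "Q42"))
```

Keeping the version as a parameter makes it straightforward to point the same helper at an instance that exposes a different version of the route.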

Velez-Estevez et al. (2023) New trends in bibliometric APIs: A comparative analysis | Information Processing & Management

Velez-Estevez, A., I. J. Perez, P. García-Sánchez, J. A. Moral-Munoz, and M. J. Cobo. ‘New Trends in Bibliometric APIs: A Comparative Analysis’. Information Processing & Management 60, no. 4 (1 July 2023): 103385. https://doi.org/10.1016/j.ipm.2023.103385.

Abstract:

The science of science practice requires the analysis of large and complex bibliometric data. Traditional data exporting from companies’ websites is not sufficient, so APIs are used to access a larger corpus. Therefore, this study aims not only to establish a taxonomy but also to offer a comparative analysis of 44 bibliographic APIs from various non-profit and commercial organizations, analyzing their characteristics and metadata with descriptive analysis, their possible bibliometric analyses, and the interoperability of the APIs across four different data categories: general, content, search, and query modes. The study found that Clarivate Analytics and Elsevier offer highly versatile APIs, while non-profit organizations, such as OpenCitations and OurResearch promote the Open Science philosophy. Most organizations offer free access to APIs for non-commercial purposes, but some have limitations on metadata retrieval. However, CrossRef, OpenCitations, or OpenAlex have no restrictions on the metadata retrieval. Co-author analysis using author names and bibliometric evaluation using citations are the types of analyses that can be done with the data provided by most APIs. DOI, PubMedID, and PMCID are the most versatile identifiers for extending metadata in the APIs. Semantic Scholar, Dimensions, ORCID, and Embase are the APIs that offer the most extensibility. Considering the obtained results, there is no single API that gathers all the information needed to perform any bibliometric analysis. Combining two or more APIs may be the most appropriate option to cover as much information as possible and enrich reports and analyses. This study contributes to advancing the understanding and use of APIs in research practice.
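
The abstract's closing recommendation, combining two or more APIs and using the DOI as the linking identifier, can be sketched as a merge keyed on a normalized DOI. This is a minimal illustration, not the authors' method: the function names, the record layout, and the "first source wins" rule are all assumptions, and the sample records simply reuse the citation of this article rather than real API responses.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common URL or 'doi:' prefixes so records can be matched."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_by_doi(*sources):
    """Merge record lists from several APIs into one dict keyed by normalized DOI.

    For each field, the first source that supplies a non-empty value wins
    (an arbitrary policy chosen for this sketch).
    """
    merged = {}
    for records in sources:
        for record in records:
            key = normalize_doi(record["doi"])
            entry = merged.setdefault(key, {"doi": key})
            for field, value in record.items():
                if field != "doi" and value is not None:
                    entry.setdefault(field, value)
    return merged


if __name__ == "__main__":
    # Hypothetical outputs of two different APIs describing the same article.
    source_a = [{"doi": "https://doi.org/10.1016/J.IPM.2023.103385",
                 "title": "New Trends in Bibliometric APIs: A Comparative Analysis"}]
    source_b = [{"doi": "10.1016/j.ipm.2023.103385",
                 "title": None,
                 "journal": "Information Processing & Management"}]
    print(merge_by_doi(source_a, source_b))
```

Normalizing before merging matters because different services may return the same DOI as a bare string, as a URL, or in a different letter case.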

Restricting Reddit Data Access Threatens Online Safety & Public-Interest Research

“Last week, soon after Reddit announced plans to restrict free access to the Reddit API, the company cut off access to Pushshift, a data resource widely used by communities, journalists, and thousands of academics worldwide (see Pushshift’s official response).

We are writing to express concern about this sudden disruption to critical resources, and the uncertainty about the future it has created. We are asking for clarification and a meeting about the best ways to restore essential functionality for the communities that power your platform and the researchers who rely on your platform for essential public-interest work. To support that dialogue, we are coordinating a survey of the impact.

By preventing communities from accessing the very data they generate, Reddit has severely disrupted the safety and functionality of your platform. As you know, Reddit relies on volunteers to create moderation technologies and to do moderation labor that costs your competitors hundreds of millions of dollars per year. Tens of thousands of volunteers protect children’s safety, manage sensitive mental health support, and mediate some of the world’s largest conversation spaces for constructive civic discourse.

To succeed at their role, these unpaid leaders and workers need to access historical and contemporary community data to moderate a conversation space with over 1.5 billion active users. For many years, Reddit has relied on volunteer labor and computing infrastructure from Pushshift to provide communities with essential data services. You have now cut that off without warning to communities and haven’t offered alternatives, which will degrade safety protections across Reddit….”

Some Misconceptions about Software in the Copyright Literature by Joshua Bloch, Pamela Samuelson :: SSRN

Abstract:  The technical complexity and functionality of computer programs have made it difficult for courts to apply conventional copyright concepts, such as the idea/expression distinction, in the software copyright case law. This has created fertile ground for significant misconceptions. In this paper, we identify fourteen such misconceptions that arose during the lengthy course of the Google v. Oracle litigation. Most of these misconceptions concern application programming interfaces (APIs). We explain why these misconceptions were strategically significant in Oracle’s lawsuit, rebut them, and urge lawyers and computer scientists involved in software copyright litigation to adopt and insist on the use of terminology that is technically sound and unlikely to perpetuate these misconceptions.

Shutting down our preprint bots | Feb 21, 2023 | Liberate Science

“We started running Twitter bots in 2017, when Liberate Science was only a side project. First we launched the PsyArxiv bot. Later, we launched bots for the MetaArxiv (2020) and EdArxiv (2021) preprint servers. Six years in, we are shutting down these Twitter bots. You may have already noticed they are no longer posting any new preprints since February 13th (previously 9th). There are several things that motivate us to stop the preprint bots’ operations. It includes the exodus from Twitter overall; it includes the recent announcement that Twitter API access is no longer free. It includes that the community has taken it upon itself to offer replacement bots on Mastodon. We offered preprint bots for free all these years, but that does not mean it was free to run this. We had to run a custom RSS feed service (based on Jeff Spies’ osfpreprints-feed; run on Glitch for $99/year). Automating a bot is free and easy if there is relatively little volume. Especially for PsyArxiv, the amount of preprints grew so rapidly that we had to upgrade our automation and costs went up to ~$600 per year (using Zapier). This is also why the 1,500 free post limit proves too uncertain in the long run….”

Home – Data Commons

“Publicly available data from open sources (census.gov, cdc.gov, data.gov, etc.) are vital resources for students and researchers in a variety of disciplines. Unfortunately, processing these datasets is often tedious and cumbersome. Organizations follow distinctive practices for codifying datasets. Combining data from different sources requires mapping common entities (city, county, etc.) and resolving different types of keys/identifiers. This process is time consuming, tedious and done over and over. Our goal with Data Commons is to address this problem.

Data Commons synthesizes a single graph from these different data sources. It links references to the same entities (such as cities, counties, organizations, etc.) across different datasets to nodes on the graph, so that users can access data about a particular entity aggregated from different sources without data cleaning or joining. We hope the data contained within Data Commons will be useful to students, researchers, and enthusiasts across different disciplines….

Data Commons can be accessed by anyone via the tools available on datacommons.org. Students, researchers and developers can use the REST, Python and Google Sheets APIs, all of which are free for educational, academic and journalistic research purposes….”

August OpenCon Library Community Call on Using the OpenAlex API | August 9th, 2022

“Inspired by the ancient Library of Alexandria, OpenAlex indexes the world of scholarly research, including works, citations, authors, journals, and institutions. OpenAlex data is completely free and open to all via a web interface, API, and database snapshot. Join us to learn how to use the OpenAlex API for your scholcomm research needs. OpenAlex was created by OurResearch, a nonprofit that makes open scholarly infrastructure including Unpaywall (an index of the world’s Open Access research literature) and Unsub (a tool to help librarians eliminate toll-access journal subscriptions). …”

New OpenAlex API features! – OurResearch blog

“We’ve got a ton of great API improvements to report! If you’re an API user, there’s a good chance there’s something in here you’re gonna love.

Search

You can now search both titles and abstracts. We’ve also implemented stemming, so a search for “frogs” now automatically gets your results mentioning “frog,” too. Thanks to these changes, searches for works now deliver around 10x more results. This can all be accessed using the new search query parameter.

New entity filters

We’ve added support for tons of new filters, which are documented here. You can now:

get all of a work’s outgoing citations (ie, its references section) with a single query. 
search within each work’s raw affiliation data to find an arbitrary string (eg a specific department within an organization)
filter on whether or not an entity has a canonical external ID (works: has_doi, authors: has_orcid, etc) ….”
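
The search parameter and the new filters described above can be combined in a single request URL. The sketch below only constructs URLs and does not contact the API. The base URL https://api.openalex.org/works, the filter name cited_by (used here for a work's outgoing citations), and the work ID are assumptions to verify against the filter documentation the post links to; only search and has_doi are named in the post itself.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"  # assumed base URL


def works_url(search=None, filters=None):
    """Build a works query from an optional search string and a dict of filters.

    Filters are joined as comma-separated key:value pairs in one `filter` parameter.
    """
    params = {}
    if search:
        params["search"] = search
    if filters:
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    if not params:
        return OPENALEX_WORKS
    # Keep ':' and ',' unescaped so the filter expression stays readable.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':,')}"


if __name__ == "__main__":
    # Title/abstract search (stemmed on the server side), limited to works with a DOI.
    print(works_url(search="frogs", filters={"has_doi": "true"}))
    # Outgoing citations of one work; the ID is a placeholder.
    print(works_url(filters={"cited_by": "W2741809807"}))
```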

Usability and Accessibility of Publicly Available Patient Sa… : Journal of Patient Safety

Abstract:  Objectives 

The aims of the study were to identify publicly available patient safety report databases and to determine whether these databases support safety analyst and data scientist use to identify patterns and trends.

Methods 

An Internet search was conducted to identify publicly available patient safety databases that contained patient safety reports. Each database was analyzed to identify features that enable patient safety analyst and data scientist use of these databases.

Results 

Seven databases (6 hosted by federal agencies, 1 hosted by a nonprofit organization) containing more than 28.3 million safety reports were identified. Some, but not all, databases contained features to support patient safety analyst use: 57.1% provided the ability to sort/compare/filter data, 42.9% provided data visualization, and 85.7% enabled free-text search. None of the databases provided regular updates or monitoring and only one database suggested solutions to patient safety reports. Analysis of features to support data scientist use showed that only 42.9% provided an application programing interface, most (85.7%) provided batch downloading, all provided documentation about the database, and 71.4% provided a data dictionary. All databases provided open access. Only 28.6% provided a data diagram.

Conclusions 

Patient safety databases should be improved to support patient safety analyst use by, at a minimum, allowing for data to be sorted/compared/filtered, providing data visualization, and enabling free-text search. Databases should also enable data scientist use by, at a minimum, providing an application programing interface, batch downloading, and a data dictionary.

Analyzing Institutional Publishing Output-A Short Course – Google Docs

“This short course provides training materials about how to create a set of publication data, gather additional information about the data through an API (Application Programming Interface), clean the data, and analyze the data in various ways. The API that we’ll use is from Unpaywall and helps gather information related to the open access (OA) status of the item. This short course was created for the Scholarly Communication Notebook. If open access is new to you, we recommend checking out Peter Suber’s book Open Access. It’s concise and well written. Although things have changed since it was published in 2012, it’s a great place to start….”
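
The "gather additional information through an API" step of such a course might look like the following sketch. It is not taken from the course materials: the endpoint shape (https://api.unpaywall.org/v2/ followed by the DOI and an email parameter) and the response fields is_oa and oa_status reflect the public Unpaywall API as commonly documented, while the helper names, the DOI, the email address, and the sample payload are hypothetical placeholders. No network request is made.

```python
import json
from urllib.parse import quote


def unpaywall_url(doi: str, email: str) -> str:
    """Build the lookup URL for one DOI; Unpaywall asks callers to identify themselves by email."""
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?email={quote(email, safe='@')}"


def oa_summary(payload: str) -> dict:
    """Reduce a JSON response body to the fields needed for an open access analysis."""
    record = json.loads(payload)
    return {
        "doi": record.get("doi"),
        "is_oa": bool(record.get("is_oa")),
        "oa_status": record.get("oa_status", "unknown"),
    }


if __name__ == "__main__":
    print(unpaywall_url("10.1234/example", "you@example.org"))
    # Hand-written stand-in for a response body, not real data.
    sample = '{"doi": "10.1234/example", "is_oa": true, "oa_status": "gold"}'
    print(oa_summary(sample))
```

Separating URL construction from response parsing lets the parsing step be checked against saved responses before any live requests are sent for a full publication list.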