An open dataset of scholars on Twitter

Abstract: The role played by research scholars in the dissemination of scientific knowledge on social media has always been a central topic in social media metrics (altmetrics) research. Different approaches have been implemented to identify and characterize active scholars on social media platforms like Twitter. Some limitations of past approaches were their complexity and, most importantly, their reliance on licensed scientometric and altmetric data. The emergence of new open data sources like OpenAlex or Crossref Event Data provides opportunities to identify scholars on social media using only open data. This paper presents a novel and simple approach to match authors from OpenAlex with Twitter users identified in Crossref Event Data. The matching procedure is described and validated with ORCID data. The new approach matches nearly 500,000 scholars with their Twitter accounts with high precision and moderate recall. The dataset of matched scholars is described and made openly available to the scientific community to empower more advanced studies of the interactions of research scholars on Twitter.
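
The abstract reports "high precision and moderate recall" for matches validated against ORCID data. As a rough illustration of what those two measures mean for a set of author-to-account matches, here is a minimal sketch. The function name, the identifiers, and the toy data are all hypothetical, and this is not the paper's actual validation procedure.

```python
def precision_recall(predicted, truth):
    """Compare predicted (author_id, twitter_handle) pairs with a trusted set.

    Both arguments are sets of pairs. `truth` stands in for links confirmed
    by an external source (the paper uses ORCID data for this role).
    """
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: four confirmed links, of which the matcher found two.
truth = {("A1", "@alice"), ("A2", "@bob"), ("A3", "@carol"), ("A4", "@dave")}
predicted = {("A1", "@alice"), ("A2", "@bob")}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every predicted match is correct (precision 1.0) but only half of the confirmed links are found (recall 0.5), which is the general shape of a "high precision, moderate recall" result.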

OpenAlex: An open and comprehensive index of scholarly works, citations, authors, and institutions

“OpenAlex is a free and open Scientific Knowledge Graph (SKG). It contains information describing approximately 230M scholarly works, drawn from both structured (eg: Crossref) and unstructured (eg: institutional repositories, publisher websites) sources, clustered/merged into distinct records, and linked by citations. By parsing work metadata and enriching it with external PID sources (ROR, ORCID, ISSN Network, PubMed, Wikidata, etc), OpenAlex describes and links (approximately) 200M author clusters, 100k institutions, and 100k venues (journals and repositories). Using a neural-net classifier, we assign one or more of 50k Wikidata concepts to each work. All source code and ML models are available openly, and data is freely available via a high-performance API, a complete database dump, and a search-engine-style web interface. This talk will describe the construction of OpenAlex, compare it to other SKGs (eg Scopus, MAG), and discuss plans for the future.”

News & Views: Publishers and Market Consolidation – Part 2 of 2 – Delta Think

“Last month we examined the large degree of consolidation in journals publishing. We saw that 95% of publishers publish 10 journals or fewer, but account for barely one fifth of articles published. Meanwhile, half of total scholarly output is published by just 10 publishers, those with the largest numbers of journals.

We can further analyze the market’s consolidation by comparing annual growth rates in the numbers of publishers, journals and articles….

By looking at the trends, some clear patterns emerge.

The numbers of publishers (in blue) grew more quickly in the mid-teens than before or since. This is consistent with the S-shaped curve in the numbers of publishers we noted last month. So it seems the market showed signs of fragmentation in the mid-teens, followed by consolidation more recently.
Growth in numbers of journals (in orange) accelerated until about 2017, then started to fall off. This happened in tandem with the slowing growth in the numbers of publishers.
The rate of growth in numbers of articles (in grey) seems to run counter to the trends above. On average it was flat (at around 5%-6%) until 2018/2019, but then it accelerated. We think much of this is because of the unusually high levels of submission in the wake of COVID (as we discussed in our market sizing analysis last year)….

The data also suggest that growth in publisher and journal numbers has slowed, while growth in output has accelerated. Over the last few years – irrespective of Covid effects – it seems the larger publishers are producing larger journals, and the smaller publishers smaller ones. Larger organizations may be able to produce things more efficiently than smaller ones. Meanwhile, the rise of Open Access and reduction in reliance on print works removes constraints on publication sizes….”

August OpenCon Library Community Call on Using the OpenAlex API | August 9th, 2022

“Inspired by the ancient Library of Alexandria, OpenAlex indexes the world of scholarly research, including works, citations, authors, journals, and institutions. OpenAlex data is completely free and open to all via a web interface, API, and database snapshot. Join us to learn how to use the OpenAlex API for your scholcomm research needs. OpenAlex was created by OurResearch, a nonprofit that makes open scholarly infrastructure including Unpaywall (an index of the world’s Open Access research literature) and Unsub (a tool to help librarians eliminate toll-access journal subscriptions). …”

News & Views: Publishers and Market Consolidation – Part 1 of 2 – Delta Think

“That our market is highly consolidated is probably not surprising. But the extent of the polarization – and the length of the long tail – might be. Half of total scholarly output is published by just 10 publishers, each of whom publish 400 or more journals. 80% of that is accounted for by the top 5.”

How open are hybrid journals included in nationwide transformative agreements in Germany?

We present hoaddata, an experimental R package that combines open scholarly data from the German Open Access Monitor, Crossref and OpenAlex. Using this package, we illustrate the progress made in publishing open access content in hybrid journals included in nationwide transformative agreements in Germany across journal portfolios and countries.

About The Lens » Release 8.5

“With this release, we are pleased to announce the initial integration of OpenAlex data into The Lens. Developed by the team at OurResearch, who also provide UnPaywall, ImpactStory and other open tools for the research community, OpenAlex was initiated to provide a replacement for Microsoft Academic Graph (MAG, see The Lens Scholarly MetaRecord Strategy: Beyond Microsoft Academic Graph).

In this initial phase of OpenAlex integration, we have started ingesting the additional scholarly works that were not present in MAG, as well as supplementing some of the metadata gaps left after the retirement of MAG including Fields of Study and Open Access information. This has resulted in the addition of nearly 6M records in The Lens now including OpenAlex identifiers.

In future phases, we will be expanding the coverage of OpenAlex in The Lens as the OpenAlex dataset matures and the MetaRecord merging logic is established….

With the addition of OpenAlex, we have also added open access information from OpenAlex as a new open access data source (e.g. open_access.source:openalex). Still in beta, open access information from OpenAlex will be merged with open access evidence from other sources to improve open access information. The data sources for open access evidence include: doaj, pmc-nih, core, unpaywall, openalex and rxiv….”

New OpenAlex API features! – OurResearch blog

“We’ve got a ton of great API improvements to report! If you’re an API user, there’s a good chance there’s something in here you’re gonna love.

Search

You can now search both titles and abstracts. We’ve also implemented stemming, so a search for “frogs” now automatically gets your results mentioning “frog,” too. Thanks to these changes, searches for works now deliver around 10x more results. This can all be accessed using the new search query parameter.

New entity filters

We’ve added support for tons of new filters, which are documented here. You can now:

get all of a work’s outgoing citations (ie, its references section) with a single query. 
search within each work’s raw affiliation data to find an arbitrary string (eg a specific department within an organization)
filter on whether or not an entity has a canonical external ID (works: has_doi, authors: has_orcid, etc) ….”
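
The search parameter and filters described above can be sketched as plain URL construction. This is a minimal sketch under stated assumptions, not the official client. The `search` parameter and the `has_doi` filter are named in the post. The `https://api.openalex.org/works` base URL, the `filter=key:value` syntax, the `cited_by` filter name, and the work ID `W123` are assumptions for illustration and should be checked against the OpenAlex documentation. No network request is made.

```python
from urllib.parse import urlencode

# Assumed base URL for the OpenAlex REST API.
BASE_URL = "https://api.openalex.org"


def works_url(search=None, filters=None):
    """Build a URL for the works endpoint from a search term and filters.

    `filters` is a dict joined into an assumed `key:value,key:value` string.
    """
    params = {}
    if search:
        params["search"] = search
    if filters:
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    # Keep ':' and ',' unescaped so the filter string stays readable.
    query = urlencode(params, safe=":,")
    return f"{BASE_URL}/works?{query}" if query else f"{BASE_URL}/works"


# Full-text search over titles and abstracts (stemming happens server-side).
print(works_url(search="frogs"))

# Only works that have a DOI (filter name taken from the post).
print(works_url(filters={"has_doi": "true"}))

# Outgoing citations of one work; `cited_by` and `W123` are assumed names.
print(works_url(filters={"cited_by": "W123"}))
```

Building the URL separately from fetching it keeps the sketch easy to check by eye before any request is sent.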

Massive open index of scholarly papers launches

“An ambitious free index of more than 200 million scientific documents that catalogues publication sources, author information and research topics, has been launched.

The index, called OpenAlex after the ancient Library of Alexandria in Egypt, also aims to chart connections between these data points to create a comprehensive, interlinked database of the global research system, say its founders. The database, which launched on 3 January, is a replacement for Microsoft Academic Graph (MAG), a free alternative to subscription-based platforms such as Scopus, Dimensions and Web of Science that was discontinued at the end of 2021.

“It’s just pulling lots of databases together in a clever way,” says Euan Adie, founder of Overton, a London-based firm that tracks the research cited in policy documents. Overton had been getting its data from various sources, including MAG, ORCID, Crossref and directly from publishers, but has now switched to using only OpenAlex, in the hope of making the process easier….”

OpenAlex launch! – OurResearch blog

“OpenAlex launched this week! (January 3rd 2022 for those reading from the future)

As expected:

We’re now pulling in new content on our own. Until now, we’ve been getting new works, authors, and other entities from MAG. Now that MAG is gone, we’re gathering all of our own data from the big wide internet.

The new REST API is launched! This is a much faster and easier way to access the OpenAlex database than downloading and installing the snapshot. It’s completely open and free–you don’t even need a user account or token.

We’ve now got oodles of new documentation here: https://docs.openalex.org/

Slight change of plan:

The MAG Format snapshot is now hosted for free, thanks to the AWS Open Data program. This will cover the data transfer fees (which turned out to be $70!) so you don’t have to. Here are the new instructions on how to download the MAG format snapshot to your machine.

We are extending the beta period for OpenAlex; we’ll emerge from beta in February. This is mostly in response to discovering issues with the coverage and structure of existing data sources including MAG. Extending the beta reflects the fact that the data will improve significantly between now and February.

Huge exciting news:

OpenAlex was built to offer a drop-in replacement for MAG. We’re doing that. But today, we’re also unveiling some moves toward a more innovative future for OpenAlex:

We’ve now built around a simple new five-entity model: works, authors, venues (journals and repositories), institutions, and concepts. Everything in OpenAlex is one of these entities, or a connection between them. Each type of entity has its own API endpoint.

We’ve got a new Standard Format for the snapshot, one that’s closely tied to both the five-entity model and the API. In the future, this will become the only supported format. The MAG format is now deprecated and will go away on July 1, 2022. …”
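
The launch post says each of the five entity types has its own API endpoint. A minimal sketch of that mapping follows, assuming the endpoint paths simply mirror the entity names as listed at launch and that the API lives at `https://api.openalex.org`. Both are assumptions, and the names may have changed since. The `endpoint` helper is hypothetical.

```python
# The five entity types named in the launch post.
ENTITY_TYPES = ("works", "authors", "venues", "institutions", "concepts")

# Assumed base URL for the OpenAlex REST API.
BASE_URL = "https://api.openalex.org"


def endpoint(entity_type):
    """Return the assumed endpoint URL for one of the five entity types."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return f"{BASE_URL}/{entity_type}"


for entity_type in ENTITY_TYPES:
    print(entity_type, "->", endpoint(entity_type))
```

Rejecting unknown names up front makes it obvious when code written against the launch-era model meets a renamed or retired entity type.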