“Justin Barrett, Lead Machine Learning Engineer for OpenAlex at OurResearch, talks with ROR Technical Community Manager Amanda French and ROR Curation Lead Adam Buttrick about using ROR both as an identifier for institutions in OpenAlex and as a dataset for training machine learning models that enrich OpenAlex metadata….”
Abstract: As part of the data-driven paradigm and open science movement, the data paper is becoming a popular way for researchers to publish their research data, based on academic norms that cross knowledge domains. Data journals have also been created to host this new academic genre. The growing number of data papers and journals has made them an important large-scale data source for understanding how research data is published and reused in our research system. One barrier to this research agenda is a lack of knowledge as to how data journals and their publications are indexed in the scholarly databases used for quantitative analysis. To address this gap, this study examines how 18 exclusively data journals (i.e., journals that primarily accept data papers) are indexed in four popular scholarly databases: the Web of Science, Scopus, Dimensions, and OpenAlex. We investigate how comprehensively these databases cover the selected data journals and, in particular, how they present the document type information of data papers. We find that the coverage of data papers, as well as their document type information, is highly inconsistent across databases, which creates major challenges for future efforts to study them quantitatively. As a result, we argue that efforts should be made by data journals and databases to improve the quality of metadata for this emerging genre.
“The Curtin Open Knowledge Initiative has just released a long-awaited update to the Open Access dashboard. The migration from Microsoft Academic Graph (MAG) to OpenAlex is now complete. The (currently available) data for research outputs published in 2022 has also been released.
The Open Access dashboard provides information on the OA status of research outputs by country and by institution. At the core of this is assigning research outputs to institutions, for which MAG, and now OpenAlex, has been our core source. The top-level message is that OpenAlex is offering a big jump forward in coverage and completeness, and we know the team there are working hard on making it even better. Currently we’ve seen a huge improvement in the tracking of open access outputs in our dashboard, with 14,477 institutions covered, up from 7,701 previously.
The big good news story is the increase in the countries covered, with 221 now included, up from 189 previously. In particular this has seen a big increase in our coverage of African countries with an additional 16 countries now included, which is exciting given the inclusion of the COKI dashboard as a source on the AfricArXiv country pages.
There is a lot of detail to work through, and below we dig into the details. If you’re more keen to go straight to the data you can check out the main Open Access Website and we’ve set up a comparison dashboard that will allow you to compare the differences for your country or institution….”
“New research from the Max Planck Institute for Demographic Research analyzes global migration of scholars, using bibliometric data. They do a side-by-side comparison of this analysis between Scopus and OpenAlex data.
Counts of scholars by country are highly correlated between Scopus and OpenAlex.
Migration events are less correlated between the two, but trends in migration between top pairs of countries are consistent between them. Correlation is higher for Western countries, and OpenAlex has more coverage of non-Western countries.
OpenAlex is open. Scopus is not. This puts limits on how researchers can perform and share this type of analysis….”
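As a rough illustration of what "highly correlated" means for per-country counts, the sketch below computes a Pearson correlation between scholar counts from two sources. The country codes and counts are made up for illustration and are not figures from the study; the excerpt also does not say which correlation measure the authors used, so Pearson is an assumption here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-country scholar counts from two databases (illustrative only).
scopus_counts = {"US": 1000, "DE": 400, "BR": 150, "NG": 20}
openalex_counts = {"US": 1100, "DE": 420, "BR": 210, "NG": 45}

# Correlate only the countries present in both sources, in a fixed order.
shared = sorted(scopus_counts.keys() & openalex_counts.keys())
r = pearson([scopus_counts[c] for c in shared],
            [openalex_counts[c] for c in shared])
print(f"r = {r:.3f} over {len(shared)} countries")
```

A high correlation in counts can coexist with large differences for individual countries (here the smallest country differs by more than a factor of two), which is one reason to look at coverage country by country as well.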
“openalexR helps you interface with the OpenAlex API to retrieve bibliographic information about publications, authors, venues, institutions and concepts with 4 main functions:
oa_query(): generates a valid query, written following the OpenAlex API syntax, from a set of arguments provided by the user.
oa_request(): downloads a collection of entities matching the query created by oa_query() or manually written by the user, and returns a JSON object in a list format.
oa2df(): converts the JSON object into a classical bibliographic tibble/data frame.
oa_fetch(): composes the three functions above so the user can execute everything in one step, i.e., oa_query |> oa_request |> oa2df…”
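openalexR itself is an R package. To show what each of its stages does, the sketch below mirrors the query → request → table pipeline in Python against the public OpenAlex REST API. The function names are hypothetical and are not part of openalexR; the sketch assumes the `https://api.openalex.org/works` endpoint with its `filter=key:value,...`, `search` and `per-page` parameters, keeps only a handful of work fields, and fetches a single page (no cursor paging).

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.openalex.org"


def build_query(entity, filters=None, search=None, per_page=25):
    """Stage 1 (like oa_query): compose an OpenAlex API URL."""
    params = {"per-page": str(per_page)}
    if filters:
        # OpenAlex takes comma-separated key:value pairs in one `filter` parameter.
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    if search:
        params["search"] = search
    return f"{BASE_URL}/{entity}?{urllib.parse.urlencode(params)}"


def request_json(url):
    """Stage 2 (like oa_request): download one page of results as parsed JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def to_rows(payload):
    """Stage 3 (like oa2df): flatten selected work fields into simple row dicts."""
    return [
        {
            "id": work.get("id"),
            "title": work.get("display_name"),
            "year": work.get("publication_year"),
            "doi": work.get("doi"),
            "cited_by_count": work.get("cited_by_count"),
        }
        for work in payload.get("results", [])
    ]


def fetch(entity, **kwargs):
    """Compose the three stages in one call, like oa_fetch (needs network access)."""
    return to_rows(request_json(build_query(entity, **kwargs)))
```

For example, `fetch("works", filters={"publication_year": 2022}, search="data paper", per_page=10)` would return up to ten rows. Keeping the stages separate makes the URL-building and flattening steps testable without any network access.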
“It’s a new year and at OurResearch we’re starting off 2023 full steam ahead! We’ve revamped the OpenAlex documentation so that it’s easier to get started, and easier to find the fields and filters that are available in the OpenAlex API. It should take fewer clicks to find what you need.
Poised for growth
The major change we made was to highlight the core entities (works, authors, etc.) in OpenAlex, giving them their own up-front space. OpenAlex grew considerably in 2022, not only in number of records, but also in the number of ways that you can filter, group, and search scholarly data. This new approach provides more room to add and document filters. We can better describe the unique search capabilities available for each entity. Overall, it sets us up to grow again in 2023….”
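To make "group" concrete: the OpenAlex API accepts a `group_by` parameter that returns counts per value of a field instead of individual records. The sketch below builds such a URL and parses a response, assuming the reply carries a `group_by` list of objects with `key`, `key_display_name` and `count` fields. The sample response is invented to match that assumed shape; its numbers are not real OpenAlex counts.

```python
import urllib.parse


def group_by_url(entity, group_field, filters=None):
    """Build an OpenAlex URL that asks for counts grouped by one field."""
    params = {"group_by": group_field}
    if filters:
        params["filter"] = ",".join(f"{k}:{v}" for k, v in filters.items())
    return f"https://api.openalex.org/{entity}?{urllib.parse.urlencode(params)}"


def counts_from_groups(payload):
    """Map each group's display name to its count (assumed response shape)."""
    return {g["key_display_name"]: g["count"] for g in payload.get("group_by", [])}


# Invented sample response, shaped like a group_by reply (not real counts).
sample = {
    "meta": {"count": 270},
    "results": [],
    "group_by": [
        {"key": "2021", "key_display_name": "2021", "count": 120},
        {"key": "2022", "key_display_name": "2022", "count": 150},
    ],
}

print(group_by_url("works", "publication_year", {"open_access.is_oa": "true"}))
print(counts_from_groups(sample))
```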
Abstract: By analyzing 25,671 journals largely absent from common journal counts, as well as Web of Science and Scopus, this study demonstrates that scholarly communication is more of a global endeavor than is commonly credited. These journals, employing the open source publishing platform Open Journal Systems (OJS), have published 5.8 million items; they are in 136 countries, with 79.9% in the Global South and 84.2% following the OA diamond model (charging neither reader nor author). A substantial proportion of journals operate in more than one language (48.3%), with research published in a total of 60 languages (led by English, Indonesian, Spanish, and Portuguese). The journals are distributed across the social sciences (45.9%), STEM (40.3%), and the humanities (13.8%). For all their geographic, linguistic, and disciplinary diversity, 1.2% are indexed in the Web of Science and 5.7% in Scopus. On the other hand, 1.0% are found in Cabells Predatory Reports, while 1.4% show up in Beall’s questionable list. This paper seeks to both contribute to and historically situate the expanded scale and diversity of scholarly publishing in the hope that this recognition may assist humankind in taking full advantage of what is increasingly a global research enterprise.
Abstract: The role played by research scholars in the dissemination of scientific knowledge on social media has always been a central topic in social media metrics (altmetrics) research. Different approaches have been implemented to identify and characterize active scholars on social media platforms like Twitter. Some limitations of past approaches were their complexity and, most importantly, their reliance on licensed scientometric and altmetric data. The emergence of new open data sources like OpenAlex or Crossref Event Data provides opportunities to identify scholars on social media using only open data. This paper presents a novel and simple approach to match authors from OpenAlex with Twitter users identified in Crossref Event Data. The matching procedure is described and validated with ORCID data. The new approach matches nearly 500,000 scholars with their Twitter accounts with high precision and moderate recall. The dataset of matched scholars is described and made openly available to the scientific community to empower more advanced studies of the interactions of research scholars on Twitter.
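The abstract does not spell out the matching procedure. Purely as a hypothetical illustration of the general idea, and not the paper's method, one simple approach compares a Twitter user's display name with the names of authors of works that user tweeted about, after normalizing case, accents and punctuation. The input mapping and function names below are assumptions.

```python
import re
import unicodedata


def normalize_name(name):
    """Strip accents and punctuation, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    letters_only = re.sub(r"[^a-z\s]", " ", no_accents.lower())
    return " ".join(letters_only.split())


def match_user_to_authors(display_name, tweeted_work_authors):
    """Return IDs of authors whose normalized name equals the user's display name.

    tweeted_work_authors is a hypothetical mapping of author ID -> author name,
    restricted to authors of works the user tweeted (e.g. found via DOIs in
    Crossref Event Data).
    """
    target = normalize_name(display_name)
    return sorted(
        author_id
        for author_id, name in tweeted_work_authors.items()
        if normalize_name(name) == target
    )


authors = {"A1": "José García", "A2": "Jane Doe", "A3": "J. García"}
print(match_user_to_authors("Jose Garcia", authors))
```

Requiring exact equality after normalization favors precision over recall: abbreviated forms such as "J. García" are not matched. A real pipeline would have to decide how far to relax that rule.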
“OpenAlex is a free and open Scientific Knowledge Graph (SKG). It contains information describing approximately 230M scholarly works, drawn from both structured (e.g., Crossref) and unstructured (e.g., institutional repositories, publisher websites) sources, clustered/merged into distinct records, and linked by citations. By parsing work metadata and enriching it with external PID sources (ROR, ORCID, ISSN Network, PubMed, Wikidata, etc.), OpenAlex describes and links (approximately) 200M author clusters, 100k institutions, and 100k venues (journals and repositories). Using a neural-net classifier, we assign one or more of 50k Wikidata concepts to each work. All source code and ML models are available openly, and data is freely available via a high-performance API, a complete database dump, and a search-engine-style web interface. This talk will describe the construction of OpenAlex, compare it to other SKGs (e.g., Scopus, MAG), and discuss plans for the future.”
“Last month we examined the large degree of consolidation in journals publishing. We saw that 95% of publishers publish 10 journals or fewer, but account for barely one fifth of articles published. Meanwhile, half of total scholarly output is published by just 10 publishers, those with the largest numbers of journals.
We can further analyze the market’s consolidation by comparing annual growth rates in the numbers of publishers, journals and articles….
By looking at the trends, some clear patterns emerge.
The numbers of publishers (in blue) grew more quickly in the mid-2010s than before or since. This is consistent with the S-shaped curve in the numbers of publishers we noted last month. So it seems the market showed signs of fragmentation in the mid-2010s, followed by consolidation more recently.
Growth in numbers of journals (in orange) accelerated until about 2017, then started to fall off. This happened in tandem with the slowing growth in the numbers of publishers.
The rate of growth in numbers of articles (in grey) seems to run counter to the trends above. On average it was flat (at around 5%-6%) until 2018/2019, but then it accelerated. We think much of this is because of the unusually high levels of submission in the wake of COVID (as we discussed in our market sizing analysis last year)….
The data also suggest that growth in publisher and journal numbers has slowed, while growth in output has accelerated. Over the last few years – irrespective of Covid effects – it seems the larger publishers are producing larger journals, and the smaller publishers smaller ones. Larger organizations may be able to produce things more efficiently than smaller ones. Meanwhile, the rise of Open Access and reduction in reliance on print works removes constraints on publication sizes….”
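The comparison above rests on annual (year-over-year) growth rates, g_t = (x_t − x_{t−1}) / x_{t−1}, computed separately for publishers, journals and articles. A minimal sketch follows; the counts are hypothetical and are not the post's data.

```python
def annual_growth_rates(counts_by_year):
    """Year-over-year growth (x_t - x_{t-1}) / x_{t-1} for consecutive years."""
    years = sorted(counts_by_year)
    rates = {}
    for prev, curr in zip(years, years[1:]):
        if curr != prev + 1:
            raise ValueError(f"missing year between {prev} and {curr}")
        base = counts_by_year[prev]
        if base == 0:
            raise ValueError(f"zero count in {prev}; growth rate undefined")
        rates[curr] = (counts_by_year[curr] - base) / base
    return rates


# Hypothetical article counts (illustrative only, not the post's data).
articles = {2018: 1000, 2019: 1050, 2020: 1134}
for year, rate in annual_growth_rates(articles).items():
    print(year, f"{rate:.1%}")
```

Applying the same function to the publisher, journal and article series puts all three on a common percentage scale, which is what allows their curves to be compared even though the raw counts differ by orders of magnitude.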
“Inspired by the ancient Library of Alexandria, OpenAlex indexes the world of scholarly research, including works, citations, authors, journals, and institutions. OpenAlex data is completely free and open to all via a web interface, API, and database snapshot. Join us to learn how to use the OpenAlex API for your scholcomm research needs. OpenAlex was created by OurResearch, a nonprofit that makes open scholarly infrastructure including Unpaywall (an index of the world’s Open Access research literature) and Unsub (a tool to help librarians eliminate toll-access journal subscriptions). …”
“That our market is highly consolidated is probably not surprising. But the extent of the polarization – and the length of the long tail – might be. Half of total scholarly output is published by just 10 publishers, each of whom publish 400 or more journals. 80% of that is accounted for by the top 5.”
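The consolidation figures are top-N concentration shares: the fraction of total output produced by the N largest publishers. The sketch below computes that share on hypothetical publisher counts (not the post's data), then applies the post's own two figures: if the top 10 publish about half of all output and the top 5 account for 80% of that, the top 5 alone publish roughly 0.8 × 0.5 = 40% of all output.

```python
def top_n_share(output_by_publisher, n):
    """Fraction of total output produced by the n largest publishers."""
    total = sum(output_by_publisher.values())
    if total == 0:
        raise ValueError("total output is zero")
    top = sorted(output_by_publisher.values(), reverse=True)[:n]
    return sum(top) / total


# Hypothetical article counts per publisher (illustrative only).
output = {"P1": 40, "P2": 30, "P3": 10, "P4": 10, "P5": 5, "P6": 5}
top2 = top_n_share(output, 2)
top4 = top_n_share(output, 4)
print(f"top 2: {top2:.0%}, top 4: {top4:.0%}, top 2 within top 4: {top2 / top4:.0%}")

# The post's figures: top 10 ~ 50% of output, top 5 ~ 80% of that.
top5_of_all = 0.8 * 0.5
print(f"top 5 share of all output: {top5_of_all:.0%}")
```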
We present hoaddata, an experimental R package that combines open scholarly data from the German Open Access Monitor, Crossref and OpenAlex. Using this package, we illustrate the progress made in publishing open access content in hybrid journals included in nationwide transformative agreements in Germany across journal portfolios and countries.
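The excerpt does not describe how hoaddata links its sources, and the package itself is written in R. As a hypothetical illustration of one common way to combine records from sources such as Crossref and OpenAlex, the Python sketch below joins on a normalized DOI. DOIs are case-insensitive, and OpenAlex typically stores them as `https://doi.org/...` URLs while Crossref uses the bare form, so both are reduced to a lowercase bare DOI first. The record shapes are assumptions.

```python
import re

# Resolver and URI prefixes that may precede a DOI.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(doi):
    """Strip resolver/URI prefixes and lowercase (DOIs are case-insensitive)."""
    return _DOI_PREFIX.sub("", doi.strip()).lower()


def join_on_doi(left, right):
    """Inner-join two lists of dicts that carry a 'doi' key.

    If `right` contains duplicate DOIs, the last record wins.
    """
    index = {normalize_doi(r["doi"]): r for r in right if r.get("doi")}
    joined = []
    for rec in left:
        if not rec.get("doi"):
            continue
        key = normalize_doi(rec["doi"])
        match = index.get(key)
        if match is not None:
            joined.append({**rec, **match, "doi": key})
    return joined


crossref = [{"doi": "10.1234/ABC", "journal": "J"}]
openalex = [{"doi": "https://doi.org/10.1234/abc", "is_oa": True}]
print(join_on_doi(crossref, openalex))
```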
“With this release, we are pleased to announce the initial integration of OpenAlex data into The Lens. Developed by the team at OurResearch, who also provide Unpaywall, ImpactStory and other open tools for the research community, OpenAlex was initiated to provide a replacement for Microsoft Academic Graph (MAG, see The Lens Scholarly MetaRecord Strategy: Beyond Microsoft Academic Graph).
In this initial phase of OpenAlex integration, we have started ingesting the additional scholarly works that were not present in MAG, as well as supplementing some of the metadata gaps left after the retirement of MAG including Fields of Study and Open Access information. This has resulted in the addition of nearly 6M records in The Lens now including OpenAlex identifiers.
In future phases, we will be expanding the coverage of OpenAlex in The Lens as the OpenAlex dataset matures and the MetaRecord merging logic is established….
With the addition of OpenAlex, we have also added open access information from OpenAlex as a new open access data source (e.g. open_access.source:openalex). Still in beta, open access information from OpenAlex will be merged with open access evidence from other sources to improve open access information. The data sources for open access evidence include: doaj, pmc-nih, core, unpaywall, openalex and rxiv….”
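The post does not describe the rules The Lens uses to merge open access evidence. As a hypothetical sketch only, one simple rule treats a record as open access if any source reports it, leaves the status unknown when no source has data, and keeps the list of supporting sources. The source names come from the list above; the input structure and the "any source" rule are assumptions.

```python
from typing import Dict, List, Optional

# Source names as listed in the post.
KNOWN_SOURCES = {"doaj", "pmc-nih", "core", "unpaywall", "openalex", "rxiv"}


def merge_oa_evidence(evidence: Dict[str, Optional[bool]]) -> dict:
    """Combine per-source OA flags (True, False, or None for 'no data')."""
    unknown = set(evidence) - KNOWN_SOURCES
    if unknown:
        raise ValueError(f"unexpected sources: {sorted(unknown)}")
    supporting: List[str] = sorted(s for s, is_oa in evidence.items() if is_oa)
    reported = [v for v in evidence.values() if v is not None]
    return {
        # None when no source has any data for this record.
        "is_open_access": bool(supporting) if reported else None,
        "sources": supporting,
    }


print(merge_oa_evidence({"unpaywall": True, "openalex": None, "doaj": False}))
```

An "any source" rule favors recall; a stricter variant could require agreement between two or more sources before marking a record as open access.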
OpenAlex is a new, fully-open scientific knowledge graph (SKG), launched to replace the discontinued Microsoft Academic Graph (MAG). It contains metadata for 209M works (journal articles, books, etc.); 213M disambiguated authors; 124k venues (places that host works, such as journals and online repositories); 109k institutions; and 65k Wikidata concepts (linked to works via an automated hierarchical multi-tag classifier). The dataset is fully and freely available via a web-based GUI, a full data dump, and a high-volume REST API. The resource is under active development and future work will improve accuracy and coverage of citation information and author/institution parsing and deduplication.