Exploring the Dimensions of Scientific Impact: A Comprehensive Bibliometric Analysis Investigating the Influence of Gender, Mobility, and Open Access

Abstract: The Science of Science field advances the measurement, evaluation, and prediction of scientific outcomes through the study of extensive scholarly data. For these purposes, bibliometrics is an appropriate approach that studies large volumes of scientific data using mathematical and statistical methods, and is widely used to assess the impact of papers and authors within a specific field or community. However, conducting bibliometric analyses poses several methodological, technical, and informational challenges (e.g., collecting and cleaning data, calculating indicators) which need to be addressed. This thesis aims to tackle some of these challenges and shed light on the factors influencing scientific impact, specifically focusing on open access publishing, international mobility, and factors that influence the h-index. It also makes methodological contributions, such as author disambiguation and co-authorship network analysis, which provide insights into methodological and informational challenges within bibliometric analysis. Another methodological challenge addressed in this research is the inference of gender for a significant number of authors to obtain gender-related insights. By employing gender inference techniques, the research explores gender as an influential factor in scientific impact, shedding light on potential gender inequalities within the scholarly community. The research employs a bibliometric approach and mainly uses Scopus, a comprehensive dataset encompassing various disciplines, to make the following contributions:

• We explore the impact of publishing behavior, particularly the adoption of open access practices, on knowledge dissemination and scholarly communication. To this end, we investigate the impact of journals flipping from closed access to open access publishing models [74]. Changes in publication volumes and citation impact are analyzed, demonstrating an overall increase in publication output and improved citation metrics following the transition to open access. However, the magnitude of changes varies across scientific disciplines. In another study [76], we utilize a dataset of articles published by Springer Nature and employ correlation and regression analyses to examine the relationship between authors’ country affiliations, publishing models, and citation impact. Using a machine learning approach, we estimate the publishing model of papers based on different factors. The findings reveal different patterns in authors’ choices of publishing models based on income levels, availability of Article Processing Charges waivers, and journal rank. The study highlights potential inequalities in access to open access publishing and its citation advantage.

• We investigate the association between scholars’ mobility patterns, socio-demographic characteristics, and their scientific activity and impact. By utilizing network and regression analyses, along with various statistical techniques, we investigate the international mobility of researchers. Furthermore, we conduct a comparative analysis of scientific outcomes, considering factors such as publications, citations, and measures of co-authorship network centrality. The findings reveal gender inequalities in mobility across scientific fields and countries and positive correlations between mobility and scientific success.

• Centered on the prediction of scholars’ h-index as a metric of scientific impact, another one of our studies [77] employs machine learning techniques. We examine author, coauthorship, paper, and venue-specific characteristics, in addition to prior impact-based features. The results emphasize the significance of non-prior impact-based features, particularly for early-career scholars in the long term, while also revealing the limited influence of gender on h-index prediction. 

The findings of this research hold implications for researchers, academic institutions, and policymakers aiming to advance scientific knowledge and foster equitable practices. By uncovering the influential factors that shape scientific impact and addressing potential gender disparities, this research contributes to the broader objective of promoting diversity, inclusivity, and excellence within the scholarly community.
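
For readers unfamiliar with the h-index that the thesis above uses as its impact metric, the following is a minimal sketch of the standard definition: the largest h such that the author has at least h papers with at least h citations each. The function name and the sample citation counts are illustrative assumptions and are not taken from the thesis.

```python
def h_index(citations):
    """Return the h-index for a list of per-paper citation counts.

    Standard definition: the largest h such that at least h papers
    have h or more citations each. An empty list gives 0.
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


# Hypothetical citation counts for one author's five papers.
example = [10, 8, 5, 4, 3]
print(h_index(example))
```

Sorting first keeps the logic easy to check by hand; for very large author sets a counting approach would avoid the sort, but that is not needed for a sketch.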

A Bibliometric Study of Open Educational Resources, Open Textbooks, and Academic Librarianship | Journal of Open Educational Resources in Higher Education

Abstract:  Open Educational Resources (OER) play a key role in reducing the financial burden and increasing the accessibility of learning for students in higher education. OER can be considered an important field of research for academic librarians and supports the democratic mission of academic libraries. This study aimed to track the publication of scholarly literature about OER and higher education from 2002 to 2022 using a bibliometric research methodology. In addition, this research sought to assess the productivity of Library and Information Science (LIS) scholarship on this topic and investigate research trends, like open textbooks. Web of Science (WOS) was searched for publications and the search results were mapped to determine publication productivity, core authors, core journals, and research topics in the scholarly literature about OER and higher education. Research on OER has been steadily increasing since 2002, and this study indicates that research has increased significantly on the topic in the last six years. The data in this study support that most productivity in research on this topic is in the field of Education, but also found a presence of scholarship on the topic in the field of LIS.

[2311.09657] Open Access in Ukraine: characteristics and evolution from 2012 to 2021

This study investigates the development of open access (OA) to publications produced by authors affiliated with Ukrainian universities and research organisations in the period 2012-2021. In order to get a comprehensive overview we assembled data from three popular databases: Dimensions, Web of Science (WoS) and Scopus. Our final dataset consisted of 187,135 records. To determine the OA status of each article, this study utilised Unpaywall data which was obtained via API. It was determined that 71.5% of all considered articles during the observed period were openly available at the time of analysis. Our findings show that gold OA was the most prevalent type of OA throughout the 10-year study period. We also looked at how OA varies by research field, how dominant large commercial publishers are in disseminating national research, and the preferences of authors regarding where to self-archive article versions. We concluded that Ukraine needs to be thoughtful in its engagement with large publishers and make sure that academics, rather than for-profit companies, control publishing; otherwise such companies would monopolise the distribution of research output, leaving national publishers behind. Beyond that we put a special emphasis on the importance of FAIRness of national scholarly communication infrastructure in monitoring OA uptake.
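
As an illustration of how an overall OA share and a per-type breakdown like those reported above could be tallied, here is a minimal sketch. It assumes each record is a dictionary that has already been retrieved and carries an `oa_status` field with values such as gold, green, hybrid, bronze, or closed, as in Unpaywall responses. The function name, the handling of missing values, and the sample records are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter


def oa_summary(records):
    """Summarise OA status over a list of record dicts.

    Returns (shares, open_share): `shares` maps each status to its
    percentage of all records, and `open_share` is the percentage of
    records whose status is neither "closed" nor "unknown".
    """
    # Records without a status are counted as "unknown" (an assumption).
    counts = Counter(rec.get("oa_status") or "unknown" for rec in records)
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    shares = {status: 100.0 * n / total for status, n in counts.items()}
    open_count = sum(
        n for status, n in counts.items() if status not in ("closed", "unknown")
    )
    return shares, 100.0 * open_count / total


# Hypothetical records; real data would come from an OA-status lookup per DOI.
sample = [
    {"doi": "10.0000/a", "oa_status": "gold"},
    {"doi": "10.0000/b", "oa_status": "gold"},
    {"doi": "10.0000/c", "oa_status": "green"},
    {"doi": "10.0000/d", "oa_status": "closed"},
]
shares, open_share = oa_summary(sample)
print(shares, open_share)
```

Keeping the retrieval step separate from the tally means the same summary can be rerun later, which matters because OA status is measured at the time of analysis and can change.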

Wikipedia as a tool for contemporary history of science: A case study on CRISPR | PLOS ONE

Abstract:  Rapid developments and methodological divides hinder the study of how scientific knowledge accumulates, consolidates and transfers to the public sphere. Our work proposes using Wikipedia, the online encyclopedia, as a historiographical source for contemporary science. We chose the high-profile field of gene editing as our test case, performing a historical analysis of the English-language Wikipedia articles on CRISPR. Using a mixed-method approach, we qualitatively and quantitatively analyzed the CRISPR article’s text, sections and references, alongside 50 affiliated articles. These, we found, documented the CRISPR field’s maturation from a fundamental scientific discovery to a biotechnological revolution with vast social and cultural implications. We developed automated tools to support such research and demonstrated its applicability to two other scientific fields–coronavirus and circadian clocks. Our method utilizes Wikipedia as a digital and free archive, showing it can document the incremental growth of knowledge and the manner in which scientific research accumulates and translates into public discourse. Using Wikipedia in this manner complements and overcomes some issues with contemporary histories and can also augment existing bibliometric research.


Velez-Estevez et al. (2023) New trends in bibliometric APIs: A comparative analysis | Information Processing & Management

Velez-Estevez, A., I. J. Perez, P. García-Sánchez, J. A. Moral-Munoz, and M. J. Cobo. ‘New Trends in Bibliometric APIs: A Comparative Analysis’. Information Processing & Management 60, no. 4 (1 July 2023): 103385. https://doi.org/10.1016/j.ipm.2023.103385.


The science of science practice requires the analysis of large and complex bibliometric data. Traditional data exporting from companies’ websites is not sufficient, so APIs are used to access a larger corpus. Therefore, this study aims not only to establish a taxonomy but also to offer a comparative analysis of 44 bibliographic APIs from various non-profit and commercial organizations, analyzing their characteristics and metadata with descriptive analysis, their possible bibliometric analyses, and the interoperability of the APIs across four different data categories: general, content, search, and query modes. The study found that Clarivate Analytics and Elsevier offer highly versatile APIs, while non-profit organizations, such as OpenCitations and OurResearch promote the Open Science philosophy. Most organizations offer free access to APIs for non-commercial purposes, but some have limitations on metadata retrieval. However, CrossRef, OpenCitations, or OpenAlex have no restrictions on the metadata retrieval. Co-author analysis using author names and bibliometric evaluation using citations are the types of analyses that can be done with the data provided by most APIs. DOI, PubMedID, and PMCID are the most versatile identifiers for extending metadata in the APIs. Semantic Scholar, Dimensions, ORCID, and Embase are the APIs that offer the most extensibility. Considering the obtained results, there is no single API that gathers all the information needed to perform any bibliometric analysis. Combining two or more APIs may be the most appropriate option to cover as much information as possible and enrich reports and analyses. This study contributes to advancing the understanding and use of APIs in research practice.

Governance by output reduces humanities scholarship to monologue | Impact of Social Sciences

Drawing on a large-scale comparative study of scholars in the UK and Germany on how pressure to publish is experienced across research careers, Marcel Knöchelmann argues that the structural incentive to publish inherent to research assessment in the UK shapes a research culture focused on output and monologue at the expense of an engaged public dialogue.


Open Access Advantages as a Function of the Discipline: Mixed-methods Study – ScienceDirect

Abstract:  Purpose

This mixed-methods study integrates bibliometric and altmetric investigation with a qualitative method in order to assess the prevalence and societal-impact of Open-Access (OA) publications, and to reveal the considerations behind researchers’ decision to publish articles in closed and open-access.


The bibliometric-altmetric study analyzed 584 OA and closed publications published between 2014 and 2019 by 40 Israeli researchers: 20 from STEM (Science, Technology, Engineering, Math) and 20 from SSH (Social Sciences and Humanities) disciplines. We used a multistage cluster sampling method to select a representative sample for the STEM disciplines group (engineering, computer science, biology, mathematics, and physics), and for the SSH disciplines group (sociology, economics, psychology, political science, and history). Required data were extracted from the Scopus and Unpaywall databases and the PlumX platform. Among the 40 researchers who were selected for the bibliometric-altmetric study, 20 researchers agreed to be interviewed for this study.


Comparing bibliometrics and altmetrics for the general publications did not reveal any significant differences between OA and closed publications. These were found only when comparing OA and closed publications across disciplines. STEM-researchers published 59 % of their publications in OA, compared to just 29 % among those in SSH, and they received significantly more bibliometric and altmetric citations from SSH OA publications and from their own closed-access publications. The altmetrics findings indicate that researchers are well acquainted and active in social media. However, according to the interviewees, there is no academic contribution for sharing research findings on social-media; it is viewed as a “public-service”. Researchers’ primary consideration for publishing in closed or OA was the journal impact-factor.

Research limitations/implications

Our findings contribute to the increasing body of research that addresses OA citations and societal-impact advantages. The findings suggest the need to adopt an OA-policy after a thorough assessment of the consequences for SSH disciplines.

Bibliometrics Methods in Detecting Citations to Questionable Journals – ScienceDirect

Abstract:  In recent times, there has been a proliferation of questionable practices in research publishing, for example, via predatory journals, hijacked journals, plagiarism, tortured phrases and paper mills. This paper intends to analyse whether journals that had been removed from the Directory of Open Access Journals (DOAJ) in 2018 due to suspected misconduct were cited within journals indexed in the Scopus database. Our analysis showed that Scopus contained over 15 thousand references to the removed journals identified. The majority of the publications citing these journals came from the area of Engineering. It is important to note that although we cannot assume that all the journals removed followed unethical practices, it is still essential that researchers are aware of the issues around citing journals that have been suspected of misconduct. We suggest that research libraries play a crucial role in training, advising and providing information to researchers about these ethical issues of publication malpractice and misconduct.


Laakso (2023) Open access books through open data sources: assessing prevalence, providers, and preservation | Emerald Insight

Laakso, M. (2023), “Open access books through open data sources: assessing prevalence, providers, and preservation”, Journal of Documentation, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/JD-02-2023-0016


Science policy and practice for open access (OA) books is a rapidly evolving area in the scholarly domain. However, there is much that remains unknown, including how many OA books there are and to what degree they are included in preservation coverage. The purpose of this study is to contribute towards filling this knowledge gap in order to advance both research and practice in the domain of OA books.


This study utilized open bibliometric data sources to aggregate a harmonized dataset of metadata records for OA books (data sources: the Directory of Open Access Books, OpenAIRE, OpenAlex, Scielo Books, The Lens, and WorldCat). This dataset was then cross-matched based on unique identifiers and book titles to openly available content listings of trusted preservation services (data sources: Cariniana Network, CLOCKSS, Global LOCKSS Network, and Portico). The web domains of the OA books were determined by querying the web addresses or digital object identifiers provided in the metadata of the bibliometric database entries.


In total, 396,995 unique records were identified from the OA book bibliometric sources, of which 19% were found to be included in at least one of the preservation services. The results suggest reason for concern for the long tail of OA books distributed across thousands of different web domains, as these include volatile cloud storage and, in some cases, no longer contain the files at all.

Research limitations/implications

Data quality issues, varying definitions of OA across services and inconsistent implementation of unique identifiers were discovered as key challenges. The study includes recommendations for publishers, libraries, data providers and preservation services for improving monitoring and practices for OA book preservation.


This study provides methodological and empirical findings for advancing the practices of OA book publishing, preservation and research.


[2006.14830] Metrics and peer review agreement at the institutional level

Abstract:  In the past decades, many countries have started to fund academic institutions based on the evaluation of their scientific performance. In this context, post-publication peer review is often used to assess scientific performance. Bibliometric indicators have been suggested as an alternative to peer review. A recurrent question in this context is whether peer review and metrics tend to yield similar outcomes. In this paper, we study the agreement between bibliometric indicators and peer review based on a sample of publications submitted for evaluation to the national Italian research assessment exercise (2011–2014). In particular, we study the agreement between bibliometric indicators and peer review at a higher aggregation level, namely the institutional level. Additionally, we also quantify the internal agreement of peer review at the institutional level. We base our analysis on a hierarchical Bayesian model using cross-validation. We find that the level of agreement is generally higher at the institutional level than at the publication level. Overall, the agreement between metrics and peer review is on par with the internal agreement among two reviewers for certain fields of science in this particular context. This suggests that for some fields, bibliometric indicators may possibly be considered as an alternative to peer review for the Italian national research assessment exercise. Although results do not necessarily generalise to other contexts, it does raise the question whether similar findings would obtain for other research assessment exercises, such as in the United Kingdom.


Twenty years of Wikipedia in scholarly publications: a bibliometric network analysis of the thematic and citation landscape | SpringerLink

Abstract:  Wikipedia has grown to be the biggest online encyclopedia in terms of comprehensiveness, reach and coverage. However, although different websites and social network platforms have received considerable academic attention, Wikipedia has largely gone unnoticed. In this study, we fill this research gap by investigating how Wikipedia has been used in scholarly publications since its launch in 2001. More specifically, we review and analyze the intellectual structure of Wikipedia’s scholarly publications based on 3790 Web of Science core collection documents written by 10,636 authors from 100 countries over two decades (2001–2021). Results show that the most influential outlets publishing Wikipedia research include journals such as PLOS ONE, Nucleic Acids Research, the Journal of the Association for Information Science and Technology, the Journal of the American Society for Information Science and Technology, IEEE Access, and Information Processing and Management. Results also show that the author collaboration network is very sparsely connected, indicating the absence of close collaboration among the authors in the field. Furthermore, results reveal that the Wikipedia research institutions’ collaboration network reflects a North–South divide as very limited cooperation occurs between developed and developing countries’ institutions. Finally, the multiple correspondence analysis applied to obtain the Wikipedia research conceptual map reveals the breadth, diversity, and intellectual thrust of the Wikipedia’s scholarly publications. Our analysis has far-reaching implications for aspiring researchers interested in Wikipedia research as we retrospectively trace the evolution in research output over the last two decades, establish linkages between the authors and articles, and reveal trending topics/hotspots within the broad theme of Wikipedia research.


Escaping ‘bibliometric coloniality’, ‘epistemic inequality’

“Africa’s scholarly journals compete on an unequal playing field because of a lack of funding and the struggle to sustain academic credibility.

“These inequalities are exacerbated by the growing influence of the major citation indexes, leading to what we have called bibliometric coloniality,” say the authors of the book, Who Counts? Ghanaian academic publishing and global science, published by African Minds at the start of 2023.

“The rules of the game continue to be defined outside the continent. We hope that, in some small way, this book contributes to the renaissance and renewal of African-centred research and publishing infrastructures,” the authors say….”

Superior identification index – Quantifying the capability of academic journals to recognize good research

Abstract:  In this paper we present the “superior identification index” (SII), a metric to quantify the capability of academic journals to recognize top papers within a specific time window and study field. Intuitively, SII is the percentage of papers from a journal in the top p% papers in the field. SII provides a flexible framework to make trade-offs between journal quality and quantity: as p rises, it puts more weight on quantity and less weight on quality. Concerns about the selection of p are discussed, and extended metrics of SII, including superior identification efficiency (SIE) and paper rank percentile (PRP), are proposed to sketch other dimensions of journal performance. Based on bibliometric data from the field of ecology, we find that as p increases, the correlation between SIE and JIF first rises then drops, indicating that JIF might most likely reflect “how well a journal identifies the top 26~34% papers in the field”. Hopefully, the newly proposed SII metric and its extensions could promote quality awareness and provide flexible tools for research evaluation.
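
The intuitive definition of SII above can be sketched in a few lines. The abstract does not give the exact formula, so this sketch assumes one reading: rank all papers in the field by citation count, take the top p%, and report the percentage of that top set that was published in the journal of interest. Under this reading, p = 100 reduces SII to the journal's share of all papers in the field, which is consistent with the remark that larger p weights quantity more. The function name, the tie-breaking rule, and the sample data are illustrative assumptions.

```python
from math import ceil


def superior_identification_index(papers, journal, p):
    """Percentage of the field's top-p% papers that come from `journal`.

    papers:  list of (journal_name, citation_count) tuples for one field
             and time window.
    p:       percentile cutoff, 0 < p <= 100.
    Assumed reading of the definition; ties at the cutoff are broken by
    input order because Python's sort is stable.
    """
    if not papers:
        raise ValueError("papers must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ranked = sorted(papers, key=lambda rec: rec[1], reverse=True)
    # Size of the top set; at least one paper is always included.
    k = max(1, ceil(len(ranked) * p / 100))
    hits = sum(1 for name, _ in ranked[:k] if name == journal)
    return 100.0 * hits / k


# Hypothetical field with three journals and ten papers.
field = [
    ("A", 50), ("B", 40), ("A", 30), ("C", 20), ("B", 10),
    ("C", 5), ("A", 4), ("B", 3), ("C", 2), ("C", 1),
]
print(superior_identification_index(field, "A", 20))
print(superior_identification_index(field, "A", 100))
```

If the intended definition instead divides by the journal's own paper count, only the denominator in the final line of the function would change; the ranking and cutoff steps stay the same.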

Promoting Open Science through bibliometrics | LIBER Quarterly: The Journal of the Association of European Research Libraries

Abstract:  In order to assess the progress of Open Science in France, the French Ministry of Higher Education, Research and Innovation published the French Open Science Monitor in 2019. Even if this tool has a bias, for only the publications with a DOI can be considered, thus promoting article-dominant research communities, its indicators are trustworthy and reliable. The University of Lorraine was the very first institution to reuse the National Monitor in order to create a new version at the scale of one university in 2020. Since its release, the Lorraine Open Science Monitor has been reused by many other institutions. In 2022, the French Open Science Monitor further evolved, enabling new insights on open science. The Lorraine Open Science Monitor has also evolved since it began. This paper details how the initial code for the Lorraine Open Science Monitor was developed and disseminated. It then outlines plans for development in the next few years.