Project – BISON

“The B!SON project, realized by TIB and SLUB Dresden, implements a recommendation service for quality-assured open access journals. From the large number of open access journals available, the system creates a list of suitable journals sorted by thematic relevance. For this purpose, in addition to common bibliometric methods of determining similarity, machine learning methods are used that can determine relevant publication venues based on the semantic similarity of the title or abstract of an article to be published.

The partners cooperate with OpenCitations and DOAJ (Directory of Open Access Journals) and strive for a close exchange with institutions that advise authors. Several scientific institutions support the project.

While open access publishing requirements are steadily increasing and there is a growing number of open access journals, authors often lack knowledge of relevant, quality-assured open access journals that would be suitable for publishing their own research. A freely accessible tool that can be linked to local support structures will help to make the transition to open access successful….”
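The semantic-similarity ranking B!SON describes can be illustrated with a minimal sketch. This is not the project's actual implementation: the toy vectors below stand in for real embeddings (which a system like this would obtain from a trained language model applied to the title or abstract), and the journal names are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical embedding vectors; a real system would embed the
# manuscript's title/abstract and each journal's published content.
manuscript = [0.9, 0.1, 0.3]
journals = {
    "Journal A": [0.8, 0.2, 0.4],
    "Journal B": [0.1, 0.9, 0.2],
}

# Rank candidate journals by semantic similarity to the manuscript.
ranking = sorted(journals, key=lambda j: cosine(manuscript, journals[j]),
                 reverse=True)
print(ranking)  # → ['Journal A', 'Journal B']
```

The bibliometric similarity measures the project also mentions would contribute additional ranking signals alongside this semantic score.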

Harnessing Scholarly Literature as Data to Curate, Explore, and Evaluate Scientific Research

Abstract:  There currently exist hundreds of millions of scientific publications, with more being created at an ever-increasing rate. This is leading to information overload: the scale and complexity of this body of knowledge is increasing well beyond the capacity of any individual to make sense of it all, overwhelming traditional, manual methods of curation and synthesis. At the same time, the availability of this literature and surrounding metadata in structured, digital form, along with the proliferation of computing power and techniques to take advantage of large-scale and complex data, represents an opportunity to develop new tools and techniques to help people make connections, synthesize, and pose new hypotheses. This dissertation consists of several contributions of data, methods, and tools aimed at addressing information overload in science. My central contribution to this space is Autoreview, a framework for building and evaluating systems to automatically select relevant publications for literature reviews, starting from small sets of seed papers. These automated methods have the potential to help researchers save time and effort when keeping up with relevant literature, as well as surfacing papers that more manual methods may miss. I show that this approach can work to recommend relevant literature, and can also be used to systematically compare different features used in the recommendations. I also present the design, implementation, and evaluation of several visualization tools. One of these is an animated network visualization showing the influence of a scholar over time. Another is SciSight, an interactive system for recommending new authors and research by finding similarities along different dimensions. Additionally, I discuss the current state of available scholarly data sets; my work curating, linking, and building upon these data sets; and methods I developed to scale graph clustering techniques to very large networks.


Open Book Genome Project: Planning Document – Google Docs


To create an open, community-powered Book Genome Project which produces open standards, data, and services to enable deeper, faster and more holistic understanding of a book’s unique characteristics.


In ~2003, Aaron Stanton co-founded a project called the Book Genome Project (based on Pandora’s Music Genome Project) to “identify, track, measure, and study the multitude of features that make up a book”. This taxonomic engine could be applied to a book to surface unique patterns and insights that predict its structure, themes, age-appropriateness, and even pace. The group behind the Book Genome Project used this technology to power BookLamp, a user-facing website offering book recommendations based on quantifiable similarities in books’ contents. The project was acquired by Apple circa 2014 along with its patents and discontinued.


Today, the world has at its fingertips a powerful non-profit, open-source project called Open Library, which enables book lovers to access more than 4 million readable books online through the Internet Archive’s controlled digital lending library program. Similar to Goodreads, Open Library serves as a catalog of 19 million book records which readers may use to discover recommendations and track books they want to read. Similar to BookLamp, Open Library has great potential to bring book insights to its audience of more than 3M international book lovers….”

The Open Book Genome Project

“Nine years later, Will Glaser & Tim Westergren drew inspiration from HGP and launched a similar effort called the Music Genome Project, using trained experts to classify and label music according to a taxonomy of characteristics, like genre and tempo. This system became the engine which powers song recommendations for Pandora Radio.

Circa 2003, Aaron Stanton, Matt Monroe, Sidian Jones, and Dan Bowen adapted the idea of Pandora to books, creating a book recommendation service called BookLamp. Under the hood, they devised a Book Genome Project which combined computers and crowds to “identify, track, measure, and study the multitude of features that make up a book”….

In 2006, a project called the Open Music Genome Project attempted to create a public, open, community alternative to Pandora’s Music Genome Project. We thought this was a beautiful gesture and a great opportunity for Open Library; perhaps we could facilitate public book insights which any project in the ecosystem could use to create their own answer for, “what is a book?”. We also found inspiration from complementary projects like StoryGraph, which elegantly crowdsources book tags from patrons to help you “choose your next book based on your mood and your favorite topics and themes”; the HathiTrust Research Center (HTRC), which has led the way in making book data available to researchers; and the Open Syllabus Project, which is surfacing useful academic books based on their usage across college curricula….

Our hope is that this Open Book Genome Project will help responsibly make book data more useful and accessible to the public: to power book recommendations, to compare books based on their similarities and differences, to produce more accurate summaries, to calculate reading levels to match audiences to books, to surface citations and URLs mentioned within books, and more….”

Words Algorithm Collection – finding closely related open access books using text mining techniques | LIBER Quarterly: The Journal of the Association of European Research Libraries

Open access platforms and retail websites are both trying to present the most relevant offerings to their patrons. Retail websites deploy recommender systems that collect data about their customers. These systems are successful but intrude on privacy. As an alternative, this paper presents an algorithm that uses text mining techniques to find the most important themes of an open access book or chapter. By locating other publications that share one or more of these themes, it is possible to recommend closely related books or chapters.

The algorithm splits the full text into trigrams. It removes all trigrams containing words that are commonly used in everyday language and in (open access) book publishing. The most frequently occurring remaining trigrams are distinctive to the publication and indicate the themes of the book. The next step is finding publications that share one or more of the trigrams. The strength of the connection can be measured by counting – and ranking – the number of shared trigrams. The algorithm was used to find connections between 10,997 titles: 67% in English, 29% in German and 6% in Dutch or a combination of languages. The algorithm is able to find connected books across languages.
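The steps above can be sketched as follows. This is a minimal illustration, assuming a simple whitespace tokenizer and a small illustrative stop-word list; the paper's actual handling of everyday and publishing vocabulary is more elaborate.

```python
from collections import Counter

# Illustrative stand-in for the paper's list of common everyday
# and (open access) book-publishing words.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "press", "university"}

def trigrams(text):
    """Split the full text into word trigrams, dropping any trigram
    that contains a common everyday/publishing word."""
    words = text.lower().split()
    grams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    return [g for g in grams if not any(w in STOPWORDS for w in g)]

def themes(text, top_n=5):
    """The most frequently occurring remaining trigrams are taken
    as the publication's distinctive themes."""
    return {g for g, _ in Counter(trigrams(text)).most_common(top_n)}

def connection_strength(text_a, text_b):
    """Strength of the connection = number of shared theme trigrams."""
    return len(themes(text_a) & themes(text_b))
```

Ranking all candidate titles by `connection_strength` against a given book then yields its most closely related publications.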

It is possible to use the algorithm for several use cases, not just recommender systems. Creating benchmarks for publishers or creating a collection of connected titles for libraries are other possibilities. Apart from the OAPEN Library, the algorithm can be applied to other collections of open access books or even open access journal articles. Combining the results across multiple collections will enhance its effectiveness.

Project MUSE introduces AI-based links, powered by UNSILO, for related content

“Project MUSE is partnering with UNSILO, a Cactus Communications (CACTUS) brand that develops artificial intelligence (AI)-powered solutions for publishers, to implement robust new AI-driven content recommendations throughout its massive collection of books and journals in the humanities and social sciences. UNSILO recently completed the initial indexing of the Project MUSE content collection, and enhanced related content recommendations now appear throughout the platform.

The UNSILO Recommender API automatically identifies links to relevant content from the MUSE content corpus for any selected document (book chapter or journal article). The indexing is updated every 24 hours as new content is added to MUSE. Links are delivered to the platform in real time, enriching the user experience and providing relevance-ranked discovery that augments the learning experience. Over 250 concepts are extracted from every document, and then matched by rank with related material. …”
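UNSILO's pipeline is proprietary, but the matching step it describes can be sketched hypothetically: each document carries a set of extracted concepts, and candidates are ranked by how many concepts they share with the selected document. The function and document identifiers below are invented for illustration.

```python
def related(selected_concepts, corpus, top_n=3):
    """Rank corpus documents by the number of concepts they share
    with the selected document's extracted concepts."""
    scores = {
        doc_id: len(selected_concepts & concepts)
        for doc_id, concepts in corpus.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical corpus: concepts extracted per book chapter / article.
corpus = {
    "chap-1": {"civil war", "reconstruction", "suffrage"},
    "art-2": {"reconstruction", "suffrage", "labor history"},
    "art-3": {"modernist poetry"},
}

print(related({"civil war", "reconstruction", "suffrage"}, corpus))
# → ['chap-1', 'art-2', 'art-3']
```

A production system extracting 250+ concepts per document would typically also weight each concept's salience rather than counting raw overlap.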

Experience of using CORE Recommender – an interview – Research

“Making the repository experience more rewarding for users is a continual endeavour for repository managers, and the CORE Recommender is designed to provide a simple and fast solution to help researchers discover relevant further reading. The CORE Recommender is a plugin for repositories, journals and web interfaces that provides article suggestions closely related to the articles that the user is actively reading. The recommendations are drawn from CORE’s database, which contains over 25 million full texts….”

Experience of using CORE Recommender – CORE

“CORE Recommender is a plugin for repositories, journals and web interfaces that provides suggestions for articles relevant to the one a user is viewing. The recommendations are drawn from CORE’s database, which contains over 25 million full texts. Today we have interviewed George Macgregor, Scholarly Publications & Research Data Manager at the University of Strathclyde, responsible for the Strathprints institutional repository. Read about his experience of using CORE Recommender on the Jisc Research blog….”

CORE Recommender installation for DSpace – CORE

“The CORE Recommender is a large and complex service whose main purpose is to enhance a repository by recommending similar articles. This blog post reviews only the plugin for a DSpace/JSPUI-based repository. The recommendations are drawn from CORE’s database of metadata records and full texts. The plugin can also recommend articles from the same repository.

To install the CORE Recommender, you should first read the description of the service and register. Registration is possible manually or via the CORE Repository Dashboard. I recommend that you use the CORE Discovery Dashboard, which allows you not only to access CORE services but also to control and monitor the harvesting process….”

The Citation Advantage of Promoted Articles in a Cross-Publisher Distribution Platform: A 12-Month Randomized Controlled Trial – Kudlow – Journal of the Association for Information Science and Technology – Wiley Online Library

Abstract:  There is currently a paucity of evidence-based strategies that have been shown to increase citations of peer-reviewed articles following their publication. We conducted a 12-month randomized controlled trial to examine whether the promotion of article links in an online cross-publisher distribution platform (TrendMD) affects citations. In all, 3,200 articles published in 64 peer-reviewed journals across eight subject areas were block randomized at the subject level to either the TrendMD group (n = 1,600) or the control group (n = 1,600) of the study. Our primary outcome compares the mean citations of articles randomized to TrendMD versus control after 12 months. Articles randomized to TrendMD showed a 50% increase in mean citations relative to control at 12 months. The difference in mean citations at 12 months for articles randomized to TrendMD versus control was 5.06, 95% confidence interval [2.87, 7.25]; the difference was statistically significant (p < .001) and found in three of eight subject areas. At 6 months following publication, articles randomized to TrendMD showed a smaller, yet statistically significant (p = .005), 21% increase in mean citations, relative to control. To our knowledge, this is the first randomized controlled trial to demonstrate how an intervention can be used to increase citations of peer-reviewed articles after they have been published.