Project – BISON

“The B!SON project, realized by TIB and SLUB Dresden, implements a recommendation service for quality-assured open access journals. From the large number of open access journals available, the system creates a list of suitable journals sorted by thematic relevance. For this purpose, in addition to common bibliometric methods of determining similarity, machine learning methods are used that can determine relevant publication venues based on the semantic similarity of the title or abstract of an article to be published.

The partners cooperate with OpenCitations and DOAJ (Directory of Open Access Journals) and strive for a close exchange with institutions that advise authors. Several scientific institutions support the project.

While open access publishing requirements are steadily increasing and there is a growing number of open access journals, authors often lack knowledge of relevant, quality-assured open access journals that would be suitable for publishing their own research. A freely accessible tool that can be linked to local support structures will help to make the transition to open access successful….”
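The semantic-similarity ranking B!SON describes can be illustrated with a minimal sketch. This is not the project's actual implementation: the toy vectors below stand in for real embeddings (which a system like this would obtain from a trained language model applied to the title or abstract), and the journal names are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical embedding vectors; a real system would embed the
# manuscript's title/abstract and each journal's published content.
manuscript = [0.9, 0.1, 0.3]
journals = {
    "Journal A": [0.8, 0.2, 0.4],
    "Journal B": [0.1, 0.9, 0.2],
}

# Rank candidate journals by semantic similarity to the manuscript.
ranking = sorted(journals, key=lambda j: cosine(manuscript, journals[j]),
                 reverse=True)
print(ranking)  # → ['Journal A', 'Journal B']
```

The bibliometric similarity measures the project also mentions would contribute additional ranking signals alongside this semantic score.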

Harnessing Scholarly Literature as Data to Curate, Explore, and Evaluate Scientific Research

Abstract:  There currently exist hundreds of millions of scientific publications, with more being created at an ever-increasing rate. This is leading to information overload: the scale and complexity of this body of knowledge is increasing well beyond the capacity of any individual to make sense of it all, overwhelming traditional, manual methods of curation and synthesis. At the same time, the availability of this literature and surrounding metadata in structured, digital form, along with the proliferation of computing power and techniques to take advantage of large-scale and complex data, represents an opportunity to develop new tools and techniques to help people make connections, synthesize, and pose new hypotheses. This dissertation consists of several contributions of data, methods, and tools aimed at addressing information overload in science. My central contribution to this space is Autoreview, a framework for building and evaluating systems to automatically select relevant publications for literature reviews, starting from small sets of seed papers. These automated methods have the potential to help researchers save time and effort when keeping up with relevant literature, as well as surfacing papers that more manual methods may miss. I show that this approach can work to recommend relevant literature, and can also be used to systematically compare different features used in the recommendations. I also present the design, implementation, and evaluation of several visualization tools. One of these is an animated network visualization showing the influence of a scholar over time. Another is SciSight, an interactive system for recommending new authors and research by finding similarities along different dimensions. Additionally, I discuss the current state of available scholarly data sets; my work curating, linking, and building upon these data sets; and methods I developed to scale graph clustering techniques to very large networks.


Open Book Genome Project: Planning Document – Google Docs


To create an open, community-powered Book Genome Project which produces open standards, data, and services to enable deeper, faster and more holistic understanding of a book’s unique characteristics.


In ~2003, Aaron Stanton co-founded a project called the Book Genome Project (based on Pandora’s Music Genome Project) to “identify, track, measure, and study the multitude of features that make up a book”. This taxonomic engine could be applied to a book to surface unique patterns and insights that predict its structure, themes, age-appropriateness, and even pace. The group behind the Book Genome Project used this technology to power BookLamp, a user-facing website offering book recommendations based on quantifiable similarities in books’ contents. The project was acquired by Apple circa 2014 along with its patents and discontinued.


Today, the world has at its fingertips a powerful non-profit, open-source project called Open Library, which enables book lovers to access more than 4 million readable books online through the Internet Archive’s controlled digital lending library program. Similar to Goodreads, Open Library serves as a catalog of 19 million book records which readers may use to discover recommendations and track books they want to read. Similar to BookLamp, Open Library has great potential to bring book insights to its audience of more than 3M international book lovers….”

The Open Book Genome Project

“Nine years later, Will Glaser & Tim Westergren drew inspiration from HGP and launched a similar effort called the Music Genome Project, using trained experts to classify and label music according to a taxonomy of characteristics, like genre and tempo. This system became the engine which powers song recommendations for Pandora Radio.

Circa 2003, Aaron Stanton, Matt Monroe, Sidian Jones, and Dan Bowen adapted the idea of Pandora to books, creating a book recommendation service called BookLamp. Under the hood, they devised a Book Genome Project which combined computers and crowds to “identify, track, measure, and study the multitude of features that make up a book”….

In 2006, a project called the Open Music Genome Project attempted to create a public, open, community alternative to Pandora’s Music Genome Project. We thought this was a beautiful gesture and a great opportunity for Open Library; perhaps we could facilitate public book insights which any project in the ecosystem could use to create their own answer for, “what is a book?”. We also found inspiration from complementary projects like StoryGraph, which elegantly crowdsources book tags from patrons to help you “choose your next book based on your mood and your favorite topics and themes”; the HathiTrust Research Center (HTRC), which has led the way in making book data available to researchers; and the Open Syllabus Project, which is surfacing useful academic books based on their usage across college curricula….

Our hope is that this Open Book Genome Project will help responsibly make book data more useful and accessible to the public: to power book recommendations, to compare books based on their similarities and differences, to produce more accurate summaries, to calculate reading levels to match audiences to books, to surface citations and URLs mentioned within books, and more….”

Words Algorithm Collection – finding closely related open access books using text mining techniques | LIBER Quarterly: The Journal of the Association of European Research Libraries

Open access platforms and retail websites are both trying to present the most relevant offerings to their patrons. Retail websites deploy recommender systems that collect data about their customers. These systems are successful but intrude on privacy. As an alternative, this paper presents an algorithm that uses text mining techniques to find the most important themes of an open access book or chapter. By locating other publications that share one or more of these themes, it is possible to recommend closely related books or chapters.

The algorithm splits the full text into trigrams. It removes all trigrams containing words that are commonly used in everyday language and in (open access) book publishing. The most frequently occurring remaining trigrams are distinctive to the publication and indicate the themes of the book. The next step is finding publications that share one or more of the trigrams. The strength of the connection can be measured by counting – and ranking – the number of shared trigrams. The algorithm was used to find connections between 10,997 titles: 67% in English, 29% in German and 6% in Dutch or a combination of languages. The algorithm is able to find connected books across languages.
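The steps above can be sketched as follows. This is a minimal illustration, assuming a simple whitespace tokenizer and a small illustrative stop-word list; the paper's actual handling of everyday and publishing vocabulary is more elaborate.

```python
from collections import Counter

# Illustrative stand-in for the paper's list of common everyday
# and (open access) book-publishing words.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "press", "university"}

def trigrams(text):
    """Split the full text into word trigrams, dropping any trigram
    that contains a common everyday/publishing word."""
    words = text.lower().split()
    grams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    return [g for g in grams if not any(w in STOPWORDS for w in g)]

def themes(text, top_n=5):
    """The most frequently occurring remaining trigrams are taken
    as the publication's distinctive themes."""
    return {g for g, _ in Counter(trigrams(text)).most_common(top_n)}

def connection_strength(text_a, text_b):
    """Strength of the connection = number of shared theme trigrams."""
    return len(themes(text_a) & themes(text_b))
```

Ranking all candidate titles by `connection_strength` against a given book then yields its most closely related publications.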

It is possible to use the algorithm for several use cases, not just recommender systems. Creating benchmarks for publishers or creating a collection of connected titles for libraries are other possibilities. Apart from the OAPEN Library, the algorithm can be applied to other collections of open access books or even open access journal articles. Combining the results across multiple collections will enhance its effectiveness.

Project MUSE introduces AI-based links, powered by UNSILO, for related content

“Project MUSE is partnering with UNSILO, a Cactus Communications (CACTUS) brand that develops artificial intelligence (AI)-powered solutions for publishers, to implement robust new AI-driven content recommendations throughout its massive collection of books and journals in the humanities and social sciences. UNSILO recently completed the initial indexing of the Project MUSE content collection, and enhanced related content recommendations now appear throughout the platform.

The UNSILO Recommender API automatically identifies links to relevant content from the MUSE content corpus for any selected document (book chapter or journal article). The indexing is updated every 24 hours as new content is added to MUSE. Links are delivered to the platform in real time, enriching the user experience and providing relevance-ranked discovery that augments the learning experience. Over 250 concepts are extracted from every document, and then matched by rank with related material. …”
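UNSILO's pipeline is proprietary, but the matching step it describes can be sketched hypothetically: each document carries a set of extracted concepts, and candidates are ranked by how many concepts they share with the selected document. The function and document identifiers below are invented for illustration.

```python
def related(selected_concepts, corpus, top_n=3):
    """Rank corpus documents by the number of concepts they share
    with the selected document's extracted concepts."""
    scores = {
        doc_id: len(selected_concepts & concepts)
        for doc_id, concepts in corpus.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical corpus: concepts extracted per book chapter / article.
corpus = {
    "chap-1": {"civil war", "reconstruction", "suffrage"},
    "art-2": {"reconstruction", "suffrage", "labor history"},
    "art-3": {"modernist poetry"},
}

print(related({"civil war", "reconstruction", "suffrage"}, corpus))
# → ['chap-1', 'art-2', 'art-3']
```

A production system extracting 250+ concepts per document would typically also weight each concept's salience rather than counting raw overlap.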

Experience of using CORE Recommender – an interview – Research

“Making the repository experience more rewarding for users is a continual endeavour for repository managers, and the CORE Recommender is designed to provide a simple and fast solution to help researchers discover relevant further reading. The CORE Recommender is a plugin for repositories, journals and web interfaces that provides article suggestions closely related to the articles that the user is actively reading. The recommendations are drawn from CORE’s database, which contains over 25 million full texts….”

Experience of using CORE Recommender – CORE

“CORE Recommender is a plugin for repositories, journals and web interfaces that provides suggestions for articles relevant to the one a user is viewing. The recommendations are drawn from CORE’s database, which contains over 25 million full texts. Today we have interviewed George Macgregor, Scholarly Publications & Research Data Manager at the University of Strathclyde, responsible for the Strathprints institutional repository. Read about his experience of using CORE Recommender on the Jisc Research blog….”

CORE Recommender installation for DSpace – CORE

“The CORE Recommender is a large and complex service whose main purpose is to enhance a repository by recommending similar articles. This blog post reviews only the plugin for a DSpace/JSPUI-based repository. The recommendations are drawn from CORE’s database of metadata records and full texts. The plugin can also recommend articles from the same repository.

To install the CORE Recommender, you should first read the description of the service and register. Registration is possible manually or via the CORE Repository Dashboard. I recommend that you use the CORE Discovery Dashboard, which allows you not only to access CORE services but also to control and monitor the harvesting process….”

The Citation Advantage of Promoted Articles in a Cross-Publisher Distribution Platform: A 12-Month Randomized Controlled Trial – Kudlow – Journal of the Association for Information Science and Technology – Wiley Online Library

Abstract:  There is currently a paucity of evidence-based strategies that have been shown to increase citations of peer-reviewed articles following their publication. We conducted a 12-month randomized controlled trial to examine whether the promotion of article links in an online cross-publisher distribution platform (TrendMD) affects citations. In all, 3,200 articles published in 64 peer-reviewed journals across eight subject areas were block randomized at the subject level to either the TrendMD group (n = 1,600) or the control group (n = 1,600) of the study. Our primary outcome compares the mean citations of articles randomized to TrendMD versus control after 12 months. Articles randomized to TrendMD showed a 50% increase in mean citations relative to control at 12 months. The difference in mean citations at 12 months for articles randomized to TrendMD versus control was 5.06, 95% confidence interval [2.87, 7.25]; the difference was statistically significant (p < .001) and found in three of eight subject areas. At 6 months following publication, articles randomized to TrendMD showed a smaller, yet statistically significant (p = .005), 21% increase in mean citations, relative to control. To our knowledge, this is the first randomized controlled trial to demonstrate how an intervention can be used to increase citations of peer-reviewed articles after they have been published.