[2208.08426] “We Need a Woman in Music”: Exploring Wikipedia’s Values on Article Priority

Abstract:  Wikipedia — like most peer production communities — suffers from a basic problem: the amount of work that needs to be done (articles to be created and improved) exceeds the available resources (editor effort). Recommender systems have been deployed to address this problem, but they have tended to recommend work tasks that match individuals’ personal interests, ignoring more global community values. In English Wikipedia, discussion about Vital articles constitutes a proxy for community values about the types of articles that are most important, and should therefore be prioritized for improvement. We first analyzed these discussions, finding that an article’s priority is considered a function of 1) its inherent importance and 2) its effects on Wikipedia’s global composition. One important example of the second consideration is balance, including along the dimensions of gender and geography. We then conducted a quantitative analysis evaluating how four different article prioritization methods — two from prior research — would affect Wikipedia’s overall balance on these two dimensions; we found significant differences among the methods. We discuss the implications of our results, including particularly how they can guide the design of recommender systems that take into account community values, not just individuals’ interests.

 

Personalised publication recommendation service for open-access digital archives – Ahmet Aníl Müngen, 2022

Abstract:  The increase in the number of open-access academic publications and open-access institutional academic archives has led more researchers to use these archives. No existing model offers personalised publication suggestions in academic archives. A central service architecture has been proposed to provide personalised academic article recommendations for open-access digital archives, making it possible to offer personalised suggestions and to help researchers discover new publications. The service proposed in the study is based on a centralised micro-service architecture model, and TF-IDF and article classification methods were used together for the personalised publication suggestion system. For the first time globally, the proposed method was used with 1464 real users in 49 open-access archives. The F-measure was found to be higher than 0.8 for recommended publications.
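To make the TF-IDF part of this abstract concrete, here is a minimal sketch of content-based recommendation using TF-IDF weights and cosine similarity. It is an illustration only, not the service described in the paper: the classification step is omitted, a single article the user has read stands in for the user profile, and all function names and sample data are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Map each document id to a {term: tf-idf weight} dictionary."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        vectors[doc_id] = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(read_id, docs, top_n=3):
    """Return ids of the documents most similar to the one the user has read."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[read_id], vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != read_id
    ]
    ranked = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked[:top_n] if score > 0]
```

A real deployment would build the profile from a user's full reading history and, as the abstract indicates, combine these scores with article classification.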

 

Project – BISON

“The B!SON project, realized by TIB and SLUB Dresden, implements a recommendation service for quality-assured open access journals. From the large amount of open access journals available, this system will create a list of suitable journals sorted according to thematic relevance. For this purpose, in addition to common bibliometric methods of determining similarity, machine learning methods are used, which can determine relevant publication venues based on the semantic similarity of the title or abstract of an article to be published.

The partners cooperate with OpenCitations and DOAJ (Directory of Open Access Journals) and strive for a close exchange with institutions that advise authors. Several scientific institutions support the project.

While open access publishing requirements are steadily increasing and there is a growing number of open access journals, authors often lack knowledge of relevant, quality-assured open access journals that would be suitable for publishing their own research. A freely accessible tool that can be linked to local support structures will help to make the transition to open access successful….”

Harnessing Scholarly Literature as Data to Curate, Explore, and Evaluate Scientific Research

Abstract:  There currently exist hundreds of millions of scientific publications, with more being created at an ever-increasing rate. This is leading to information overload: the scale and complexity of this body of knowledge is increasing well beyond the capacity of any individual to make sense of it all, overwhelming traditional, manual methods of curation and synthesis. At the same time, the availability of this literature and surrounding metadata in structured, digital form, along with the proliferation of computing power and techniques to take advantage of large-scale and complex data, represents an opportunity to develop new tools and techniques to help people make connections, synthesize, and pose new hypotheses. This dissertation consists of several contributions of data, methods, and tools aimed at addressing information overload in science. My central contribution to this space is Autoreview, a framework for building and evaluating systems to automatically select relevant publications for literature reviews, starting from small sets of seed papers. These automated methods have the potential to help researchers save time and effort when keeping up with relevant literature, as well as surfacing papers that more manual methods may miss. I show that this approach can work to recommend relevant literature, and can also be used to systematically compare different features used in the recommendations. I also present the design, implementation, and evaluation of several visualization tools. One of these is an animated network visualization showing the influence of a scholar over time. Another is SciSight, an interactive system for recommending new authors and research by finding similarities along different dimensions. Additionally, I discuss the current state of available scholarly data sets; my work curating, linking, and building upon these data sets; and methods I developed to scale graph clustering techniques to very large networks.

 

Open Book Genome Project: Planning Document – Google Docs

“Mission

To create an open, community-powered Book Genome Project which produces open standards, data, and services to enable deeper, faster and more holistic understanding of a book’s unique characteristics.

Background

In ~2003, Aaron Stanton co-founded a project called the Book Genome Project (based on Pandora’s music genome project) to “identify, track, measure, and study the multitude of features that make up a book”. This taxonomic engine could be applied on a book to surface unique patterns and insights which predict its structure, themes, age-appropriateness, and even pace. The group behind the Book Genome Project used this technology to power a user-facing website called BookLamp to power book recommendations based on quantifiable similarities in their contents. The project was acquired by Apple circa 2014 along with its patents and discontinued.

Opportunity

Today, the world has at its fingertips a powerful non-profit, open-source project called OpenLibrary.org which enables book lovers to access more than 4 million readable books online through the Internet Archive’s controlled digital lending library program. Similar to Goodreads, Open Library serves as a catalog of 19 million book records which readers may use to discover recommendations and track books they want to read. Similar to BookLamp, Open Library has great potential to bring book insights to its audience of more than 3M international book lovers….”

The Open Book Genome Project

“Nine years later, Will Glaser & Tim Westergren drew inspiration from HGP and launched a similar effort called the Music Genome Project, using trained experts to classify and label music according to a taxonomy of characteristics, like genre and tempo. This system became the engine which powers song recommendations for Pandora Radio.

Circa 2003, Aaron Stanton, Matt Monroe, Sidian Jones, and Dan Bowen adapted the idea of Pandora to books, creating a book recommendation service called BookLamp. Under the hood, they devised a Book Genome Project which combined computers and crowds to “identify, track, measure, and study the multitude of features that make up a book”….

In 2006, a project called the Open Music Genome Project attempted to create a public, open, community alternative to Pandora’s Music Genome Project. We thought this was a beautiful gesture and a great opportunity for Open Library; perhaps we could facilitate public book insights which any project in the ecosystem could use to create their own answer for, “what is a book?”. We also found inspiration from complementary projects like StoryGraph, which elegantly crowdsources book tags from patrons to help you, “choose your next book based on your mood and your favorite topics and themes”, HathiTrust Research Center (HTRC) which has led the way in making book data available to researchers, and the Open Syllabus Project which is surfacing useful academic books based on their usage across college curricula….

Our hope is that this Open Book Genome Project will help responsibly make book data more useful and accessible to the public: to power book recommendations, to compare books based on their similarities and differences, to produce more accurate summaries, to calculate reading levels to match audiences to books, to surface citations and urls mentioned within books, and more….”

Words Algorithm Collection – finding closely related open access books using text mining techniques | LIBER Quarterly: The Journal of the Association of European Research Libraries

Open access platforms and retail websites are both trying to present the most relevant offerings to their patrons. Retail websites deploy recommender systems that collect data about their customers. These systems are successful but intrude on privacy. As an alternative, this paper presents an algorithm that uses text mining techniques to find the most important themes of an open access book or chapter. By locating other publications that share one or more of these themes, it is possible to recommend closely related books or chapters.

The algorithm splits the full text into trigrams. It removes all trigrams containing words that are commonly used in everyday language and in (open access) book publishing. The most frequent remaining trigrams are distinctive to the publication and indicate the themes of the book. The next step is finding publications that share one or more of the trigrams. The strength of the connection can be measured by counting – and ranking – the number of shared trigrams. The algorithm was used to find connections between 10,997 titles: 67% in English, 29% in German and 6% in Dutch or a combination of languages. The algorithm is able to find connected books across languages.
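The steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the stop list `COMMON_WORDS`, the cutoff `k`, and the function names are all hypothetical, and the paper's actual list of common words is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical stop list standing in for "words commonly used in everyday
# language and in (open access) book publishing".
COMMON_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "this", "book", "chapter"}


def top_trigrams(text, k=5, common=COMMON_WORDS):
    """Return the k most frequent word trigrams that contain no common word."""
    words = re.findall(r"\w+", text.lower())
    trigrams = zip(words, words[1:], words[2:])
    counts = Counter(t for t in trigrams if not any(w in common for w in t))
    return {trigram for trigram, _ in counts.most_common(k)}


def rank_related(target, library, k=5):
    """Rank other titles by how many distinctive trigrams they share with target."""
    target_set = top_trigrams(library[target], k)
    scores = {}
    for title, text in library.items():
        if title == target:
            continue
        shared = len(target_set & top_trigrams(text, k))
        if shared:
            scores[title] = shared
    # Most shared trigrams first; ties broken alphabetically by title.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Because the ranking only counts exact shared trigrams, it needs no data about readers, which is the privacy advantage the paper emphasises over retail recommender systems.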

The algorithm can serve several use cases, not just recommender systems. Creating benchmarks for publishers or creating a collection of connected titles for libraries are other possibilities. Apart from the OAPEN Library, the algorithm can be applied to other collections of open access books or even open access journal articles. Combining the results across multiple collections will enhance its effectiveness.

Project MUSE introduces AI-based links, powered by UNSILO, for related content

“Project MUSE is partnering with UNSILO, a Cactus Communications (CACTUS) brand that develops artificial intelligence (AI)-powered solutions for publishers, to implement robust new AI-driven content recommendations throughout its massive collection of books and journals in the humanities and social sciences. UNSILO recently completed the initial indexing of the Project MUSE content collection and enhanced related content recommendations appear throughout the platform.

The UNSILO Recommender API automatically identifies links to relevant content from the MUSE content corpus for any selected document (book chapter or journal article). The indexing is updated every 24 hours as new content is added to MUSE. Links are delivered to the platform in real time, enriching the user experience and providing relevance-ranked discovery that augments the learning experience. Over 250 concepts are extracted from every document, and then matched by rank with related material. …”

Experience of using CORE Recommender – an interview – Research

“Making the repository experience more rewarding for users is a continual endeavour for repository managers, and the CORE Recommender is designed to provide a simple and fast solution to help researchers discover relevant further reading. The CORE Recommender is a plugin for repositories, journals and web interfaces that provides article suggestions closely related to the articles that the user is actively reading.  The source of recommended data is the base of CORE, which consists of over 25 million full texts from CORE….”

Experience of using CORE Recommender – CORE

“CORE Recommender is a plugin for repositories, journals and web interfaces that provides suggestions on relevant articles to the article a user is looking for. The source of recommended data is the base of CORE, which consists of over 25 million full texts from CORE. Today we have interviewed George Macgregor, Scholarly Publications & Research Data Manager at the University of Strathclyde, responsible for the Strathprints institutional repository.  Read about his experience of using CORE Recommender on the Jisc Research blog….”