Improving Wikipedia verifiability with AI | Nature Machine Intelligence

Abstract:  Verifiability is a core content policy of Wikipedia: claims need to be backed by citations. Maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort. We show that the process of improving references can be tackled with the help of artificial intelligence (AI) powered by an information retrieval system and a language model. This neural-network-based system, which we call SIDE, can identify Wikipedia citations that are unlikely to support their claims, and subsequently recommend better ones from the web. We train this model on existing Wikipedia references, therefore learning from the contributions and combined wisdom of thousands of Wikipedia editors. Using crowdsourcing, we observe that for the top 10% most likely citations to be tagged as unverifiable by our system, humans prefer our system’s suggested alternatives compared with the originally cited reference 70% of the time. To validate the applicability of our system, we built a demo to engage with the English-speaking Wikipedia community and find that SIDE’s first citation recommendation is preferred twice as often as the existing Wikipedia citation for the same top 10% most likely unverifiable claims according to SIDE. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
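The two-step idea in this abstract (flag a citation that is unlikely to support its claim, then recommend a better one) can be illustrated with a toy sketch. This is not SIDE: the paper describes a neural retrieval system and language model, whereas the sketch below swaps in plain token overlap (Jaccard similarity) as a crude stand-in for a learned verifier. The function names, the threshold of 0.3, and the sample texts are all assumptions made up for illustration.

```python
import re


def tokens(text):
    """Lowercase a text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim, passage):
    """Jaccard overlap between claim and passage tokens.

    A crude stand-in for a learned verification model (assumption).
    """
    a, b = tokens(claim), tokens(passage)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(claim, current, candidates, threshold=0.3):
    """Suggest a replacement only when the current citation scores poorly.

    Returns the best-scoring candidate if the current citation falls below
    the (hypothetical) threshold and the candidate scores higher;
    otherwise returns None.
    """
    current_score = support_score(claim, current)
    if current_score >= threshold:
        return None
    best = max(candidates, key=lambda p: support_score(claim, p), default=None)
    if best is not None and support_score(claim, best) > current_score:
        return best
    return None


# Made-up example texts.
claim = "The bridge opened in 1932"
current = "A recipe for apple pie"
candidates = ["The bridge opened to traffic in 1932", "Weather report for Tuesday"]
suggestion = recommend(claim, current, candidates)
```

Here the existing citation shares no tokens with the claim, so the first candidate is suggested. A citation that already scores above the threshold is left alone.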


COVID-19 Wikipedia pageview spikes, 2019-2022 – addshore

“Back in 2019 at the start of the COVID-19 outbreak, Wikipedia saw large spikes in page views on COVID-19 related topics while people were hunting for information.

I briefly looked at some of the spikes in March 2020 using the easy-to-use pageview tool for Wikimedia sites. But the problem with viewing the spikes through this tool is that you can only look at 10 pages at a time on a single site, when in reality you’d want to look at many pages relating to a topic, across multiple sites at once.

I wrote a notebook to do just this, submitted it for privacy review, and I am finally getting around to putting some of those moving parts and visualizations in public view….”
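The kind of analysis described here, combining many pages across several sites and then looking for spikes, might look roughly like the following. This is a minimal sketch, not the author's notebook: the sample rows are invented, and the spike rule (a day whose total exceeds twice the median daily total) is an assumption chosen for simplicity.

```python
from collections import defaultdict
from statistics import median

# Invented sample rows: (project, article, day, views).
SAMPLE_VIEWS = [
    ("en.wikipedia", "COVID-19", "2020-03-01", 100),
    ("en.wikipedia", "COVID-19", "2020-03-02", 120),
    ("en.wikipedia", "COVID-19", "2020-03-03", 900),
    ("de.wikipedia", "COVID-19-Pandemie", "2020-03-01", 50),
    ("de.wikipedia", "COVID-19-Pandemie", "2020-03-02", 60),
    ("de.wikipedia", "COVID-19-Pandemie", "2020-03-03", 400),
    ("en.wikipedia", "Coronavirus", "2020-03-01", 80),
    ("en.wikipedia", "Coronavirus", "2020-03-02", 90),
    ("en.wikipedia", "Coronavirus", "2020-03-03", 300),
]


def total_views_by_day(rows):
    """Sum daily views over every (project, article) pair in rows."""
    totals = defaultdict(int)
    for _project, _article, day, views in rows:
        totals[day] += views
    return dict(sorted(totals.items()))


def find_spikes(daily_totals, factor=2.0):
    """Return days whose total exceeds `factor` times the median daily total."""
    baseline = median(daily_totals.values())
    return [day for day, views in daily_totals.items() if views > factor * baseline]


totals = total_views_by_day(SAMPLE_VIEWS)
spikes = find_spikes(totals)
```

Summing before detecting spikes is what lets a topic-wide surge show up even when no single page on a single site stands out.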

Wikipedia can measure dissemination of science | Times Higher Education (THE)

“Article edit history is an untapped resource in charting the evolution of scientific knowledge over time, researchers say….

Wikipedia could prove an invaluable source in charting the output of scientific knowledge as it diffuses into the public discourse, argue the authors of new research.

Rona Aviram and Omer Benjakob, whose findings were published on 13 September in the journal Plos One, combed over thousands of iterations of articles in the online encyclopaedia related to the gene-editing technology CRISPR, which served as a case study into how scientific findings influence Wikipedia entries….”

Wikipedia as a tool for contemporary history of science: A case study on CRISPR | PLOS ONE

Abstract:  Rapid developments and methodological divides hinder the study of how scientific knowledge accumulates, consolidates and transfers to the public sphere. Our work proposes using Wikipedia, the online encyclopedia, as a historiographical source for contemporary science. We chose the high-profile field of gene editing as our test case, performing a historical analysis of the English-language Wikipedia articles on CRISPR. Using a mixed-method approach, we qualitatively and quantitatively analyzed the CRISPR article’s text, sections and references, alongside 50 affiliated articles. These, we found, documented the CRISPR field’s maturation from a fundamental scientific discovery to a biotechnological revolution with vast social and cultural implications. We developed automated tools to support such research and demonstrated its applicability to two other scientific fields–coronavirus and circadian clocks. Our method utilizes Wikipedia as a digital and free archive, showing it can document the incremental growth of knowledge and the manner in which scientific research accumulates and translates into public discourse. Using Wikipedia in this manner complements and overcomes some issues with contemporary histories and can also augment existing bibliometric research.


Full article: Stigmergy in Open Collaboration: An Empirical Investigation Based on Wikipedia

Abstract:  Participants in open collaboration communities coproduce knowledge despite minimal explicit communication to coordinate the efforts. Studying how participants coordinate around the knowledge artifact and its impacts is critical for understanding the open knowledge production model. This study builds on the theory of stigmergy, wherein actions performed by a participant leave traces on a knowledge artifact and stimulate succeeding actions. We find that stigmergy involves two intertwined processes: collective modification and collective excitation. We propose a new measure of stigmergy based on the spatial and temporal clustering of contributions. By analyzing thousands of Wikipedia articles, we find that the degree of stigmergy is positively associated with community members’ participation and the quality of the knowledge produced. This study contributes to the understanding of open collaboration by characterizing the spatial-temporal clustering of contributions and providing new insights into the relationship between stigmergy and knowledge production outcomes. These findings can help practitioners increase user engagement in knowledge production processes in order to create more sustainable open collaboration communities.
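To give a feel for what "spatial and temporal clustering of contributions" could mean in practice, here is a toy score. It is not the measure proposed in the paper, whose definition the abstract does not give. The sketch assumes each edit is a pair of (hours since the first edit, character offset in the article) and reports the share of consecutive edits that fall close together in both time and position. The function name, the 24-hour window, and the 200-character distance are all hypothetical.

```python
def clustering_share(edits, max_gap_hours=24.0, max_distance=200):
    """Share of consecutive edit pairs that are close in both time and position.

    `edits` is a list of (timestamp_hours, position) tuples. The thresholds
    are illustrative assumptions, not values from the paper.
    """
    ordered = sorted(edits)  # tuples sort by timestamp first
    pairs = list(zip(ordered, ordered[1:]))
    if not pairs:
        return 0.0
    close = sum(
        1
        for (t1, p1), (t2, p2) in pairs
        if t2 - t1 <= max_gap_hours and abs(p2 - p1) <= max_distance
    )
    return close / len(pairs)


# Invented edit histories for two articles.
bursty = [(0, 100), (2, 150), (5, 120), (100, 5000), (101, 5050)]
scattered = [(0, 100), (50, 4000), (120, 900)]

bursty_score = clustering_share(bursty)        # 3 of 4 pairs are close
scattered_score = clustering_share(scattered)  # no pair is close
```

A higher share suggests that edits tend to follow one another in the same region of the article, which is the intuition behind one contribution stimulating the next.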

Wikipedia’s Moment of Truth – The New York Times, 18 July 2023

“…In late June, I began to experiment with a plug-in the Wikimedia Foundation had built for ChatGPT. At the time, this software tool was being tested by several dozen Wikipedia editors and foundation staff members, but it became available in mid-July on the OpenAI website for subscribers who want augmented answers to their ChatGPT queries. The effect is similar to the “retrieval” process that Jesse Dodge surmises might be required to produce accurate answers. GPT-4’s knowledge base is currently limited to data it ingested by the end of its training period, in September 2021. A Wikipedia plug-in helps the bot access information about events up to the present day. At least in theory, the tool — lines of code that direct a search for Wikipedia articles that answer a chatbot query — gives users an improved, combinatory experience: the fluency and linguistic capabilities of an A.I. chatbot, merged with the factuality and currency of Wikipedia….”

Abstract Wikipedia gains new support from The Rockefeller Foundation – Diff

“The Wikimedia Foundation is pleased to share that The Rockefeller Foundation has provided a $1 million grant to support the development of Abstract Wikipedia, an initiative that will enable more people to share more knowledge in more languages across Wikipedia, accelerating the Wikimedia movement’s Knowledge Equity goals.

The long-term aim of Abstract Wikipedia is to build a knowledge base independent of language, making it easier for Wikipedia editors to share, add, translate, and improve knowledge across languages on the online encyclopedia. In short, the initiative will enable more people to contribute content in their preferred languages, making the knowledge available to a larger and more global audience.

The grant from The Rockefeller Foundation will also allow the Abstract Wikipedia team to further develop Wikifunctions, the technical infrastructure behind the idea of Abstract Wikipedia. Wikifunctions will empower volunteers to create reusable code that can perform specific tasks, such as generating text in a certain language….”

Here’s another important reason why academics should publish in open access titles: self interest – Walled Culture

“What this means in practice is that for the general public open access articles are even more beneficial than those published in traditional titles, since they frequently turn up as Wikipedia sources that can be consulted directly. They are also advantageous for the researchers who write them, since their work is more likely to be cited on the widely-read and influential Wikipedia than if the papers were not open access. As the research notes, this effect is even more pronounced for “articles with low citation counts” – basically, academic work that may be important but is rather obscure. This new paper provides yet another compelling reason why researchers should be publishing their work as open access as a matter of course: out of pure self interest.”

Wikipedia and open access

Wikipedia is a well-known platform for disseminating knowledge, and scientific sources, such as journal articles, play a critical role in supporting its mission. The open access movement aims to make scientific knowledge openly available, and we might intuitively expect open access to help further Wikipedia’s mission. However, the extent of this relationship remains largely unknown. To fill this gap, we analyze a large dataset of citations from Wikipedia and model the role of open access in Wikipedia’s citation patterns. We find that open-access articles are extensively and increasingly more cited in Wikipedia. What is more, they show a 15% higher likelihood of being cited in Wikipedia when compared to closed-access articles, after controlling for confounding factors. This open-access citation effect is particularly strong for articles with low citation counts, including recently published ones. Our results show that open access plays a key role in the dissemination of scientific knowledge, including by providing Wikipedia editors timely access to novel results. These findings have important implications for researchers, policymakers, and practitioners in the field of information science and technology.

AI Is Tearing Wikipedia Apart

“As generative artificial intelligence continues to permeate all aspects of culture, the people who steward Wikipedia are divided on how best to proceed. During a recent community call, it became apparent that there is a community split over whether or not to use large language models to generate content. While some people expressed that tools like OpenAI’s ChatGPT could help with generating and summarizing articles, others remained wary. The concern is that machine-generated content has to be balanced with a lot of human review and would overwhelm lesser-known wikis with bad content. While AI generators are useful for writing believable, human-like text, they are also prone to including erroneous information, and even citing sources and academic papers which don’t exist. This often results in text summaries which seem accurate, but on closer inspection are revealed to be completely fabricated….”

First grants announced from the Wikimedia Endowment to support technical innovation across Wikipedia and Wikimedia projects – Wikimedia Foundation

“The Wikimedia Foundation, the nonprofit that operates Wikipedia, and the Wikimedia Endowment Board today announced the first recipients of grant funding from the Wikimedia Endowment, the long-term fund established in 2016 to support the future of Wikimedia sites. The initiatives that will receive grant funding include Abstract Wikipedia, Kiwix, Machine Learning, and Wikidata. The projects were selected for their ability to foster greater technical innovation on Wikimedia projects, crucial to keeping the sites relevant in a rapidly-evolving landscape….”

CFP Program/Submissions – Wikimania 2023, Singapore + online | deadline March 28, 2023

“The ESEAP Wikimania 2023 Core Organizing Team invites you to submit a program idea for Wikimania. The program submission form is available in Arabic, English, French, Spanish, and Traditional Chinese. We are working on including Indonesian. Submissions are accepted from Tuesday, February 28 until Tuesday, March 28, 2023….The theme for Wikimania 2023 is Diversity, Collaboration, Future. It is intended to be cross-cutting and to apply as a lens to all programming ideas. Your submission should have elements connecting to at least one of these. A lot of what we do every day in Wikimedia – on the projects or in the community – is already reflective of the theme and very much in line with how the ESEAP regional collaboration identifies and operates….”

Twenty years of Wikipedia in scholarly publications: a bibliometric network analysis of the thematic and citation landscape | SpringerLink

Abstract:  Wikipedia has grown to be the biggest online encyclopedia in terms of comprehensiveness, reach and coverage. However, although different websites and social network platforms have received considerable academic attention, Wikipedia has largely gone unnoticed. In this study, we fill this research gap by investigating how Wikipedia is used in scholarly publications since its launch in 2001. More specifically, we review and analyze the intellectual structure of Wikipedia’s scholarly publications based on 3790 Web of Science core collection documents written by 10,636 authors from 100 countries over two decades (2001–2021). Results show that the most influential outlets publishing Wikipedia research include journals such as PLOS ONE, Nucleic Acids Research, the Journal of the Association for Information Science and Technology, the Journal of the American Society for Information Science and Technology, IEEE Access, and Information Processing and Management. Results also show that the author collaboration network is very sparsely connected, indicating the absence of close collaboration among the authors in the field. Furthermore, results reveal that the Wikipedia research institutions’ collaboration network reflects a North–South divide as very limited cooperation occurs between developed and developing countries’ institutions. Finally, the multiple correspondence analysis applied to obtain the Wikipedia research conceptual map reveals the breadth, diversity, and intellectual thrust of Wikipedia’s scholarly publications. Our analysis has far-reaching implications for aspiring researchers interested in Wikipedia research as we retrospectively trace the evolution in research output over the last two decades, establish linkages between the authors and articles, and reveal trending topics/hotspots within the broad theme of Wikipedia research.
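One common way to put a number on how "sparsely connected" a collaboration network is uses graph density: the fraction of all possible author pairs that have actually co-authored at least one paper. The abstract does not say which statistic the authors used, so the sketch below is only an illustration, with invented author lists and a hypothetical function name.

```python
from itertools import combinations


def coauthor_density(papers):
    """Density of the co-authorship graph built from lists of author names.

    Density = 2 * E / (N * (N - 1)) for N authors and E distinct
    co-author pairs; returns 0.0 when there are fewer than two authors.
    """
    authors = set()
    edges = set()
    for author_list in papers:
        unique = sorted(set(author_list))  # sorted so each pair has one canonical order
        authors.update(unique)
        edges.update(combinations(unique, 2))
    n = len(authors)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


# Invented corpus: five authors, only two distinct co-author pairs.
papers = [["A", "B"], ["C"], ["D", "E"], ["A", "B"]]
density = coauthor_density(papers)  # 2 pairs out of 10 possible
```

A density near zero means most authors have never written together, which matches the abstract's description of a field without close collaboration.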