Wikipedia citations in Wikidata – Diff

From Google’s English:  “The Wikipedia Citations dataset currently includes approximately 30 million citations from Wikipedia pages to a variety of sources, including 4 million scientific publications. Increasing the connection with external data services and providing structured data for one of the key elements of Wikipedia articles has two significant advantages: first, better identification of relevant encyclopedic articles related to academic studies; second, the strengthening of Wikipedia as a hub of social and political authority, which would allow policy makers to gauge the importance of an article, a person, a research group or an institution by looking at how many Wikipedia articles cite them.

These are the motivations behind the “Wikipedia Citations in Wikidata” project, supported by a grant from the WikiCite Initiative. From January 2021 until the end of April, the team of Silvio Peroni (co-founder and director of OpenCitations), Giovanni Colavizza, Marilena Daquino, Gabriele Pisciotta and Simone Persiani of the University of Bologna (Department of Classical Philology and Italian Studies) worked on developing a codebase to enrich Wikidata with citations to academic publications that are currently referenced in the English Wikipedia. The codebase is divided into four Python software modules and integrates new components (a classifier to distinguish citations based on the type of cited source, and a search module to equip citations with identifiers from Crossref or other APIs). In doing so, Wikipedia Citations extends previous work that focused only on citations that already have identifiers….”
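The post describes a search module that equips citations with identifiers from Crossref or other APIs. As a rough illustration of that lookup step (not the project’s actual code; the function name and the choice to trust the single best match are assumptions), a citation title can be matched against the public Crossref REST API like this:

    import requests

    def find_doi(citation_title):
        """Return the DOI of the best Crossref match for a citation title, if any.

        Illustrative sketch only: a real pipeline would also score the match
        before accepting it.
        """
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": citation_title, "rows": 1},
            timeout=30,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        return items[0]["DOI"] if items else None

    print(find_doi("Wikipedia citations: A comprehensive data set of citations with identifiers"))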

A Study of the Quality of Wikidata | DeepAI

Abstract:  Wikidata has been increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community. We explore three indicators of data quality in Wikidata, based on: 1) community consensus on the currently recorded knowledge, assuming that statements that have been removed and not added back are implicitly agreed to be of low quality; 2) statements that have been deprecated; and 3) constraint violations in the data. We combine these indicators to detect low-quality statements, revealing challenges with duplicate entities, missing triples, violated type rules, and taxonomic distinctions. Our findings complement ongoing efforts by the Wikidata community to improve data quality, aiming to make it easier for users and editors to find and correct mistakes.
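The second indicator, deprecated statements, is directly visible in Wikidata’s data model: every statement carries a rank, and deprecated ones can be listed from the Wikidata Query Service. A minimal sketch, assuming SPARQL over the public endpoint (the property used, P569 date of birth, is only an arbitrary example):

    import requests

    QUERY = """
    SELECT ?item ?statement WHERE {
      ?item p:P569 ?statement .                           # date-of-birth statements (arbitrary example property)
      ?statement wikibase:rank wikibase:DeprecatedRank .  # keep only deprecated statements
    }
    LIMIT 10
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "wikidata-quality-sketch/0.1 (example)"},
        timeout=60,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["item"]["value"], row["statement"]["value"])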

Reimagining Wikidata from the margins | document/manifesto | June-October 2021

“Wikidata’s ecosystem has been rapidly growing over the last nine years …[but] we’re still missing Global South and other marginalized communities from the North – both in data and in contributors….

Reimagining Wikidata from the margins is a process that precedes WikidataCon 2021 and that potentially will continue after the conference in October. This is an invitation to communities and individuals from underrepresented groups (plus good allies!) so we can better understand and envision possible ways for our meaningful and empowered agency in the broad Wikidata ecosystem!

Objectives

Awaken the debate within a diverse range of communities from the Global South and other marginalized communities
Foster bonds between communities in similar contexts
Identify specific and general problems/challenges/needs/expectations from those communities regarding Wikidata
Elaborate collaboratively a strategy for decentering Wikidata from its current focus on North America and Europe
Consolidate a document/manifesto
Integrate these decentering perspectives and Global South voices into the conference program…

How will it work?…After the rounds of conversations, we will gather a group of volunteers who participated in them to draft a document, summarizing the discussions and looking for possible ways to further integrate data and volunteers from marginalized communities on Wikidata. The document will be released during WikidataCon 2021, in the hope that it will be an initial step towards effectively decentering Wikidata and lifting up underrepresented voices….

Timeline:

June/July: First round of discussions with local communities
August: Round of thematic meetings with people from different locations
September: A volunteer committee engaged in the previous discussions gathers to write a document/manifesto
October: Final review of the document and launching at WikidataCon 2021…”

WikidataCon 2021 | Distributed conference | 29-30-31 October 2021

“Save the date! After the first two editions in 2017 and 2019, the WikidataCon is taking place again in October 2021.

The WikidataCon is an event focused on the Wikidata community in a broad sense: editors, tool builders, but also third-party reusers, partner organizations that are using or contributing to the data, and the ecosystem of organizations working with Wikibase. The content of the conference will have some parts dedicated to people who want to learn more about Wikidata, some workshops and discussions for the community to share skills and exchange ideas about their practices, and some space left to include side events for specific projects (WikiCite, Wikibase, GLAM, etc.).

Important: as the global COVID pandemic is still hitting the world, and the forecast for 2021 doesn’t indicate much improvement, the situation doesn’t allow us to plan a traditional onsite international conference. In 2021, we will not gather all participants in Berlin, and we will avoid any international travel. Instead, we are experimenting with a hybrid format for the conference: most of the content and interactions will take place online, and small, local gatherings will be possible, if the situation allows it….”

Developing a scalable framework for partnerships between health agencies and the Wikimedia ecosystem

Abstract:  In this era of information overload and misinformation, it is a challenge to rapidly translate evidence-based health information to the public. Wikipedia is a prominent global source of health information with high traffic, multilingual coverage, and acceptable quality control practices. Viewership data following the Ebola crisis and during the COVID-19 pandemic reveals that a significant number of web users located health guidance through Wikipedia and related projects, including its media repository Wikimedia Commons and structured data complement, Wikidata.

The basic idea discussed in this paper is to increase and expedite health institutions’ global reach to the general public by developing a specific strategy to maximize the availability of focused content in Wikimedia’s public digital knowledge archives. It was conceptualized from the experiences of leading health organizations such as Cochrane, the World Health Organization (WHO) and other United Nations organizations, Cancer Research UK, the National Network of Libraries of Medicine, and the Centers for Disease Control and Prevention (CDC)’s National Institute for Occupational Safety and Health (NIOSH). Each has customized strategies to integrate content in Wikipedia and evaluate responses.

We propose the development of an interactive guide on the Wikipedia and Wikidata platforms to support health agencies, health professionals and communicators in quickly distributing key messages during crisis situations. The guide aims to cover basic features of Wikipedia, including: adding key health messages to Wikipedia articles; citing expert sources to facilitate fact-checking; staging text for translation into multiple languages; automating metrics reporting; sharing non-text media; anticipating offline reuse of Wikipedia content in apps or virtual assistants; structuring data for querying and reuse through Wikidata; and profiling other flagship projects from major health organizations.
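One of the listed features, automating metrics reporting, can lean on the public Wikimedia Pageviews REST API. A minimal sketch, assuming daily per-article counts are sufficient (the article and date range below are arbitrary examples, not part of the proposal):

    import requests

    article = "COVID-19_pandemic"  # arbitrary example article
    url = (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"en.wikipedia/all-access/all-agents/{article}/daily/20210101/20210131"
    )
    resp = requests.get(
        url,
        headers={"User-Agent": "health-metrics-sketch/0.1 (example)"},
        timeout=30,
    )
    resp.raise_for_status()
    total = sum(item["views"] for item in resp.json()["items"])
    print(f"{article}: {total:,} pageviews in January 2021")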

In the first phase, we propose the development of a curriculum for the guide using information from prior case studies. In the second phase, the guide would be tested on select health-related topics as new case studies. In its third phase, the guide would be finalized and disseminated.

Adding DOIs to Chinese scientific articles on Wikidata – Diff

“In recent years, China has become the world’s largest producer of scientific articles. However, bibliographic data for Chinese scientific articles are still very limited on Wikidata, because the bibliographic databases most commonly used on Wikidata, such as Crossref, do not include articles published in most Chinese academic journals. There have been some efforts to import Chinese articles from China National Knowledge Infrastructure (CNKI), the most comprehensive database for Chinese articles. But for most articles on CNKI, there is no DOI information (the exception is when the resolved pages are hosted by CNKI itself). The DOI is a key component for developing a database of open citations and linked bibliographic data, so it would be very beneficial to collect such information and add it to Wikidata. There is no available database containing comprehensive DOIs for Chinese articles, and many DOIs can only be found on journals’ official websites. A WikiCite e-scholarship was granted to develop a tool to collect those data scattered across different journal websites. During development, more than 20,000 DOIs have already been added to Wikidata….”
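As a rough sketch of the final step such a tool performs, a collected DOI can be written to the corresponding Wikidata item as a P356 (DOI) statement with pywikibot. This is not the project’s actual code; the item QID and DOI below are placeholders, and a configured, logged-in bot account is required:

    import pywikibot

    def add_doi(qid, doi):
        """Add a DOI (P356) statement to a Wikidata item, unless it already has one."""
        site = pywikibot.Site("wikidata", "wikidata")
        repo = site.data_repository()
        item = pywikibot.ItemPage(repo, qid)
        item.get()                      # load existing statements
        if "P356" in item.claims:       # skip items that already carry a DOI
            return
        claim = pywikibot.Claim(repo, "P356")
        claim.setTarget(doi)
        item.addClaim(claim, summary="Adding DOI collected from the journal website")

    # Placeholder values: Q4115189 is the Wikidata sandbox item, the DOI is made up.
    add_doi("Q4115189", "10.1000/example.123")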

Read, Hot & Digitized: Visualizing Wikipedia’s Gender Gap | TexLibris

“However, Wikipedia has a long-standing problem of gender imbalance both in terms of article content and editor demographics. Only 18% of content across Wikimedia platforms is about women. The gaps in content covering non-binary and transgender individuals are even starker: less than 1% of editors identify as trans, and less than 1% of biographies cover trans or nonbinary individuals. When gender is combined with other factors, such as race, nationality, or ethnicity, the numbers get even lower. This gender inequity has long been covered in the scholarly literature via editor surveys and analysis of article content (Hill and Shaw, 2013; Graells-Garrido, Lalmas, and Menczer, 2015; Bear and Collier, 2016; Wagner, Graells-Garrido, Garcia, and Menczer, 2016; Ford and Wajcman, 2017). To visualize these inequalities in nearly real time, the Humaniki tool was developed….”
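The kind of statistic Humaniki surfaces can be approximated with a single query against the Wikidata Query Service. A minimal sketch (not Humaniki’s own code) that counts biography items by gender within one occupation, scoped because an unrestricted count over all humans tends to time out; the occupation chosen, physicist, is an arbitrary example:

    import requests

    QUERY = """
    SELECT ?gender (COUNT(?person) AS ?count) WHERE {
      ?person wdt:P31 wd:Q5 ;        # instance of: human
              wdt:P106 wd:Q169470 ;  # occupation: physicist (arbitrary example)
              wdt:P21 ?gender .      # sex or gender
    }
    GROUP BY ?gender
    ORDER BY DESC(?count)
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "gender-gap-sketch/0.1 (example)"},
        timeout=60,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["gender"]["value"], row["count"]["value"])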

Knowledge curation work in Wikidata WikiProject discussions | Emerald Insight

Abstract:  Purpose

The purpose of this paper is to investigate how editors participate in Wikidata and how they organize their work.

Design/methodology/approach

This qualitative study used content analysis of discussions involving data curation and negotiation in Wikidata. Activity theory was used as a conceptual framework for data collection and analysis.

Findings

The analysis identified six activities: conceptualizing the curation process, appraising objects, ingesting objects from external sources, creating collaborative infrastructure, re-organizing collaborative infrastructure and welcoming newcomers. Many of the norms and rules that were identified help regulate the activities in Wikidata.

Research limitations/implications

This study mapped Wikidata activities to curation and ontology frameworks. Results from this study provided implications for academic studies on online peer-curation work.

Practical implications

An understanding of the activities in Wikidata will help inform communities wishing to contribute data to or reuse data from Wikidata, as well as inform the design of other similar online peer-curation communities, scientific research institutional repositories, digital archives and libraries.

Originality/value

Wikidata is one of the largest knowledge curation projects on the web. The data from this project are used by other Wikimedia projects such as Wikipedia, as well as by major search engines. This study explores an aspect of Wikidata WikiProject editors that, to the author’s knowledge, has yet to be researched.

Visualizing the research ecosystem via Wikidata and Scholia | Zenodo

“Research takes place in a sociotechnical ecosystem that connects researchers with the objects of study and the natural and cultural worlds around them.

Wikidata is a community-curated open knowledge base in which concepts covered in any Wikipedia — and beyond — can be described and annotated collaboratively.

This session is devoted to demoing Scholia, an open-source tool to visualize the global research ecosystem based on information in Wikidata about research fields, researchers, institutions, funders, databases, locations, publications, methodologies and related concepts….”
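Scholia’s views are generated from live SPARQL queries against Wikidata. A minimal sketch of the underlying kind of query (not Scholia’s own code), listing works authored by a given person and printing the corresponding Scholia profile URL; Q80, Tim Berners-Lee, is used only as a convenient example author:

    import requests

    author = "Q80"  # Tim Berners-Lee, used here only as an example
    QUERY = f"""
    SELECT ?work ?workLabel ?year WHERE {{
      ?work wdt:P50 wd:{author} .                         # author
      OPTIONAL {{ ?work wdt:P577 ?date . BIND(YEAR(?date) AS ?year) }}
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    ORDER BY DESC(?year)
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "scholia-demo-sketch/0.1 (example)"},
        timeout=60,
    )
    for row in resp.json()["results"]["bindings"]:
        print(row.get("year", {}).get("value", "????"), row["workLabel"]["value"])
    print(f"Scholia profile: https://scholia.toolforge.org/author/{author}")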

Call 2020 Librarian Community Call – OpenCon

“This talk will focus on discussing the Scholarly Profiles as Service (SPaS) model developed and implemented at Indiana University-Purdue University Indianapolis. The SPaS model aims to provide representation for IUPUI-affiliated faculty and their scholarly research in Wikidata. By sharing these data in the knowledge base, IUPUI University Library is actively contributing to the growth of the bibliographic citation ecosystem in a repository that is free and open.

This call brings together all librarians working with, or learning about, all things Open–and gives folks an opportunity to connect with each other to better their work and librarianship. …”

Award ceremony for the best PhD theses during the IwZ’2020 – Lewoniewski

“In the 23rd edition of the Scientific Competition of the Economic Informatics Society, in the group of doctoral dissertations, third place was awarded to the thesis “The method of comparing and enriching information in multilingual wikis based on the analysis of their quality“. Author of the thesis: Dr. Włodzimierz Lewoniewski; thesis supervisor: Prof. Witold Abramowicz; auxiliary supervisor: Prof. Krzysztof Węcel.

The doctoral dissertation presents methods and tools for determining the values of quality measures on the basis of data in various formats and from various sources. As part of the research, data with a total volume of over 10 terabytes were analyzed, and over a billion values of information quality measures were obtained from the multilingual Wikipedia. The automatic quality assessment models presented in the dissertation can be used not only to automatically enrich various language versions of Wikipedia, but also to enrich other knowledge bases (such as DBpedia and Wikidata) with better-quality information….”