Wikidata for Scholarly Communication Librarianship – Simple Book Publishing

“Wikidata for Scholarly Communication Librarianship was developed for anyone working in an academic library (or interested in working in an academic library) who may have a small or large role in supporting scholarly communication related services. The first two chapters, however, could serve as a basic introduction to Wikidata for anyone in academic librarianship. The remaining three chapters focus on a few topics that may be of more interest to those who work on open metadata, research metrics, and researcher profile projects….”


Leveraging Wikidata to Build Scholarly Profiles as Service | KULA: Knowledge Creation, Dissemination, and Preservation Studies

Abstract:  In this article, the authors share the different methods and tools utilized for supporting the Scholarly Profiles as Service (SPaS) model at Indiana University–Purdue University Indianapolis (IUPUI). Leveraging Wikidata to build a scholarly profile service aligns with interests in supporting open knowledge and provides opportunities to address information inequities. The article accounts for the authors’ decision to focus first on profiles for women scholars at the university and provides a detailed case study of how these profiles are created. By describing the processes of delivering the service, the authors hope to inspire other academic libraries to work toward establishing stronger open data connections between academic institutions, their scholars, and their scholars’ publications.
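A profile service like this could start by asking Wikidata which scholars are already described. Below is a minimal Python sketch, assuming the public Wikidata SPARQL vocabulary, that builds such a query. It selects women (P21 = Q6581072) employed by (P108) a given institution and counts the scholarly articles (P31 = Q13442814) that list them as authors (P50). The function name is illustrative. Any institution ID passed in is a placeholder, not IUPUI's actual item. This is not the authors' documented method.

```python
# Hypothetical helper: builds a SPARQL query string only; it does not contact Wikidata.
FEMALE = "Q6581072"              # Wikidata item "female"
SCHOLARLY_ARTICLE = "Q13442814"  # Wikidata item "scholarly article"


def women_scholars_query(institution_qid: str, limit: int = 100) -> str:
    """Return SPARQL for women (P21) employed by (P108) an institution,
    with a count of the scholarly articles they authored (P50)."""
    # Guard against passing a name instead of an item ID like "Q123".
    if not institution_qid.startswith("Q") or not institution_qid[1:].isdigit():
        raise ValueError(f"not a Wikidata item ID: {institution_qid!r}")
    return f"""
SELECT ?person ?personLabel (COUNT(DISTINCT ?work) AS ?works) WHERE {{
  ?person wdt:P108 wd:{institution_qid} ;
          wdt:P21 wd:{FEMALE} .
  OPTIONAL {{
    ?work wdt:P50 ?person ;
          wdt:P31 wd:{SCHOLARLY_ARTICLE} .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?person ?personLabel
ORDER BY DESC(?works)
LIMIT {limit}
""".strip()
```

Calling `women_scholars_query("Q123456")` with a placeholder ID returns a query you could paste into the Wikidata Query Service. Profiles that come back with few or no works would be natural candidates for the kind of enrichment the article describes.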


Enriching Wikidata with Linked Open Data

Abstract:  Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, thus inspiring various use cases to exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users’ needs is still missing in Wikidata, while current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with a high quality. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well-supported by existing Wikidata mechanisms. We make our code and data available to support future work.
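The gap-detection step can be illustrated in a much simplified form. Once an external source's predicates have been mapped to Wikidata properties, the gaps are the set difference between what that source states and what Wikidata already states. The Python sketch below assumes statements are plain (subject, property, value) triples whose entities are already aligned. `PROPERTY_MAP` is a hypothetical two-entry stand-in for schema alignment, and semantic validation is omitted. It is not the paper's released code.

```python
from typing import Iterable

Triple = tuple[str, str, str]  # (subject, property, value)

# Hypothetical schema alignment from DBpedia-style predicates to Wikidata properties.
PROPERTY_MAP = {
    "dbo:birthPlace": "P19",  # Wikidata "place of birth"
    "dbo:deathPlace": "P20",  # Wikidata "place of death"
}


def align(external: Iterable[Triple]) -> set[Triple]:
    """Rewrite external predicates as Wikidata properties; drop unmapped ones."""
    return {(s, PROPERTY_MAP[p], o) for s, p, o in external if p in PROPERTY_MAP}


def find_gaps(wikidata: Iterable[Triple], external: Iterable[Triple]) -> set[Triple]:
    """Return aligned external statements that Wikidata does not yet contain."""
    return align(external) - set(wikidata)
```

Suppose Wikidata already records a person's place of birth and an external source supplies both birth and death places. In that case, `find_gaps` returns only the death-place statement. Statements with unmapped predicates are silently dropped, which is one reason property alignment matters so much in practice.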


The LOTUS initiative for open knowledge management in natural products research | eLife

Abstract: Contemporary bioinformatic and chemoinformatic capabilities hold promise to reshape knowledge management, analysis and interpretation of data in natural products research. Currently, reliance on a disparate set of non-standardized, insular, and specialized databases presents a series of challenges for data access, both within the discipline and for integration and interoperability between related fields. The fundamental elements of exchange are referenced structure-organism pairs that establish relationships between distinct molecular structures and the living organisms from which they were identified. Consolidating and sharing such information via an open platform has strong transformative potential for natural products research and beyond. This is the ultimate goal of the newly established LOTUS initiative, which has now completed the first steps toward the harmonization, curation, validation and open dissemination of 750,000+ referenced structure-organism pairs. LOTUS data is hosted on Wikidata and regularly mirrored on […]. Data sharing within the Wikidata framework broadens data access and interoperability, opening new possibilities for community curation and evolving publication models. Furthermore, embedding LOTUS data into the vast Wikidata knowledge graph will facilitate new biological and chemical insights. The LOTUS initiative represents an important advancement in the design and deployment of a comprehensive and collaborative natural products knowledge base.
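The abstract's unit of exchange, a referenced structure-organism pair, maps naturally onto a small record type. The Python sketch below illustrates one hypothetical harmonization step: merging pairs reported by several databases after light normalization. It uppercases the structure key (for example an InChIKey), tidies whitespace and capitalization in organism names, and lowercases DOIs, which are case-insensitive. The class and function names are illustrative. This is not LOTUS's actual curation pipeline, which involves considerably more validation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencedPair:
    structure: str   # structure identifier, e.g. an InChIKey
    organism: str    # organism name as reported by the source database
    reference: str   # supporting publication, e.g. a DOI


def harmonize(pairs):
    """Merge pairs from several sources: normalize identifiers and names, then
    collect every distinct reference per (structure, organism) combination."""
    merged = defaultdict(set)
    for p in pairs:
        structure = p.structure.strip().upper()
        # Collapse repeated whitespace; capitalize() suits binomial names ("Genus species").
        organism = " ".join(p.organism.split()).capitalize()
        merged[(structure, organism)].add(p.reference.strip().lower())
    return {key: sorted(refs) for key, refs in merged.items()}
```

Three records that differ only in case or spacing collapse into one pair backed by every distinct reference. That is the kind of consolidation that makes a single shared copy on Wikidata useful.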


Smithsonian Libraries and Archives & Wikidata: Adding Artist Files to Wikidata – Smithsonian Libraries / Unbound

“The Smithsonian Libraries and Archives’ Art and Artist Files collection is a dynamic and valuable resource for art historical research. In total, the Smithsonian has hundreds of thousands of physical files, containing millions of ephemeral items: newspaper clippings, press releases, brochures, invitations, and so much more. The files hold information on artists, art collectives, and galleries, but in formats that would normally have been tossed out, being too small to catalog and shelve in a library in the usual way. Because these special items fall between the cracks of typical library and research organizational practices, libraries that collect these materials are coming up with innovative ways to make their contents discoverable to a wider world. Which made them a wonderful collection to experiment with as a part of our Smithsonian Libraries and Archives Wikidata pilot projects! …”

Library as Laboratory: Analyzing Biodiversity Literature at Scale

“Imagine the great library of life, the library that Charles Darwin said was necessary for the ‘cultivation of natural science’ (1847). And imagine that this library is not just hundreds of thousands of books printed from 1500 to the present, but also the data contained in those books that represents all that we know about life on our planet. That library is the Biodiversity Heritage Library (BHL). The Internet Archive has provided an invaluable platform for the BHL to liberate taxonomic names, species descriptions, habitat description and much more. Connecting and harnessing the disparate data from over five centuries is now BHL’s grand challenge. The unstructured textual data generated at the point of digitization holds immense untapped potential. Tim Berners-Lee provided the world with a semantic roadmap to address this global deluge of dark data and Wikidata is now executing on his vision. As we speak, BHL’s data is undergoing rapid transformation from legacy formats into linked open data, fulfilling the promise to evaporate data silos and foster bioliteracy for all humankind….”

Data Reuse Days 2022, March 14-24, 2022 | Wikidata Events

The Data Reuse Days are a series of gatherings taking place from March 14th to 24th, 2022, focusing on Wikidata data reuse and reusers. With presentations, discussions, editing sprints and more, the main goal of this event is to provide a space to bring together anyone interested in the topic of re-using Wikidata’s data. This means, for example:

to gather people who reuse Wikidata’s data (in products, apps, websites, research, etc.) to better understand what they are building and what their needs and wishes are regarding Wikidata’s data and technical infrastructure
to bring together data reusers and data editors to talk about issues, wishes and common efforts, so each group can hear the other’s point of view on things to improve (data quality, ontologies, etc.)
to onboard developers who want to build applications on top of Wikidata’s data, as well as editors from other Wikimedia projects
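For developers new to reusing Wikidata’s data, one simple entry point is the per-entity JSON export at Special:EntityData. The Python sketch below fetches one entity, then reports its label in one language and the number of statements per property. `fetch_entity` and `summarize` are illustrative names. The User-Agent string you pass is your own; Wikimedia asks clients to identify themselves. Only `summarize` works offline.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

# Wikidata's per-entity JSON export.
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def fetch_entity(qid: str, user_agent: str) -> dict:
    """Download one entity's JSON document (requires network access)."""
    req = Request(ENTITY_DATA_URL.format(qid=qid), headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        return json.load(resp)


def summarize(entity_json: dict, qid: str, lang: str = "en") -> tuple[Optional[str], dict[str, int]]:
    """Return the label in `lang` (None if missing) and a statement count per property."""
    entity = entity_json["entities"][qid]
    label = entity.get("labels", {}).get(lang, {}).get("value")
    counts = {pid: len(statements) for pid, statements in entity.get("claims", {}).items()}
    return label, counts
```

A typical call would be `summarize(fetch_entity("Q42", "my-tool/0.1 (me@example.org)"), "Q42")`, where the User-Agent is a placeholder. Keeping parsing separate from fetching makes it easy to test reuse code against saved JSON without hitting the live site.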

How to participate?

This initiative is open to everyone who’s interested in reusing Wikidata’s data. As a first step, you can add yourself to the list of participants.
On this page, you will find an overview of the schedule and resources to get started. You’re welcome to join any session that you find interesting.
This initiative is coordinated by Lea Lacroix (WMDE) but its content is community-powered: if you want to organize a session, work on a project, or help with documentation, you’re very welcome to add it to the schedule (instructions TBD).


Examining Wikidata and Wikibase in the context of research data management applications | TIB-Blog

For several months now, our team at the Open Science Lab has been working with Wikibase to provide research data management services for the NFDI4Culture community. We have already shown its advantages for real-world data with a specific use case for architectural and art historical data [1, 2]. At the monthly NFDI InfraTalk last week, an interesting question came up at the end of the session about the potential of Wikidata as an application for science and research. We take this as an opportunity to expand on our answer with more details about Wikidata, its potential applications, its relation to standalone Wikibase instances, and what Wikibase can offer in its own right.

Representing COVID-19 information in collaborative knowledge graphs: a study of Wikidata | Zenodo

Abstract:  Information related to the COVID-19 pandemic ranges from biological to bibliographic and from geographical to genetic. Wikidata is a vast interdisciplinary, multilingual, open collaborative knowledge base of more than 88 million entities connected by well over a billion relationships and is consequently a web-scale platform for broader computer-supported cooperative work and linked open data. Here, we introduce four aspects of Wikidata that make it an ideal knowledge base for information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The structure of the raw data is highly complex, so converting it to meaningful insight requires extraction and visualization, the global crowdsourcing of which adds both additional challenges and opportunities. The created knowledge graph for COVID-19 in Wikidata can be visualized, explored and analyzed in near real time by specialists, automated tools and the public, for decision support as well as educational and scholarly research purposes via SPARQL, a semantic query language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format.
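The abstract names SPARQL as the way to explore this knowledge graph. Here is a minimal Python sketch, using only the standard library, that sends a query to the public Wikidata Query Service and flattens its JSON results. The QID used for COVID-19 is an assumption to verify. The example query, which counts scholarly articles whose main subject (P921) is COVID-19, is illustrative rather than taken from the study.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed QID for the disease COVID-19; verify on wikidata.org before use.
COVID_19 = "Q84263196"

# Illustrative query: scholarly articles (Q13442814) whose main subject (P921) is COVID-19.
COVID_ARTICLE_COUNT = f"""
SELECT (COUNT(?article) AS ?n) WHERE {{
  ?article wdt:P921 wd:{COVID_19} ;
           wdt:P31 wd:Q13442814 .
}}
""".strip()


def sparql_url(query: str) -> str:
    """Build a GET URL asking the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def rows(result: dict) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in result["results"]["bindings"]]


def run_query(query: str, user_agent: str) -> list:
    """Send a query to the public endpoint (requires network access)."""
    req = Request(sparql_url(query), headers={
        "User-Agent": user_agent,
        "Accept": "application/sparql-results+json",
    })
    with urlopen(req) as resp:
        return rows(json.load(resp))
```

Because the graph is edited continuously, rerunning `run_query(COVID_ARTICLE_COUNT, "my-tool/0.1 (me@example.org)")` can return a different count each time. This is the near-real-time property the abstract highlights.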


Wikidata as a Tool for Mapping Investment in Open Infrastructure

“This working paper shares the results of research conducted to explore Wikidata’s current coverage of the domain of open infrastructure and investment therein. The research question investigates whether Wikidata, a collaboratively edited and multilingual knowledge graph hosted by the Wikimedia Foundation, is a viable prospect for hosting investment flow data for open infrastructure. 

At present, Wikidata partially describes the domain. Coordinated efforts to collectively define relevant data categories, relationships, and values, and to align distributed editing will help to improve coverage.

This study was conducted as part of a Research Fellowship with Invest in Open Infrastructure (IOI), and is generously supported by the Alfred P. Sloan Foundation. 

We invite feedback and comments directly in this document. Please feel free to add your thoughts via the commenting function. Have questions? Contact us. …”