Library as Laboratory: Analyzing Biodiversity Literature at Scale

“Imagine the great library of life, the library that Charles Darwin said was necessary for the “cultivation of natural science” (1847). And imagine that this library is not just hundreds of thousands of books printed from 1500 to the present, but also the data contained in those books that represents all that we know about life on our planet. That library is the Biodiversity Heritage Library (BHL). The Internet Archive has provided an invaluable platform for the BHL to liberate taxonomic names, species descriptions, habitat description and much more. Connecting and harnessing the disparate data from over five centuries is now BHL’s grand challenge. The unstructured textual data generated at the point of digitization holds immense untapped potential. Tim Berners-Lee provided the world with a semantic roadmap to address this global deluge of dark data and Wikidata is now executing on his vision. As we speak, BHL’s data is undergoing rapid transformation from legacy formats into linked open data, fulfilling the promise to evaporate data silos and foster bioliteracy for all humankind….”

Data Reuse Days 2022, March 14-24, 2022 | Wikidata Events

The Data Reuse Days are a series of gatherings taking place from March 14th to 24th, 2022, focusing on Wikidata data reuse and reusers. With presentations, discussions, editing sprints and more, the main goal of this event is to provide a space to bring together anyone interested in the topic of re-using Wikidata’s data. This means for example:

to gather people who reuse Wikidata’s data (on products, apps, websites, research, etc.) in order to understand better what they are building and what their needs and wishes regarding Wikidata’s data and technical infrastructure are
to bring together data reusers and data editors to talk about issues, wishes and common efforts, so each group can hear the other’s point of view on things to improve (data quality, ontologies, etc.)
to onboard developers who want to build applications on top of Wikidata’s data, as well as editors from other Wikimedia projects

How to participate?

This initiative is open to everyone who’s interested in reusing Wikidata’s data. As a first step, you can add yourself to the list of participants.
On this page, you will find an overview of the schedule and resources to get started. You’re welcome to join any session that you find interesting.
This initiative is coordinated by Lea Lacroix (WMDE) but its content is community-powered: if you want to organize a session, work on a project, help with documentation, you’re very welcome to add it to the schedule (instructions TBD).


Examining Wikidata and Wikibase in the context of research data management applications | TIB-Blog

For several months now, our team at the Open Science Lab has been working with Wikibase to provide research data management services for the NFDI4Culture community. We have already shown its advantages when it comes to real world data with a specific use case for architectural and art historical data [1, 2]. At the monthly NFDI InfraTalk last week, there was an interesting question at the end of the session regarding the potential of Wikidata to be used as an application for science and research. We take this as an opportunity to expand the answer to this question with some more details about Wikidata, its potential applications, its relation to standalone Wikibase instances, and what Wikibase can offer in its own right.

Representing COVID-19 information in collaborative knowledge graphs: a study of Wikidata | Zenodo

Abstract:  Information related to the COVID-19 pandemic ranges from biological to bibliographic and from geographical to genetic. Wikidata is a vast interdisciplinary, multilingual, open collaborative knowledge base of more than 88 million entities connected by well over a billion relationships and is consequently a web-scale platform for broader computer-supported cooperative work and linked open data. Here, we introduce four aspects of Wikidata that make it an ideal knowledge base for information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The structure of the raw data is highly complex, so converting it to meaningful insight requires extraction and visualization, the global crowdsourcing of which adds both additional challenges and opportunities. The created knowledge graph for COVID-19 in Wikidata can be visualized, explored and analyzed in near real time by specialists, automated tools and the public, for decision support as well as educational and scholarly research purposes via SPARQL, a semantic query language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format.
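
As a rough illustration of the SPARQL access the abstract mentions, the sketch below builds a query and a request URL for the public Wikidata Query Service. It is not taken from the paper. The choice of the "main subject" property (P921), the COVID-19 item identifier (Q84263196), and the helper names `build_query` and `build_request_url` are assumptions made for this example. The code only constructs the request and does not send it.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(subject_qid: str = "Q84263196", limit: int = 10) -> str:
    """Return a SPARQL query for items whose main subject (P921) is the given item.

    Q84263196 is assumed here to be the Wikidata item for COVID-19.
    """
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P921 wd:{subject_qid} .\n"
        # The label service fills ?itemLabel with English labels.
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """URL-encode the query and ask the endpoint for JSON results."""
    params = urlencode({"query": query, "format": "json"})
    return WIKIDATA_SPARQL_ENDPOINT + "?" + params


if __name__ == "__main__":
    # Print the URL; fetching it with any HTTP client returns the result rows.
    print(build_request_url(build_query(limit=5)))
```

Pasting the query text into the Wikidata Query Service web interface, or fetching the printed URL, should return matching items with their English labels.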


Wikidata as a Tool for Mapping Investment in Open Infrastructure

“This working paper shares the results of research conducted to explore Wikidata’s current coverage of the domain of open infrastructure and investment therein. The research question investigates whether Wikidata, a collaboratively edited and multilingual knowledge graph hosted by the Wikimedia Foundation, is a viable prospect for hosting investment flow data for open infrastructure. 

At present, Wikidata partially describes the domain. Coordinated efforts to collectively define relevant data categories, relationships, and values, and to align distributed editing will help to improve coverage.

This study was conducted as part of a Research Fellowship with Invest in Open Infrastructure (IOI), and is generously supported by the Alfred P. Sloan Foundation. 

We invite feedback and comments directly in this document. Please feel free to add your thoughts via the commenting function. Have questions? Contact us. …”

Massive open index of scholarly papers launches

“An ambitious free index of more than 200 million scientific documents that catalogues publication sources, author information and research topics, has been launched.

The index, called OpenAlex after the ancient Library of Alexandria in Egypt, also aims to chart connections between these data points to create a comprehensive, interlinked database of the global research system, say its founders. The database, which launched on 3 January, is a replacement for Microsoft Academic Graph (MAG), a free alternative to subscription-based platforms such as Scopus, Dimensions and Web of Science that was discontinued at the end of 2021.

“It’s just pulling lots of databases together in a clever way,” says Euan Adie, founder of Overton, a London-based firm that tracks the research cited in policy documents. Overton had been getting its data from various sources, including MAG, ORCID, Crossref and directly from publishers, but has now switched to using only OpenAlex, in the hope of making the process easier….”

Wikidata for Digital Preservationists: New DPC Technology Watch Guidance Note now available on general release | Digital Preservation Coalition

The Digital Preservation Coalition (DPC) has made the next in its series of Technology Watch Guidance Notes, on Wikidata for Digital Preservationists by Katherine Thornton, available on general release today.

Wikidata for Digital Preservationists gives a brief introduction to Wikidata before continuing to provide practical advice on using, contributing, describing and curating data entries to enable storage and access to trusted data.

This new Technology Watch Guidance Note and the rest of the series complements the DPC’s popular Technology Watch Reports and is designed to be a ‘bite-sized’ paper that might contain information about a problem, a solution, or a particular implementation of digital preservation and will provide a short briefing on advanced digital preservation topics.

Community Wishlist Survey 2022 | Wikimedia | proposals due January 23

“What is the Community Wishlist Survey? The Community Wishlist Survey is an annual survey that allows contributors to the Wikimedia projects to propose and vote for tools and platform improvements.


All phases of the survey begin and end at 18:00 UTC.

Phase 1: January 10 – January 23, 2022: Submit, discuss and revise proposals

Phase 2: January 17 – January 28, 2022: Community Tech reviews and organizes proposals

Phase 3: January 28 – February 11, 2022: Vote on proposals

Phase 4: February 14, 2022: Results posted…”

Wikidata for Education Tickets, Tue 30 Nov 2021 at 1pm (GMT) | Eventbrite

This session is for educators who are interested in how to use Wikidata in the classroom.

About this event

Wikidata is a community-edited, freely accessible knowledge base. It is much easier to understand and has more interesting and varied content than most databases, so it is a good place to start when considering how knowledge can be represented by computers. It can create interactive educational visualisations on all sorts of topics and adding to Wikidata is already used as a platform for educational assignments. It can give a new lease of life to research outputs by joining them up with other information sources in a connected web. Contributing to Wikidata involves questions of data literacy and data ethics.

This session is for those who are interested in how to use Wikidata in the classroom, either as a platform to explore knowledge representation or to create educational materials. Some experience of Wikidata and other databases is useful but not assumed.

This training session will be held on Zoom, and led by Dr Martin Poulter. Instructions for joining will be sent a week or so before the session.

Code The City 24 “Open In Practice” Nov 27-28, 2021

The theory of being open is great but what does it mean in practice to work openly, to make data, images, information and code open for others to re-use? And how could that benefit your organisation – or you as an individual?

At this hack event we will explore, by practicing, how we can be more open and support some of the key concepts that Code The City was set up to champion.

We’ll have a number of challenges (which we will list further down this page and expand on as we get nearer the event). These will trigger prototype projects which we will work on in small teams throughout the weekend. These projects will explore

Open Data – creation, curation, finding, improving; data scraping; using the data to build new products and services.
Open Licensing – taking and sharing images with open licences
Open Working – sharing our code on Github for re-use under permissive licences.