TIB at WikidataCon: Part 2

This is the second installment of a 2-part blog post covering the latest edition of WikidataCon, October 29–31st, 2021. Learn more about the conference and its general themes, as well as recent updates to the vision and strategy of Linked Open Data development within the Wikimedia ecosystem in the first part of the blog.

Focus on TIB’s conference contribution

OSL team members participated in 3 presentations on Sunday, October 31st, in the context of the Wikibase and Education and Science tracks. Learn more about each presentation below:

Wikibase as RDM infrastructure within NFDI4Culture

[Wikibase track]

In the first half of this session, OSL’s Ina Blümel and Lucia Sohmen discussed the new minimal-viable-product (MVP) toolchain that we are developing in the context of NFDI4Culture’s Task Area 1 “Data capture and enrichment”. The MVP architecture relies on Wikibase to store and structure contextual metadata and user-contributed annotations for 3D models and reconstructions of cultural assets.

Simplified diagram representing the MVP architecture. Credit: Lozana Rossenova. CC-BY-SA 4.0

With a view towards sustainability, the MVP development aligns with the overall strategy of the Wikimedia Movement to support a decentralized ecosystem of federated Wikibase instances in which data from Wikidata (and other data sources) is re-used and re-contextualized for specialist domains (e.g. cultural heritage). It also addresses needs identified by various communities for additional, domain-specific extensions, tools, and user interfaces around the Wikibase software. In Phase 1 of development, we designed an accessible pipeline that streamlines the metadata upload process (via the open source software OpenRefine, see below). We also developed a custom branch of the open source software Kompakkt to serve as an extended frontend to the Wikibase repository. With Kompakkt, users can upload, view and annotate a range of files and formats of 2D, 3D, and audio-visual media in a modern, user-friendly, web-based interface. The next development phase will add the option of using the Wikibase API and SPARQL endpoint for bulk annotations. In the future, the MVP will be open to any project that wants to store, visualize and annotate complex visual data. [See presentation slides here.]

The second part of the session focused on the potential benefits and challenges of using Wikibase in the context of “The 4Culture Knowledge Graph” (part of Task Area 5, to which TIB also contributes staff and infrastructure resources). The presentation was delivered by Harald Sack from FIZ Karlsruhe, who explained why formal semantics need to be an integral part of the 4Culture Knowledge Graph, and not just an add-on. Wikibase and its MediaWiki GUI make LOD more accessible to users and offer opportunities for collaboration and community engagement, which are important incentives for broader adoption within the NFDI consortium. At the same time, the lack of native semantics and W3C standard vocabularies (RDF, RDFS, OWL) in Wikibase negatively impacts interoperability, data reuse and federation outside the ‘bubble’ of the Wikidata/Wikibase ecosystem. The presentation offered several mitigation strategies that are currently being tested and evaluated at FIZ Karlsruhe: declarative semantic mapping, data import/export (via the triple store), and development of a dedicated semantic extension. The results of this evaluation will be published as Guidelines and Best Practices to enable the NFDI4Culture community to share their data resources within a federated Knowledge Graph via Wikibase instances. [Download presentation slides here.]
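To give a rough idea of what a declarative semantic mapping could look like, the Python sketch below emits N-Triples statements that declare local Wikibase properties equivalent to W3C standard properties via owl:equivalentProperty. This is only an illustration and not the approach evaluated at FIZ Karlsruhe: the function name, the host wikibase.example.org and the property IDs P1 and P2 are hypothetical.

```python
# Illustrative only: host, property IDs and the chosen target properties are assumptions.
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"


def mapping_triples(prop_base_iri, mapping):
    """Return N-Triples lines declaring each local property equivalent to a standard one."""
    lines = []
    for local_id, standard_iri in sorted(mapping.items()):
        lines.append(
            f"<{prop_base_iri}{local_id}> <{OWL_EQUIVALENT_PROPERTY}> <{standard_iri}> ."
        )
    return lines


triples = mapping_triples(
    "https://wikibase.example.org/prop/direct/",  # hypothetical property namespace
    {
        "P1": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",  # local "instance of"
        "P2": "http://www.w3.org/2000/01/rdf-schema#subClassOf",  # local "subclass of"
    },
)

for line in triples:
    print(line)
```

Statements like these could be loaded into the triple store alongside the Wikibase export, so that tools which only understand RDF, RDFS and OWL can interpret the local properties.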

Using OpenRefine with arbitrary Wikibase instances

[Wikibase track]

Building on the presentation of the 3D annotation MVP toolchain, Lozana Rossenova and Lucia Sohmen delivered a lightning talk on the data pipeline developed for the MVP, focusing on the role of OpenRefine. OpenRefine allows users to clean data, transform it and reconcile it against other open data sources, like Wikidata. It can also upload data directly to Wikidata and pull data from it. Recently, this functionality was extended so that OpenRefine can connect to any Wikibase instance. However, this requires additional server-side and frontend configuration. Much of this is not yet fully documented, so with this presentation we aimed to provide a succinct overview of all necessary steps in the process. We also presented a service box, developed at TIB, that automates the server-side setup.
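As a rough illustration of the frontend side of this configuration: OpenRefine is pointed at a Wikibase instance through a JSON manifest. The Python sketch below assembles such a manifest. The field names reflect our understanding of the manifest format and should be checked against the current OpenRefine documentation; the host wikibase.example.org, the property IDs, the entity IRI pattern and the reconciliation URL are placeholders, not values from the talk.

```python
import json


def build_manifest(name, base_url, instance_of, subclass_of, reconciliation_endpoint):
    """Assemble a manifest dict describing a Wikibase instance for OpenRefine."""
    base = base_url.rstrip("/")
    return {
        "version": "1.0",
        "mediawiki": {
            "name": name,
            "root": f"{base}/wiki/",
            "main_page": f"{base}/wiki/Main_Page",
            "api": f"{base}/w/api.php",
        },
        "wikibase": {
            # Assumption: the instance uses <base>/entity/ as its concept URI prefix.
            "site_iri": f"{base}/entity/",
            "maxlag": 5,
            "properties": {"instance_of": instance_of, "subclass_of": subclass_of},
        },
        # The reconciliation service has to be set up server-side for the instance.
        "reconciliation": {"endpoint": reconciliation_endpoint},
    }


manifest = build_manifest(
    name="Example Wikibase",
    base_url="https://wikibase.example.org",
    instance_of="P1",   # placeholder property IDs
    subclass_of="P2",
    reconciliation_endpoint="https://wikibase.example.org/reconcile/${lang}/api",
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The resulting JSON would then be pasted or loaded into OpenRefine when selecting a Wikibase instance; the reconciliation endpoint is the part that depends on the server-side setup.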

We demoed the steps users need to perform in the frontend and tested uploading sample data to a Wikibase instance. Given that the version of OpenRefine that allows Wikibase connection is still a beta pre-release, we did encounter some bugs during the live demonstration. Fortunately, OpenRefine’s lead developer Antonin Delpeuch was also in the audience and took note of them. We plan to work closely with the OpenRefine team to help with their documentation and bug testing efforts, since OpenRefine is an essential part of our data upload pipeline. And in the spirit of “it takes a village to raise a tool” (see Part 1 of this blog post), we want to support a tool that plays a vital role across many community projects within the Wikimedia Movement at large, as well as the 4Culture community more narrowly. [See presentation slides here.]

Integration of Wikidata 4OpenGLAM into data and information science curricula

[Education and Science track]

Wikidata and OpenRefine have long been used in academia, as they are good tools for teaching data science skills. There are many examples of this, and a lot of material is available for teaching and self-study. In this presentation, Ina Blümel showcased several new online resources developed last semester, for and with students of information science, as part of a project on linking and visualising cultural heritage data using Wikidata and as part of two Data Science courses at Hannover University of Applied Sciences and Arts.

Exemplary student work from a SPARQL and visualisation task. Source: Slide deck (see link below).

We focussed on describing and discussing how to integrate student work and the projects of the Open Science Lab (9 projects in total, of which 6 are in OpenGLAM and 4 use Wikidata and/or Wikibase), and on how to motivate students to engage with more advanced tasks in the field of cultural heritage. Lucia Sohmen presented tasks she designed for one of the courses to teach students different ways of interacting with open data. These included downloading data via an API (OAI-PMH) and by scraping IIIF manifests using a Python library; cleaning and transforming data and then uploading it to Wikidata, all through OpenRefine; and querying and visualizing the data using Wikidata’s SPARQL interface. [See presentation slides here.]
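The download and query tasks can be sketched in a few lines of Python. This is not the course material itself: the repository and image URLs, the helper names and the sample manifest are made up for illustration. The OAI-PMH helper only builds a ListRecords request URL, the IIIF helper assumes a manifest laid out as in version 2 of the IIIF Presentation API, and the SPARQL helper builds a request URL for Wikidata’s public query service.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"  # Wikidata's public SPARQL endpoint


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def iiif_image_urls(manifest):
    """Collect image resource URLs from a IIIF Presentation 2.x style manifest dict."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    urls.append(resource["@id"])
    return urls


def wdqs_url(query):
    """Build a GET URL that asks the Wikidata query service for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


# Example query: ten items that are instances of (P31) painting (Q3305213).
PAINTINGS_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q3305213 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

# Hypothetical manifest with two canvases, one image each.
sample_manifest = {
    "sequences": [{
        "canvases": [
            {"images": [{"resource": {"@id": "https://iiif.example.org/img/1/full/full/0/default.jpg"}}]},
            {"images": [{"resource": {"@id": "https://iiif.example.org/img/2/full/full/0/default.jpg"}}]},
        ]
    }]
}

print(oai_list_records_url("https://repo.example.org/oai"))
print(iiif_image_urls(sample_manifest))
print(wdqs_url(PAINTINGS_QUERY))
```

The printed URLs can be opened in a browser or fetched with any HTTP client; the cleaning and upload steps in between happen in OpenRefine rather than in code.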

Wikidata & Education: A Global Panel

[Education track]

During this panel, Houcemeddine Turki, Research Assistant at the Data Engineering and Semantics Research Unit based at the University of Sfax, Tunisia, showcased a joint research proposal of the DES Unit and OSL, which involves the use of Wikidata in OSL’s Book Sprints. This proposal was developed with Christian Hauschke, Lambert Heller and Simon Worthington from OSL, in collaboration with researchers from several other institutions.

Outlook

The next event where many of these topics will be presented is the Culture Community Plenary. If you want to stay up to date, you can follow Open Science Lab on Twitter and sign up for the NFDI4Culture newsletter.

The post TIB at WikidataCon: Part 2 first appeared on TIB-Blog.

The post TIB at WikidataCon: Part 2 first appeared on Leibniz Research Alliance Open Science.

TIB at WikidataCon: Part 1

Reflecting on questions of sustainability, growing the ecosystem of decentralized data repositories and ensuring knowledge equity

Introduction

This year WikidataCon marked the 9th birthday of Wikidata: “a free, collaborative, multilingual knowledge base with a focus on verifiability” [1]. The biennial conference took place online across all timezones on October 29–31st, opening up participation to a global audience. The conference included 142 sessions, roughly 80 hours of programming and over 700 unique visitors who checked into the event platform Venueless [2]. Beyond the numbers, this conference marks the growth of Wikidata into a mature product – part of the family of applications developed and maintained by the Wikimedia Movement – as well as the growth of a dedicated community of “project shapers”, “gardeners”, and “re-users” [3].

Shortly before the opening of the conference, Wikimedia Germany (the primary maintainers of Wikidata) and the Wikimedia Foundation published updated documents for their 2021 Strategy. These cover the development of Linked Open Data within the Wikimedia movement and the vision for Wikidata, their flagship LOD platform, as well as Wikibase – the underlying software which can enable a decentralized ecosystem of LOD repositories to grow. The strategy documents focus on several key areas that were also reflected in the programming of the conference. Below we provide a short overview of these.

Diagramme showing an ecosystem of decentralized Wikibase knowledge bases.
A view of the Wikimedia Linked Open Data web. Credit: Dan Shick (WMDE) / CC-BY-SA 4.0

Focus on services

There is a strong thread throughout the strategy documents as well as the conference programming that focuses on the scalable and sustainable provision of knowledge services. This includes the acknowledgement that making data in Wikidata easy to find and re-use with a high degree of trust in its quality relies on a range of additional tools and interfaces that need to easily connect with Wikidata via new and improved APIs. Several sessions in the conference focused on this topic.

Another key aspect of the focus on services is the scalability of the current query service that Wikidata provides (WDQS), which has been under significant strain as the knowledge graph has grown over the past years. In the spirit of openness, the members of the technical teams of Wikidata and the Search Platform at Wikimedia offered an overview of current issues and a view for the future on how they plan to manage the risks of rapid scaling and system overload in two dedicated conference sessions. Besides short-term solutions, one of the key strategies for longer term scalability that was discussed was decentralization and federation across multiple data stores.
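To make the idea of federation concrete: SPARQL 1.1 allows one endpoint to delegate part of a query to another endpoint with the SERVICE keyword. The Python sketch below builds such a query for a hypothetical local Wikibase whose items point to Wikidata items through a local property; the property IRI and function name are illustrative, and the query only works if the local endpoint permits outgoing SERVICE calls. It is not taken from the sessions described above.

```python
def federated_query(local_prop_iri, remote_endpoint="https://query.wikidata.org/sparql"):
    """Build a SPARQL query that joins local links with labels fetched from a remote endpoint."""
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?localItem ?wikidataItem ?label WHERE {{
  ?localItem <{local_prop_iri}> ?wikidataItem .
  SERVICE <{remote_endpoint}> {{
    ?wikidataItem rdfs:label ?label .
    FILTER(LANG(?label) = "en")
  }}
}}
LIMIT 10
""".strip()


# Hypothetical local property that stores the matching Wikidata entity IRI.
query = federated_query("https://wikibase.example.org/prop/direct/P5")
print(query)
```

The local store answers the first triple pattern and only the label lookup is sent to Wikidata, which is the basic mechanism by which several smaller data stores can share the load of one large graph.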

Last but not least, reliable service provision requires sustainable tool ecosystem management – a particular challenge to large open source software movements relying on a high degree of self-initiative and volunteer labour. A dedicated panel session brought together the perspectives of tool developers, maintainers, volunteers and WMF officials around the same (virtual) table at the conference to discuss this issue. A day before the session, a member of the tool development community published a related blog post analysing the current challenges facing WMF and its tool environment, and proposed relevant mitigation tactics, including the focus on collaboration and harnessing the contributions of non-technical volunteer support:

It takes a village to raise a tool – and various specialties ranging from product ownership, design, development, operations, testing, QA, security, documentation… – yet more often than not, a single person is behind a tool. ~ Jean-Frédéric [4]

2x2 matrix diagram for prioritizing tool support needs in the Wikimedia ecosystem
2×2 matrix for prioritizing tool support needs, drafted by Andrew Lih and shared during the sustainable tool ecosystem management panel session.

Focus on equity 

Sustainability was indeed the main theme of the conference, but sustainability was also discussed in the context of a parallel initiative: Reimagining Wikidata from the margins [5]. This year, besides a focus on the technical, the new strategy documents and the conference as a whole had an explicitly social focus, too – acknowledging the various inequities endemic to all open movements that rely on contributions from volunteers with access to technical skills, digital literacy, financial means and leisure time, among other forms of social privilege. In practice, this meant that the conference was co-organized in partnership with the Wiki Movimento Brasil and that many sessions aimed explicitly at representing a diversity of national, ethnic and linguistic backgrounds.

These sessions aimed to amplify a plurality of voices traditionally marginalized by the domination of organisations and communities from (primarily) North America and Western Europe in the decision-making and data (re)use policies and practices around Wikidata and the Wikimedia movement in general. Crucially, the conference engaged with the question of equity beyond simply the issue of representation. The opening keynote ‘Decolonizing Wikidata: why does knowledge justice matter for structured data’ was delivered by Anasuya Sengupta, an Indian feminist activist, scholar, and long-time Wikimedian. Throughout the keynote and in subsequent sessions, Sengupta provided a nuanced analysis of the state of the Wikimedia movement, the call to decolonize, and the need to move away from universalizing ideas around what a global knowledge base should look like. A clear message throughout these thought-provoking sessions was the need to focus on decentralization, and to allow for an interlinked – but also non-universalizing – ecosystem of plural community knowledge bases and plural ontologies to be sustained.

Three ideas were highlighted throughout the second and third day of the conference: 1) decentralization, 2) sustainability through broad community engagement, and 3) recognition of the importance of bringing diverse perspectives to the movement as a whole, and to the development of software tools like Wikidata and Wikibase in particular. They ran through the community tracks, which spanned 10 different topics including Sustainability, GLAM, Education and Science, and more [6].

Focus on Wikibase track

Of particular significance to our work at the Open Science Lab at TIB were the GLAM and Education and Science tracks, as well as the track dedicated to Wikibase. OSL’s researcher Lozana Rossenova, serving as Wikibase community manager for NFDI4Culture, was invited by Wikimedia Germany to co-curate and help facilitate the programme for the Wikibase track. The programme for this track provided an opportunity to learn more about the latest research-led and institutional projects featuring Wikibase; get inspiration from diverse use-cases; and learn more about latest developments in the tool ecosystem around Wikibase. The track featured an introduction to the Wikibase Stakeholder Group, a new cross-institutional effort – including TIB – which was established to secure further development and long-term sustainability of Wikibase and related extensions. Furthermore, a presentation by Adam Shorland (Tech Lead for Wikidata and Wikibase at Wikimedia Germany) and Sam Alipio (Product Manager for Wikibase Ecosystem at Wikimedia Germany) announced a new service launching in 2022 – wikibase.cloud, which will aim to fulfill the need to easily deploy and manage cloud-based services for independent Wikibase users. At TIB, we will be working closely with the team at Wikimedia Germany to evaluate how wikibase.cloud can help meet the needs of our research partners in ongoing programs at OSL and NFDI4Culture.

OSL team members participated in 3 presentations on the final day of the conference, Sunday, October 31st, in the Wikibase and Education and Science tracks. Learn more about the presentations in the second part of this blog post.

Endnotes

[1] Source: https://meta.wikimedia.org/wiki/LinkedOpenData/Strategy2021/Wikidata

[2] Stats provided by Léa Lacroix, Community Engagement Coordinator at Wikimedia Germany.

[3] Source: https://meta.wikimedia.org/wiki/LinkedOpenData/Strategy2021/Wikidata

[4] Berthelot, Jean-Frédéric. 2021. “Where is the technical volunteer support in the Wikiverse?” Available from: https://commonists.wordpress.com/2021/10/29/where-is-the-technical-volunteer-support-in-the-wikiverse/

[5] Source: https://www.wikidata.org/wiki/Wikidata:Reimagining_Wikidata_from_the_margins

[6] Source: https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Program/Day_2_and_3_-_Community_tracks

The post TIB at WikidataCon: Part 1 first appeared on TIB-Blog.

The post TIB at WikidataCon: Part 1 first appeared on Leibniz Research Alliance Open Science.