Abstract: Scholia for Software is a project to add software profiling features to Scholia, a scholarly profiling service in the Wikimedia ecosystem integrated with Wikipedia and Wikidata. This document is an adaptation of the funded grant proposal. We are sharing it for several reasons: for research transparency; to encourage the sharing of research proposals for reuse and remixing in general; to assist others specifically in making proposals that would complement our activities; and because sharing this proposal helps us tell the story of the project to community stakeholders.
A “scholarly profiling service” is a tool that assists the user in accessing data on some aspect of scholarship, usually in relation to research. Typical features of such services include returning the bibliography of academic publications for any given researcher, or providing a list of publications by topic. Scholia already exists as a Wikimedia platform tool built upon Wikidata and capable of serving these functions. This project will additionally add software-related data to Wikidata, develop Scholia’s own code, and address ethical issues of diversity and representation around these activities. The end result will be that Scholia can report what software a given researcher has described using in their publications, what software is most used among authors publishing on a given topic or in a given journal, what papers describe projects that use a given software package, and what software is most often co-used in projects that use a given software package.
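Scholia views of this kind are driven by SPARQL queries against Wikidata. As a rough illustration only, the following Python sketch builds such a query for a “software used by a researcher” view; the property P2283 (“uses”) and the example QID are assumptions about how the data might be modelled, not a description of Scholia’s actual implementation.

```python
def software_used_query(author_qid: str) -> str:
    """Build a SPARQL query listing software linked (via the 'uses'
    property, P2283 -- an assumed modelling choice) to works authored
    (P50) by the given researcher, ranked by number of works."""
    return f"""
SELECT ?software ?softwareLabel (COUNT(?work) AS ?works) WHERE {{
  ?work wdt:P50 wd:{author_qid} ;     # works by this author
        wdt:P2283 ?software .         # ...that use some software item
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?software ?softwareLabel
ORDER BY DESC(?works)
"""
```

The resulting query string could then be sent to the Wikidata Query Service endpoint (https://query.wikidata.org/sparql) with a JSON results format.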
“In October 2022, we will celebrate the 10th anniversary of Wikidata together! For this special occasion, we are creating a collaborative video that will show people from all around the world celebrating Wikidata’s birthday, sharing wishes and appreciation to the Wikidata community, and why they like Wikidata. We would love to invite you to participate in this video! You will find below more information about how to participate. In short: you can film one or several videos and send them through this form before September 18th. Please make sure that your videos have a maximum size of 1GB and filmed in 30 or 60fps. If you need help with filming the video, feel free to contact us. You can also join one of our workshops….”
“OpenAlex is a free and open Scientific Knowledge Graph (SKG). It contains information describing approximately 230M scholarly works, drawn from both structured (eg: Crossref) and unstructured (eg: institutional repositories, publisher websites) sources, clustered/merged into distinct records, and linked by citations. By parsing work metadata and enriching it with external PID sources (ROR, ORCID, ISSN Network, PubMed, Wikidata, etc), OpenAlex describes and links (approximately) 200M author clusters, 100k institutions, and 100k venues (journals and repositories). Using a neural-net classifier, we assign one or more of 50k Wikidata concepts to each work. All source code and ML models are available openly, and data is freely available via a high-performance API, a complete database dump, and a search-engine-style web interface. This talk will describe the construction of OpenAlex, compare it to other SKGs (eg Scopus, MAG), and discuss plans for the future.”
“Wikidata for Scholarly Communication Librarianship was developed for anyone working in an academic library (or interested in working in an academic library) who may have a small or large role in supporting scholarly communication related services. The first two chapters, however, could serve as a basic introduction to Wikidata for anyone in academic librarianship. The remaining three chapters focus on a few topics that may be of more interest to those who work on open metadata, research metrics, and researcher profile projects….”
Abstract: In this article, the authors share the different methods and tools utilized for supporting the Scholarly Profiles as Service (SPaS) model at Indiana University–Purdue University Indianapolis (IUPUI). Leveraging Wikidata to build a scholarly profile service aligns with interests in supporting open knowledge and provides opportunities to address information inequities. The article accounts for the authors’ decision to focus first on profiles for women scholars at the university and provides a detailed case study of how these profiles are created. By describing the processes of delivering the service, the authors hope to inspire other academic libraries to work toward establishing stronger open data connections between academic institutions, their scholars, and their scholars’ publications.
Abstract: Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, thus inspiring various use cases to exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users’ needs is still missing in Wikidata, while current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with a high quality. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well-supported by existing Wikidata mechanisms. We make our code and data available to support future work.
Abstract: Biological taxonomy rests on a long tail of publications spanning nearly three centuries. Not only is this literature vital to resolving disputes about taxonomy and nomenclature, for many species it represents a key source—indeed sometimes the only source—of information about that species. Unlike other disciplines such as biomedicine, the taxonomic community lacks a centralised, curated literature database (the “bibliography of life”). This article argues that Wikidata can be that database as it has flexible and sophisticated models of bibliographic information, and an active community of people and programs (“bots”) adding, editing, and curating that information.
Abstract: Contemporary bioinformatic and chemoinformatic capabilities hold promise to reshape knowledge management, analysis and interpretation of data in natural products research. Currently, reliance on a disparate set of non-standardized, insular, and specialized databases presents a series of challenges for data access, both within the discipline and for integration and interoperability between related fields. The fundamental elements of exchange are referenced structure-organism pairs that establish relationships between distinct molecular structures and the living organisms from which they were identified. Consolidating and sharing such information via an open platform has strong transformative potential for natural products research and beyond. This is the ultimate goal of the newly established LOTUS initiative, which has now completed the first steps toward the harmonization, curation, validation and open dissemination of 750,000+ referenced structure-organism pairs. LOTUS data is hosted on Wikidata and regularly mirrored on https://lotus.naturalproducts.net. Data sharing within the Wikidata framework broadens data access and interoperability, opening new possibilities for community curation and evolving publication models. Furthermore, embedding LOTUS data into the vast Wikidata knowledge graph will facilitate new biological and chemical insights. The LOTUS initiative represents an important advancement in the design and deployment of a comprehensive and collaborative natural products knowledge base.
“The Smithsonian Libraries and Archives’ Art and Artist Files collection is a dynamic and valuable resource for art historical research. In total, the Smithsonian has hundreds of thousands of physical files, containing millions of ephemeral items: newspaper clippings, press releases, brochures, invitations, and so much more. The files hold information on artists, art collectives, and galleries, but in formats that would normally have been tossed out, being too small to catalog and shelve in a library in the usual way. Because these special items fall between the cracks of typical library and research organizational practices, libraries that collect these materials are coming up with innovative ways to make their contents discoverable to a wider world. Which made them a wonderful collection to experiment with as a part of our Smithsonian Libraries and Archives Wikidata pilot projects! …”
OpenAlex is a new, fully-open scientific knowledge graph (SKG), launched to replace the discontinued Microsoft Academic Graph (MAG). It contains metadata for 209M works (journal articles, books, etc); 213M disambiguated authors; 124k venues (places that host works, such as journals and online repositories); 109k institutions; and 65k Wikidata concepts (linked to works via an automated hierarchical multi-tag classifier). The dataset is fully and freely available via a web-based GUI, a full data dump, and a high-volume REST API. The resource is under active development and future work will improve accuracy and coverage of citation information and author/institution parsing and deduplication.
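Since both OpenAlex abstracts mention a freely available REST API, a minimal sketch of composing a request URL against it may be useful; the endpoint shape follows the public OpenAlex API, but the filter key and the concept ID below are illustrative assumptions, not guaranteed parameters.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"

def works_url(filters: dict, per_page: int = 25) -> str:
    """Compose an OpenAlex /works request URL. OpenAlex expresses
    filters as comma-joined key:value pairs in a single `filter`
    query parameter."""
    filter_expr = ",".join(f"{k}:{v}" for k, v in filters.items())
    return f"{OPENALEX_WORKS}?{urlencode({'filter': filter_expr, 'per-page': per_page})}"

# Hypothetical example: works tagged with some concept ID.
url = works_url({"concepts.id": "C2522767166"})
```

The URL can then be fetched with any HTTP client; results are paged JSON.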
“Imagine the great library of life, the library that Charles Darwin said was necessary for the “cultivation of natural science” (1847). And imagine that this library is not just hundreds of thousands of books printed from 1500 to the present, but also the data contained in those books that represents all that we know about life on our planet. That library is the Biodiversity Heritage Library (BHL). The Internet Archive has provided an invaluable platform for the BHL to liberate taxonomic names, species descriptions, habitat description and much more. Connecting and harnessing the disparate data from over five centuries is now BHL’s grand challenge. The unstructured textual data generated at the point of digitization holds immense untapped potential. Tim Berners-Lee provided the world with a semantic roadmap to address this global deluge of dark data and Wikidata is now executing on his vision. As we speak, BHL’s data is undergoing rapid transformation from legacy formats into linked open data, fulfilling the promise to evaporate data silos and foster bioliteracy for all humankind….”
The Data Reuse Days are a series of gatherings taking place from March 14th to 24th, 2022, focusing on Wikidata data reuse and reusers. With presentations, discussions, editing sprints and more, the main goal of this event is to provide a space to bring together anyone interested in the topic of re-using Wikidata’s data. This means for example:
to gather people who reuse Wikidata’s data (in products, apps, websites, research, etc.) in order to better understand what they are building and what needs and wishes they have regarding Wikidata’s data and technical infrastructure
to bring together data reusers and data editors to talk about issues, wishes and common efforts, so each group can hear the other’s point of view on things to improve (data quality, ontologies, etc.)
to onboard developers who want to build applications on top of Wikidata’s data, as well as editors from other Wikimedia projects
How to participate?
This initiative is open to everyone who’s interested in reusing Wikidata’s data. As a first step, you can add yourself to the list of participants.
On this page, you will find an overview of the schedule and resources to get started. You’re welcome to join any session that you find interesting.
This initiative is coordinated by Lea Lacroix (WMDE) but its content is community-powered: if you want to organize a session, work on a project, or help with documentation, you’re very welcome to add it to the schedule (instructions TBD).
For several months now, our team at the Open Science Lab has been working with Wikibase to provide research data management services for the NFDI4Culture community. We have already shown its advantages with real-world data in a specific use case for architectural and art-historical data [1, 2]. At the monthly NFDI InfraTalk last week, an interesting question came up at the end of the session about the potential of Wikidata as an application for science and research. We take this as an opportunity to expand on that answer with more detail about Wikidata, its potential applications, its relation to standalone Wikibase instances, and what Wikibase can offer in its own right.