Enriching Wikidata with Linked Open Data

Abstract:  Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, inspiring a wide range of use cases that exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users’ needs is still missing from Wikidata, while current linked open data (LOD) tools are not suited to enriching large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with a narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with millions of novel statements of high quality from external LOD sources. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well supported by existing Wikidata mechanisms. We make our code and data available to support future work.
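To make the workflow concrete, the following Python sketch illustrates only the gap-detection step under stated assumptions: it is not the authors’ code, the example Q-identifiers are arbitrary, and the single property mapping P569 (date of birth) to dbo:birthDate stands in for the paper’s full schema alignment.

```python
# Gap-detection sketch (illustrative only, not the paper's implementation).
# For a few example Wikidata items, check whether date of birth (P569) is
# missing in Wikidata but available in DBpedia as dbo:birthDate.
from SPARQLWrapper import SPARQLWrapper, JSON

WIKIDATA = "https://query.wikidata.org/sparql"
DBPEDIA = "https://dbpedia.org/sparql"

def run(endpoint, query):
    client = SPARQLWrapper(endpoint, agent="enrichment-sketch/0.1 (example)")
    client.setQuery(query)
    client.setReturnFormat(JSON)
    return client.query().convert()["results"]["bindings"]

def wikidata_birth_dates(qid):
    return run(WIKIDATA, f"""
        SELECT ?dob WHERE {{ wd:{qid} wdt:P569 ?dob }} LIMIT 1
    """)

def dbpedia_birth_dates(qid):
    # DBpedia publishes owl:sameAs links to Wikidata entity URIs.
    return run(DBPEDIA, f"""
        SELECT ?dob WHERE {{
          ?s owl:sameAs <http://www.wikidata.org/entity/{qid}> ;
             dbo:birthDate ?dob .
        }} LIMIT 1
    """)

if __name__ == "__main__":
    for qid in ["Q42", "Q1339"]:  # arbitrary example items
        if wikidata_birth_dates(qid):
            print(f"{qid}: P569 already present in Wikidata")
        elif dbpedia_birth_dates(qid):
            print(f"{qid}: gap in Wikidata, candidate value found in DBpedia")
        else:
            print(f"{qid}: gap in Wikidata, no DBpedia candidate")
```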

 

Library as Laboratory: Analyzing Biodiversity Literature at Scale

“Imagine the great library of life, the library that Charles Darwin said was necessary for the ‘cultivation of natural science’ (1847). And imagine that this library is not just hundreds of thousands of books printed from 1500 to the present, but also the data contained in those books that represents all that we know about life on our planet. That library is the Biodiversity Heritage Library (BHL). The Internet Archive has provided an invaluable platform for the BHL to liberate taxonomic names, species descriptions, habitat descriptions and much more. Connecting and harnessing the disparate data from over five centuries is now BHL’s grand challenge. The unstructured textual data generated at the point of digitization holds immense untapped potential. Tim Berners-Lee provided the world with a semantic roadmap to address this global deluge of dark data, and Wikidata is now executing on his vision. As we speak, BHL’s data is undergoing rapid transformation from legacy formats into linked open data, fulfilling the promise to evaporate data silos and foster bioliteracy for all humankind….”

Moving Koha library catalogue into linked data using the LODRefine | Emerald Insight

Abstract:  Purpose

The purpose of this paper is to investigate linked data through the use of open-source technology. It demonstrates the transformation process from library bibliographic data to linked data, which allows for easy searching across numerous collections of information.

Design/methodology/approach

The linked data output is generated on an open-source stack: an Ubuntu server running the LAMP architecture (Linux, Apache, MySQL and PHP), with open-source strategies used throughout to build the relevant information. LODRefine is used to convert Koha’s bibliographic data into linked data, which is then made available on the Web. The framework has been conceptualized and formulated around linked data principles and the corresponding search algorithms.
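As a rough, hand-written analogue of what such a mapping produces, the sketch below turns one Koha-style bibliographic record into RDF with rdflib; the base URI, field names and vocabulary choices (Dublin Core terms, BIBO) are assumptions for illustration, not the paper’s actual LODRefine mapping.

```python
# Minimal sketch: map one bibliographic record to RDF triples.
# Field names, base URI, vocabularies and values are illustrative placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

BIBO = Namespace("http://purl.org/ontology/bibo/")
BASE = "https://library.example.org/bib/"  # hypothetical namespace for records

record = {  # one record as it might be exported from Koha
    "biblionumber": "123",
    "title": "An Example Title",
    "author": "Doe, Jane",
    "isbn": "0000000000",
}

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("bibo", BIBO)

subject = URIRef(BASE + record["biblionumber"])
g.add((subject, RDF.type, BIBO.Book))
g.add((subject, DCTERMS.title, Literal(record["title"])))
g.add((subject, DCTERMS.creator, Literal(record["author"])))
g.add((subject, BIBO.isbn, Literal(record["isbn"])))

# Serialize the graph as Turtle, ready for publishing or loading into a store.
print(g.serialize(format="turtle"))
```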

Findings

Linked data services have been made publicly available to library users using a variety of different forms of data. Information can be sought quickly and easily through an interface built on numerous search structures, which also meets the needs of users who rely on the linked data search mechanism to find information. Through modern scripts and algorithms, library users can now easily search the linked data-enabled services.

Originality/value

This paper demonstrates how quickly and easily linked data can be developed and generated from bibliographic details using a spreadsheet, with the entire procedure carried out by specialists in the library setting. A further advantage of the SPARQL-based system is that it allows visitors to explore distinct concepts and aspects through independent URIs and URLs rather than relying solely on the SPARQL endpoint.
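To illustrate the difference between following an item’s own URI and querying the endpoint, here is a small Python sketch; the record URI and endpoint address are hypothetical placeholders, not the service described in the paper.

```python
import requests

# Hypothetical URI of a published bibliographic record and its SPARQL endpoint.
record_uri = "https://library.example.org/bib/123"
endpoint = "https://library.example.org/sparql"

# Option 1: dereference the record's own URI, asking for an RDF serialization.
resp = requests.get(record_uri, headers={"Accept": "text/turtle"}, timeout=30)
print(resp.text)

# Option 2: ask the SPARQL endpoint to describe the same resource.
query = f"DESCRIBE <{record_uri}>"
resp = requests.get(endpoint, params={"query": query},
                    headers={"Accept": "text/turtle"}, timeout=30)
print(resp.text)
```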

Representing COVID-19 information in collaborative knowledge graphs: a study of Wikidata | Zenodo

Abstract:  Information related to the COVID-19 pandemic ranges from biological to bibliographic and from geographical to genetic. Wikidata is a vast interdisciplinary, multilingual, open collaborative knowledge base of more than 88 million entities connected by well over a billion relationships and is consequently a web-scale platform for broader computer-supported cooperative work and linked open data. Here, we introduce four aspects of Wikidata that make it an ideal knowledge base for information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The structure of the raw data is highly complex, so converting it to meaningful insight requires extraction and visualization, the global crowdsourcing of which adds both additional challenges and opportunities. The created knowledge graph for COVID-19 in Wikidata can be visualized, explored and analyzed in near real time by specialists, automated tools and the public, for decision support as well as educational and scholarly research purposes via SPARQL, a semantic query language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format.
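As a small illustration of the SPARQL access the abstract mentions, the following Python sketch queries the public Wikidata Query Service for symptoms recorded on the COVID-19 item; the identifiers used (Q84263196 for COVID-19, P780 for symptoms and signs) are given as examples and should be verified before reuse.

```python
# Query the Wikidata Query Service for symptoms recorded on the COVID-19 item.
# Q84263196 (COVID-19) and P780 (symptoms and signs) are assumed identifiers.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="covid19-wikidata-example/0.1")
sparql.setQuery("""
SELECT ?symptom ?symptomLabel WHERE {
  wd:Q84263196 wdt:P780 ?symptom .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["symptomLabel"]["value"])
```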

 

Application of tools to support Linked Open Data | Emerald Insight

Abstract:  Purpose

These projects aim to improve library services for users by combining Linked Open Data (LOD) technology with data visualization, displaying and analysing search results in an intuitive manner. These services are enhanced by integrating various LOD technologies into the authority control system.

Design/methodology/approach

LOD technology is used to access, reuse, share, exchange and disseminate information, among other things. This study evaluates the applicability of Linked Data technologies for the development of library information services.

Findings

Apache Hadoop is used to rapidly store and process massive Linked Data datasets. Apache Spark is a free and open-source data processing engine. Apache Hive is a SQL-based data warehouse that enables data scientists to read, write and manage petabytes of data.
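As a hedged sketch of the kind of large-scale processing described here, the following PySpark snippet counts predicate usage in an N-Triples dump; the file path is a placeholder, and splitting on spaces is a shortcut that only works for simple N-Triples lines, not a full parser.

```python
# Sketch: count predicate usage in an N-Triples dump with Spark.
# The HDFS path is a placeholder; splitting on spaces is a simplification.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lod-predicate-count").getOrCreate()

triples = spark.read.text("hdfs:///data/library.nt")  # hypothetical dump
predicates = triples.select(
    F.split(F.col("value"), " ").getItem(1).alias("predicate")
)
counts = predicates.groupBy("predicate").count().orderBy(F.desc("count"))
counts.show(20, truncate=False)
```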

Originality/value

Apache HBase is a distributed, non-SQL storage system for very large data sets. This study’s goal is to search the geographic, authority and bibliographic databases for relevant links found on various websites. When data items are linked together, all of the underlying pieces of data are linked as well. The study observed and evaluated the tools and processes and recorded each data item’s URL. As a result, data can be combined across silos, enriched with third-party data sources and contextualized.