Library as Laboratory: Analyzing Biodiversity Literature at Scale

“Imagine the great library of life, the library that Charles Darwin said was necessary for the “cultivation of natural science” (1847). And imagine that this library is not just hundreds of thousands of books printed from 1500 to the present, but also the data contained in those books that represents all that we know about life on our planet. That library is the Biodiversity Heritage Library (BHL). The Internet Archive has provided an invaluable platform for the BHL to liberate taxonomic names, species descriptions, habitat descriptions and much more. Connecting and harnessing the disparate data from over five centuries is now BHL’s grand challenge. The unstructured textual data generated at the point of digitization holds immense untapped potential. Tim Berners-Lee provided the world with a semantic roadmap to address this global deluge of dark data and Wikidata is now executing on his vision. As we speak, BHL’s data is undergoing rapid transformation from legacy formats into linked open data, fulfilling the promise to evaporate data silos and foster bioliteracy for all humankind….”

Moving Koha library catalogue into linked data using the LODRefine | Emerald Insight

Abstract:  Purpose

The purpose of this paper is to investigate connected data through the use of open-source technology. It demonstrates the transformation process from library bibliographic data to linked data, which allows for easy searching across numerous collections of information.


In generating this file, a high-level operating system such as Ubuntu, which is based on the LAMP architecture, is used. It is required to use open-source strategies in building the relevant information. LODRefine is being used to convert all of Koha’s bibliographic data into linked data that is now available on the Web. This framework has been conceptualized and formulated based on linked data principles and search algorithms accordingly.


Linked data services have been made publicly available to library users by using a variety of different forms of data. Information may be sought quickly and easily using this interface built on numerous search structures. Aside from that, it also meets the needs of users who use the linked data search mechanism to find information. Through modern scripts and algorithms, it is now possible for library users to easily search the linked data-enabled services.


This paper demonstrates how quickly and easily related data from bibliographic details may be developed and generated using a spreadsheet. The entire procedure culminates in the presence of specialists in the library setting. A further advantage of the SPARQL system is that it allows visitors to group distinct concepts and aspects using independent URIs and URLs instead of the SPARQL endpoint.
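
The abstract above describes turning spreadsheet-style bibliographic records into linked data. As a rough illustration only, and not the LODRefine workflow the paper uses, the following Python sketch maps rows of a CSV export to N-Triples with Dublin Core terms. The column names (`biblionumber`, `title`, `author`), the `https://example.org/bib/` base URI and the sample record are assumptions made for this example.

```python
import csv
import io

# Dublin Core terms namespace (a real vocabulary).
DCTERMS = "http://purl.org/dc/terms/"
# Hypothetical base URI for minting record identifiers; not from the paper.
BASE_URI = "https://example.org/bib/"


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text):
    """Turn each CSV row into two triples: one for the title, one for the creator.

    Assumes columns named biblionumber, title and author (an assumption
    about the export layout, not something the abstract specifies).
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE_URI}{row['biblionumber']}>"
        title = escape_literal(row["title"])
        author = escape_literal(row["author"])
        triples.append(f'{subject} <{DCTERMS}title> "{title}" .')
        triples.append(f'{subject} <{DCTERMS}creator> "{author}" .')
    return triples


# Illustrative record, not taken from any real catalogue export.
SAMPLE_CSV = "biblionumber,title,author\n1,On the Origin of Species,Charles Darwin\n"

if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE_CSV):
        print(line)
```

Each record gets its own URI, which is what allows it to be linked to and queried independently of the catalogue it came from.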

Representing COVID-19 information in collaborative knowledge graphs: a study of Wikidata | Zenodo

Abstract:  Information related to the COVID-19 pandemic ranges from biological to bibliographic and from geographical to genetic. Wikidata is a vast interdisciplinary, multilingual, open collaborative knowledge base of more than 88 million entities connected by well over a billion relationships and is consequently a web-scale platform for broader computer-supported cooperative work and linked open data. Here, we introduce four aspects of Wikidata that make it an ideal knowledge base for information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The structure of the raw data is highly complex, so converting it to meaningful insight requires extraction and visualization, the global crowdsourcing of which adds both additional challenges and opportunities. The created knowledge graph for COVID-19 in Wikidata can be visualized, explored and analyzed in near real time by specialists, automated tools and the public, for decision support as well as educational and scholarly research purposes via SPARQL, a semantic query language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format.
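
To make the SPARQL access mentioned in the abstract concrete, here is a minimal Python sketch that builds a query for works whose main subject is a given Wikidata item, and the matching request URL for the public Wikidata Query Service. It only constructs strings and sends nothing. The choice of the "main subject" property (`P921`) and the item `Q84263196` for COVID-19 are assumptions of this example rather than details taken from the paper.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifier for the COVID-19 item; check it on Wikidata before relying on it.
COVID19_QID = "Q84263196"


def build_query(subject_qid, limit=10):
    """Return a SPARQL query listing works whose main subject (P921) is the given item."""
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P921 wd:{subject_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    return WIKIDATA_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    query = build_query(COVID19_QID, limit=5)
    print(query)
    print(build_request_url(query))
```

The resulting URL can be fetched with any HTTP client. The query service expects clients to identify themselves with a descriptive User-Agent header.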


Application of tools to support Linked Open Data | Emerald Insight

Abstract:  Purpose

These projects aim to improve library services for users in the future by combining Linked Open Data (LOD) technology with data visualization. It displays and analyses search results in an intuitive manner. These services are enhanced by integrating various LOD technologies into the authority control system.


The technology known as LOD is used to access, recycle, share, exchange and disseminate information, among other things. The applicability of Linked Data technologies for the development of library information services is evaluated in this study.


Apache Hadoop is used for rapidly storing and processing massive Linked Data data sets. Apache Spark is a free and open-source data processing tool. Hive is a SQL-based data warehouse that enables data scientists to write, read and manage petabytes of data.


The distributed large data storage system Apache HBase does not use SQL. This study’s goal is to search the geographic, authority and bibliographic databases for relevant links found on various websites. When data items are linked together, all of the data bits are linked together as well. The study observed and evaluated the tools and processes and recorded each data item’s URL. As a result, data can be combined across silos, enhanced by third-party data sources and contextualized.

From little acorns . . . A retrospective on OpenCitations | OpenCitations blog

“Now that OpenCitations is hosting over one billion freely available scholarly bibliographic citations, this is perhaps an opportune moment to look back to the start of this initiative. A little over eleven years ago, on 24 April 2010, I spoke at the Open Knowledge Foundation Conference, OKCon2010, in London, on the topic

OpenCitations: Publishing Bibliographic Citations as Linked Open Data

I reported that, earlier that same week, I had applied to Jisc for a one-year grant to fund the OpenCitations Project. Jisc (at that time ‘The JISC’, the Joint Information Systems Committee) was tasked by the UK government, among other things, to support research and development in information technology for the benefit of the academic community.

The purpose of that original OpenCitations R&D project was to develop a prototype in which we:

harvested citations from the open access biomedical literature in PubMed Central;
described and linked them using CiTO, the Citation Typing Ontology [1];
encoded and organized them in an RDF triplestore; and
published them as Linked Open Data in the OpenCitations Corpus (OCC)….”
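
The describe-and-encode steps in the list above can be illustrated with a short Python sketch that serializes citation links as N-Triples using the `cito:cites` property from CiTO. This is not the OpenCitations pipeline. The article URIs are hypothetical placeholders, and the sketch writes plain strings rather than loading a triplestore.

```python
# The CiTO "cites" property, from the Citation Typing Ontology namespace.
CITO_CITES = "http://purl.org/spar/cito/cites"


def citation_triples(citations):
    """Serialize (citing URI, cited URI) pairs as N-Triples lines."""
    lines = []
    for citing, cited in citations:
        lines.append(f"<{citing}> <{CITO_CITES}> <{cited}> .")
    return "\n".join(lines)


# Hypothetical article URIs, used purely for illustration.
EXAMPLE_CITATIONS = [
    ("https://example.org/article/A", "https://example.org/article/B"),
    ("https://example.org/article/A", "https://example.org/article/C"),
]

if __name__ == "__main__":
    print(citation_triples(EXAMPLE_CITATIONS))
```

Each line is one RDF statement in which the citing article is the subject and the cited article is the object, a form that a triplestore can ingest directly.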