Library as Laboratory: Analyzing Biodiversity Literature at Scale

“Imagine the great library of life, the library that Charles Darwin said was necessary for the ‘cultivation of natural science’ (1847). And imagine that this library is not just hundreds of thousands of books printed from 1500 to the present, but also the data contained in those books that represents all that we know about life on our planet. That library is the Biodiversity Heritage Library (BHL). The Internet Archive has provided an invaluable platform for the BHL to liberate taxonomic names, species descriptions, habitat descriptions and much more. Connecting and harnessing the disparate data from over five centuries is now BHL’s grand challenge. The unstructured textual data generated at the point of digitization holds immense untapped potential. Tim Berners-Lee provided the world with a semantic roadmap to address this global deluge of dark data and Wikidata is now executing on his vision. As we speak, BHL’s data is undergoing rapid transformation from legacy formats into linked open data, fulfilling the promise to evaporate data silos and foster bioliteracy for all humankind….”

Automated search in the archives: testing a new tool | Europeana Pro

“Archives Portal Europe, the online repository for archives from and about Europe, aggregates archival material from more than 30 countries and 25 languages – all searchable through one simple search engine.

In order to help researchers navigate this Babylon of languages, Archives Portal Europe have created an automated topic detection tool that expands the keyword search of a single user to create semantic connections with other documents in different languages. This testing session will allow users to preview the tool (currently in its alpha version), test it, and provide fundamental feedback for its development, and will have prizes! …”

CfP: Community-based Knowledge Bases and Knowledge Graphs (submissions due Nov 01, 2021) | Journal of Web Semantics

The Journal of Web Semantics invites submissions for a special issue on Community-based Knowledge Bases and Knowledge Graphs, edited by Tim Finin, Sebastian Hellmann, David Martin, and Elena Simperl.

Submissions are due by November 01, 2021.

Community-based knowledge bases (KBs) and knowledge graphs (KGs) are critical to many domains. They contain large amounts of information, used in applications as diverse as search, question-answering systems, and conversational agents. They are the backbone of linked open data, helping connect entities from different datasets. Finally, they create rich knowledge engineering ecosystems, making significant, empirical contributions to our understanding of KB/KG science, engineering, and practices. From here forward, we use “KB” to include both knowledge bases and knowledge graphs. Also, “KB” and “knowledge” encompass both ontology/schema and data.

Community-based KBs come in many shapes and sizes, but they tend to share a number of commonalities:

They are created through the efforts of a group of contributors, following a set of agreed goals, policies, practices, and quality norms.
They are available under open licenses.
They are central to knowledge-sharing networks bringing together various stakeholders.
They serve the needs of a community of users, including, but not restricted to, their contributor base.
Many draw their content from crowdsourced resources (such as Wikipedia, OpenStreetMap).

Examples of community-based KBs include Wikidata, DBpedia, ConceptNet, GeoNames, FrameNet, and Yago. This special issue will highlight recent research, challenges, and opportunities in the field of community-based KBs and the interaction and processes between stakeholders and the KBs.


We welcome papers on a wide variety of topics. Papers that focus on the participation of a community of contributors are especially encouraged.

Towards FAIR protocols and workflows: the OpenPREDICT use case [PeerJ]

Abstract: It is essential for the advancement of science that researchers share, reuse and reproduce each other’s workflows and protocols. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and emphasize the importance of making digital objects findable and reusable by others. The question of how to apply these principles not just to data but also to the workflows and protocols that consume and produce them is still under debate and poses a number of challenges. In this paper we describe a two-fold approach of simultaneously applying the FAIR principles to scientific workflows as well as the involved data. We apply and evaluate our approach on the case of the PREDICT workflow, a highly cited drug repurposing workflow. This includes FAIRification of the involved datasets, as well as applying semantic technologies to represent and store data about the detailed versions of the general protocol, of the concrete workflow instructions, and of their execution traces. We propose a semantic model to address these specific requirements, which we evaluated by answering competency questions. This semantic model consists of classes and relations from a number of existing ontologies, including Workflow4ever, PROV, EDAM, and BPMN. This allowed us then to formulate and answer new kinds of competency questions. Our evaluation shows the high degree to which our FAIRified OpenPREDICT workflow now adheres to the FAIR principles and the practicality and usefulness of being able to answer our new competency questions.
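
To make the idea of a competency question concrete, here is a minimal sketch in plain Python. It is not the OpenPREDICT model or its data: the `ex:` identifiers, the tiny triple set, and the helper functions are all hypothetical. Only the property names `prov:used` and `prov:wasGeneratedBy` are taken from the PROV ontology that the abstract mentions. The question answered is "which inputs were used to produce a given output?"

```python
# Hypothetical execution trace as (subject, predicate, object) triples.
# The "ex:" names are made up for illustration; prov:used and
# prov:wasGeneratedBy are real PROV-O property names.
TRIPLES = {
    ("ex:run1", "rdf:type", "prov:Activity"),
    ("ex:run1", "prov:used", "ex:dataset_a"),
    ("ex:run1", "prov:used", "ex:dataset_b"),
    ("ex:predictions", "prov:wasGeneratedBy", "ex:run1"),
    ("ex:run2", "prov:used", "ex:dataset_c"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def inputs_used_for(triples, output):
    """Competency question: which inputs were used to produce `output`?"""
    # Step 1: find the activities that generated the output.
    activities = [t[2] for t in match(triples, s=output, p="prov:wasGeneratedBy")]
    # Step 2: collect everything those activities used.
    used = {t[2] for a in activities for t in match(triples, s=a, p="prov:used")}
    return sorted(used)


print(inputs_used_for(TRIPLES, "ex:predictions"))
# prints ['ex:dataset_a', 'ex:dataset_b']
```

In a real setting the trace would more likely live in an RDF store and the question would be expressed as a SPARQL query; the two-step lookup above only mirrors the shape of such a query.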


Google AI Blog: An NLU-Powered Tool to Explore COVID-19 Scientific Literature

“Due to the COVID-19 pandemic, scientists and researchers around the world are publishing an immense amount of new research in order to understand and combat the disease. While the volume of research is very encouraging, it can be difficult for scientists and researchers to keep up with the rapid pace of new publications. Traditional search engines can be excellent resources for finding real-time information on general COVID-19 questions like “How many COVID-19 cases are there in the United States?”, but can struggle with understanding the meaning behind research-driven queries. Furthermore, searching through the existing corpus of COVID-19 scientific literature with traditional keyword-based approaches can make it difficult to pinpoint relevant evidence for complex queries.

To help address this problem, we are launching the COVID-19 Research Explorer, a semantic search interface on top of the COVID-19 Open Research Dataset (CORD-19), which includes more than 50,000 journal articles and preprints. We have designed the tool with the goal of helping scientists and researchers efficiently pore through articles for answers or evidence to COVID-19-related questions….”

LINCS – Linked Infrastructure for Networked Cultural Scholarship

“Human brains work through a vast web of interconnections, but the web that researchers increasingly use to understand human culture and history has few meaningful links. Linked Infrastructure for Networked Cultural Scholarship (LINCS) will create the conditions to think differently, with machines, about human culture in Canada….

The LINCS infrastructure project will convert large datasets into an organized, interconnected, machine-processable set of resources for Canadian cultural research….

LINCS aims to provide context for the cultural material that currently floats around online, interlink it, ground it in its sources, and help to make the World Wide Web a trusted resource for scholarly knowledge production….

With a team of technical and domain experts, LINCS will allow Canadian scholars and partner institutions to play a significant role in developing the Semantic Web.”

What is MEI?

“The Music Encoding Initiative (MEI) is a 21st century community-driven open-source effort to define guidelines for encoding musical documents in a machine-readable structure.

It brings together specialists from various music research communities, including technologists, librarians, historians, and theorists in a common effort to discuss and define best practices for representing a broad range of musical documents and structures. The results of these discussions are then formalized into the MEI schema, a core set of rules for recording physical and intellectual characteristics of music notation documents expressed as an eXtensible Markup Language (XML) schema. This schema is developed and maintained by the MEI Technical Team….”
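
As a rough illustration of what "machine-readable" means here, the sketch below reads a small MEI-style fragment with Python's standard library. The fragment is a hand-written, simplified example rather than a complete or validated MEI document (it omits the header, for instance), and the `list_notes` helper is hypothetical. The namespace URI and the `pname`, `oct`, and `dur` attributes follow common MEI usage.

```python
import xml.etree.ElementTree as ET

# Namespace used by MEI documents.
MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}

# Simplified, illustrative fragment: one measure with three notes.
SAMPLE = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <music><body><mdiv><score><section>
    <measure n="1">
      <staff n="1"><layer n="1">
        <note pname="c" oct="4" dur="4"/>
        <note pname="e" oct="4" dur="4"/>
        <note pname="g" oct="4" dur="2"/>
      </layer></staff>
    </measure>
  </section></score></mdiv></body></music>
</mei>"""


def list_notes(mei_xml):
    """Return (pitch name, octave, duration) for each note, in document order."""
    root = ET.fromstring(mei_xml)
    return [
        (note.get("pname"), int(note.get("oct")), note.get("dur"))
        for note in root.iterfind(".//mei:note", MEI_NS)
    ]


print(list_notes(SAMPLE))
# prints [('c', 4, '4'), ('e', 4, '4'), ('g', 4, '2')]
```

Because the notation is stored as structured elements and attributes rather than as an image, a few lines of code can already extract pitches and durations for analysis.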