[2303.14334] The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

Abstract:  Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question “Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces — even for legacy PDFs?” We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we’ve developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We’ve also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers — Discovery, Efficiency, Comprehension, Synthesis, and Accessibility — and present an overview of our progress and remaining open challenges.


Semantic wikis as flexible database interfaces for biomedical applications | Scientific Reports

Abstract:  Several challenges prevent extracting knowledge from biomedical resources, including data heterogeneity and the difficulty of obtaining, and collaborating on, data and annotations from medical doctors. Therefore, flexibility in their representation and interconnection is required; it is also essential to be able to interact easily with such data. In recent years, semantic tools have been developed: semantic wikis are collections of wiki pages that can be annotated with properties and so combine flexibility and expressiveness, two desirable aspects when modeling databases, especially in the dynamic biomedical domain. However, the semantics and collaborative analysis of biomedical data remain an unsolved challenge. The aim of this work is to create a tool that eases the design and setup of semantic databases and makes it possible to enrich them with biostatistical applications. As a side effect, this will also make them reproducible, fostering their application by other research groups. A command-line tool has been developed for creating all structures required by Semantic MediaWiki. In addition, a way to expose statistical analyses as R Shiny applications in the interface is provided, along with a facility to export Prolog predicates for reasoning with external tools. The developed software made it possible to create a set of biomedical databases for the Neuroscience Department of the University of Padova in a more automated way. They can be extended with additional qualitative and statistical analyses of the data, including, for instance, regressions, geographical distributions of diseases, and clustering. The software is released as open-source code and published under the GPL-3 license at https://github.com/mfalda/tsv2swm.
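The core pipeline the abstract describes — turning tabular (TSV) data into Semantic MediaWiki pages annotated with properties — can be sketched in a few lines. This is an illustrative reimplementation of the general technique, not the actual tsv2swm code; the column names and page layout are invented.

```python
import csv
import io

def rows_to_smw_pages(tsv_text: str, name_column: str) -> dict:
    """Turn each TSV row into Semantic MediaWiki wikitext, one page per row.

    Every non-name column becomes a [[Property::Value]] annotation, which is
    what later lets the wiki be queried like a database (e.g. with #ask).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    pages = {}
    for row in reader:
        title = row[name_column]
        annotations = [
            f"* [[{column}::{value}]]"
            for column, value in row.items()
            if column != name_column and value
        ]
        pages[title] = "\n".join(annotations)
    return pages

# Hypothetical biomedical table (invented data, for illustration only).
tsv = "Patient\tDiagnosis\tAge\nP001\tEpilepsy\t34\nP002\tMigraine\t51\n"
pages = rows_to_smw_pages(tsv, "Patient")
print(pages["P001"])
# * [[Diagnosis::Epilepsy]]
# * [[Age::34]]
```

The property annotations are the key design choice: once values are typed properties rather than free text, the same wiki pages double as a queryable database, which is what makes the "flexible database interface" of the title possible.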


Library as Laboratory: Analyzing Biodiversity Literature at Scale

“Imagine the great library of life, the library that Charles Darwin said was necessary for the ‘cultivation of natural science’ (1847). And imagine that this library is not just hundreds of thousands of books printed from 1500 to the present, but also the data contained in those books that represents all that we know about life on our planet. That library is the Biodiversity Heritage Library (BHL). The Internet Archive has provided an invaluable platform for the BHL to liberate taxonomic names, species descriptions, habitat descriptions and much more. Connecting and harnessing the disparate data from over five centuries is now BHL’s grand challenge. The unstructured textual data generated at the point of digitization holds immense untapped potential. Tim Berners-Lee provided the world with a semantic roadmap to address this global deluge of dark data and Wikidata is now executing on his vision. As we speak, BHL’s data is undergoing rapid transformation from legacy formats into linked open data, fulfilling the promise to evaporate data silos and foster bioliteracy for all humankind….”

Automated search in the archives: testing a new tool | Europeana Pro

“Archives Portal Europe, the online repository for archives from and about Europe, aggregates archival material from more than 30 countries and 25 languages – all searchable through one simple search engine.

In order to help researchers navigate this Babylon of languages, Archives Portal Europe has created an automated topic detection tool that expands the keyword search of a single user to create semantic connections with other documents in different languages. This testing session will allow users to preview the tool (currently in its alpha version), test it, and provide fundamental feedback for its development; there will also be prizes! …”

CfP: Community-based Knowledge Bases and Knowledge Graphs (submissions due Nov 01, 2021) | Journal of Web Semantics

The Journal of Web Semantics invites submissions for a special issue on Community-based Knowledge Bases and Knowledge Graphs, edited by Tim Finin, Sebastian Hellmann, David Martin, and Elena Simperl.

Submissions are due by November 01, 2021.

Community-based knowledge bases (KBs) and knowledge graphs (KGs) are critical to many domains. They contain large amounts of information, used in applications as diverse as search, question-answering systems, and conversational agents. They are the backbone of linked open data, helping connect entities from different datasets. Finally, they create rich knowledge engineering ecosystems, making significant, empirical contributions to our understanding of KB/KG science, engineering, and practices.  From here forward, we use “KB” to include both knowledge bases and knowledge graphs. Also, “KB” and “knowledge” encompass both ontology/schema and data.

Community-based KBs come in many shapes and sizes, but they tend to share a number of commonalities:

They are created through the efforts of a group of contributors, following a set of agreed goals, policies, practices, and quality norms.
They are available under open licenses.
They are central to knowledge-sharing networks bringing together various stakeholders.
They serve the needs of a community of users, including, but not restricted to, their contributor base.
Many draw their content from crowdsourced resources (such as Wikipedia, OpenStreetMap).

Examples of community-based KBs include Wikidata, DBpedia, ConceptNet, GeoNames, FrameNet, and Yago. This special issue will highlight recent research, challenges, and opportunities in the field of community-based KBs and the interaction and processes between stakeholders and the KBs.


We welcome papers on a wide variety of topics. Papers that focus on the participation of a community of contributors are especially encouraged.

Towards FAIR protocols and workflows: the OpenPREDICT use case [PeerJ]

Abstract:  It is essential for the advancement of science that researchers share, reuse and reproduce each other’s workflows and protocols. The FAIR principles are a set of guidelines that aim to maximize the value and usefulness of research data, and emphasize the importance of making digital objects findable and reusable by others. The question of how to apply these principles not just to data but also to the workflows and protocols that consume and produce them is still under debate and poses a number of challenges. In this paper we describe a two-fold approach of simultaneously applying the FAIR principles to scientific workflows as well as to the data involved. We apply and evaluate our approach on the case of the PREDICT workflow, a highly cited drug repurposing workflow. This includes FAIRification of the involved datasets, as well as applying semantic technologies to represent and store data about the detailed versions of the general protocol, of the concrete workflow instructions, and of their execution traces. We propose a semantic model to address these specific requirements, which we evaluated by answering competency questions. This semantic model consists of classes and relations from a number of existing ontologies, including Workflow4ever, PROV, EDAM, and BPMN. This then allowed us to formulate and answer new kinds of competency questions. Our evaluation shows the high degree to which our FAIRified OpenPREDICT workflow now adheres to the FAIR principles, and the practicality and usefulness of being able to answer our new competency questions.
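One way to make the abstract’s “competency questions over execution traces” concrete: represent a workflow run as PROV-style triples and match patterns against them, as a SPARQL query would. A minimal stdlib sketch — the entity names and predicates below are illustrative stand-ins, not OpenPREDICT’s actual model.

```python
# PROV-style (subject, predicate, object) triples for one hypothetical run.
triples = [
    ("run-42", "rdf:type", "prov:Activity"),
    ("run-42", "prov:used", "dataset:drug-indications-v2"),
    ("run-42", "prov:wasAssociatedWith", "workflow:openpredict-v1"),
    ("prediction-set-7", "prov:wasGeneratedBy", "run-42"),
]

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Competency question: "Which dataset versions did run-42 consume?"
used = [o for _, _, o in query(s="run-42", p="prov:used")]
print(used)  # ['dataset:drug-indications-v2']
```

Recording versions of datasets, protocols, and traces as explicit graph entities is what turns reproducibility questions (“which inputs produced this prediction set?”) into mechanical queries.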


Google AI Blog: An NLU-Powered Tool to Explore COVID-19 Scientific Literature

“Due to the COVID-19 pandemic, scientists and researchers around the world are publishing an immense amount of new research in order to understand and combat the disease. While the volume of research is very encouraging, it can be difficult for scientists and researchers to keep up with the rapid pace of new publications. Traditional search engines can be excellent resources for finding real-time information on general COVID-19 questions like “How many COVID-19 cases are there in the United States?”, but can struggle with understanding the meaning behind research-driven queries. Furthermore, searching through the existing corpus of COVID-19 scientific literature with traditional keyword-based approaches can make it difficult to pinpoint relevant evidence for complex queries.

To help address this problem, we are launching the COVID-19 Research Explorer, a semantic search interface on top of the COVID-19 Open Research Dataset (CORD-19), which includes more than 50,000 journal articles and preprints. We have designed the tool with the goal of helping scientists and researchers efficiently pore through articles for answers or evidence to COVID-19-related questions….”
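The gap the post describes between keyword matching and semantic retrieval can be shown with a toy ranking sketch. Production systems like the Research Explorer use learned neural embeddings; the hand-made vectors and document titles below are invented placeholders for illustration only.

```python
import math

# Hand-made stand-in vectors; a real system would use learned embeddings.
doc_vectors = {
    "Remdesivir trial shows reduced recovery time": [0.9, 0.1, 0.2],
    "Mask mandates and transmission rates":         [0.1, 0.9, 0.1],
    "Antiviral efficacy against SARS-CoV-2":        [0.8, 0.2, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query_vec, k=2):
    """Rank documents by cosine similarity to the query vector."""
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(query_vec, doc_vectors[d]),
                    reverse=True)
    return ranked[:k]

# Query: "what treatments shorten COVID-19 illness?" shares no keywords with
# the treatment papers, but its (invented) vector lies close to theirs.
query_vec = [0.85, 0.15, 0.25]
print(search(query_vec))
```

Because ranking happens in vector space rather than over literal keyword overlap, a query phrased with none of a paper’s words can still retrieve it — the failure mode of traditional keyword search that the post highlights.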