English professor develops virtual Open Corpus Project | The Justice

“Prof. Dorothy Kim (ENG) is currently working to develop a virtual corpus, or collection of written texts, of Early Middle English language. This would give researchers the opportunity to search across multiple archives and databases of manuscripts. The current status of the Open Corpus Project, as the site is titled, was unveiled at a Faculty Lunch Symposium on Thursday, March 17….

There are many existing corpora for Early Middle English and other languages, but each one has a different set of pros and cons, Kim explained. …

She explained that the design for the Open Corpus Project will be mainly based on a digital platform called Open Context, which is an open access archeological database. She said that Open Context has a landing page with a map, clickable links, and search filters; searches are presented in an organized list so that documents are easy to view and further searches can be done from the results. In order to develop the Open Corpus Project in a similar manner, Kim is partnering with Geocene, an engineering consultancy….”

Google AI Blog: A Step Toward More Inclusive People Annotations in the Open Images Extended Dataset

“In 2016, we introduced Open Images, a collaborative release of ~9 million images annotated with image labels spanning thousands of object categories and bounding box annotations for 600 classes. Since then, we have made several updates, including the release of crowdsourced data to the Open Images Extended collection to improve diversity of object annotations. While the labels provided with these datasets were expansive, they did not focus on sensitive attributes for people, which are critically important for many machine learning (ML) fairness tasks, such as fairness evaluations and bias mitigation. In fact, finding datasets that include thorough labeling of such sensitive attributes is difficult, particularly in the domain of computer vision.

Today, we introduce the More Inclusive Annotations for People (MIAP) dataset in the Open Images Extended collection. The collection contains more complete bounding box annotations for the person class hierarchy in 100k images containing people. Each annotation is also labeled with fairness-related attributes, including perceived gender presentation and perceived age range. With the increasing focus on reducing unfair bias as part of responsible AI research, we hope these annotations will encourage researchers already leveraging Open Images to incorporate fairness analysis in their research….”

Blog – Europe PMC: Announcing the new version of SciLite – the Europe PMC tool for highlighting annotations

“This month, Europe PMC released a new version of SciLite, a powerful tool for highlighting annotations in life sciences publications. SciLite is powered by the Europe PMC annotation platform via the open annotation API, which provides access to over 1.3 billion annotations. Highlighting annotations in the text enables users to easily scan the article and locate key biological entities, such as genes/proteins, accession numbers, protein interactions, diseases, gene-disease relationship and more….”

Recommendations for a National Open Science Strategy in Austria (Empfehlungen für eine nationale Open Science Strategie in Österreich)

From Google’s English:  “The Working Group Open Science Strategy of the Open Science Network Austria (OANA) has developed recommendations for a national Open Science Strategy in Austria and invites you to annotate or comment on the document by April 5th, 2020. To insert comments in the document, please click on the text symbol in the PDF Viewer (top right under the eye) and create an account free of charge. As soon as you log in with your account, you can then comment on the document so that it is visible to others. If the text symbol is not visible, you can find further information at hypothes.is ….”

Standards and the Role of Preprints in Scholarly Communication | NISO website

“One vision (hereafter, referred to as “model”) for preprint publication, disclosed in an evolving preprint [1], focuses on physics preprints in arXiv. This focus is natural, given that physicists and allied practitioners of other mathematical and quantitative fields have been long-standing adopters of preprints. Unsurprisingly, arXiv has therefore played a lead role in the preprint space. Preprint servers that share the “-rXiv” suffix with arXiv have emerged.

The model may never materialize in pristine form. This would take decades at a minimum. However, it provides one analytical framework for understanding the interplay of various components of scholarly journals publishing and for thinking about how preprints can mitigate problems that beset this complicated market. (A new version of the aforementioned preprint will further develop this critique, which is beyond the scope of this paper and discusses open access generally.)

The model suggests that journal publishing and preprints in physics should be increasingly symbiotic. They have distinct roles that reflect historically recurring needs in physics (and STEM) publishing generally….

The model suggests, by contrast, that journal articles take the form of traditional review articles [5] that cite other journal articles, conference proceedings, books, and — increasingly — research disclosed over several preprints. Journal articles should play a pedagogical role in orienting researchers and students to new fields, creating narratives about newly emerging trends, contextualizing discoveries, and fostering interdisciplinary research.

The model calls for the journal market to contract significantly but not entirely as preprints supplant journal articles as the place to disclose small slivers of research. Re-purposing journal articles and correspondingly trimming their numbers will decrease demand for journal subscriptions that pressure budget-strapped libraries. An increased emphasis on review articles will assist researchers in navigating their fields, help counter hyper-specialization, and make the inter-generational transmission of science much more efficient. Also, contracting the journals market and re-purposing it almost exclusively toward review articles can save genius-hours spent doing peer-review, time better spent doing research disclosed in preprints, writing or reviewing integrative journal articles, and teaching….”

AAP + scientific society letters, annotated · The Knowledge Futures Commonplace

“This past week, a range of scientific societies and a few mega-publishers sent public letters to the U.S. administration, opposing a potential executive order that would mandate immediate free access to federally-funded research.

One letter, organized by the Association of American Publishers, was signed by major publishers like Elsevier, Wiley, and Wolters Kluwer, lobbying groups, and scholarly societies. This was distributed via a press release with the title, “COALITION OF 135+ SCIENTIFIC RESEARCH AND PUBLISHING ORGANIZATIONS SENDS LETTER TO ADMINISTRATION OPPOSING PROPOSED ADMINISTRATION POLICY FORCING IMMEDIATE FREE DISTRIBUTION OF PEER-REVIEWED JOURNAL ARTICLES”

Another letter was signed by 62 scientific societies, focusing on claims about the impact of the prospective order on scientific initiatives. Each of these letters made bold claims that deserve supporting evidence, and they are reproduced below for discussion and annotation….”

Linked Research on the Decentralised Web

Abstract:  This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media. I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving.

The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged. Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities; participation requires the use of proprietary systems.

From a technical standpoint, this thesis takes a deep look at semantic structure of research artifacts, and how they can be stored, linked and shared in a way that is controlled by individual researchers, or delegated to trusted parties. Further, I find that the ecosystem was lacking a technical Web standard able to fulfill the awareness function of research communication. Thus, I contribute a new communication protocol, Linked Data Notifications (published as a W3C Recommendation) which enables decentralised notifications on the Web, and provide implementations pertinent to the academic publishing use case. So far we have seen decentralised notifications applied in research dissemination or collaboration scenarios, as well as for archival activities and scientific experiments.

Another core contribution of this work is a Web standards-based implementation of a clientside tool, dokieli, for decentralised article publishing, annotations and social interactions. dokieli can be used to fulfill the scholarly functions of registration, awareness, certification, and archiving, all in a decentralised manner, returning control of research contributions and discourse to individual researchers.

The overarching conclusion of the thesis is that Web technologies can be used to create a fully functioning ecosystem for research communication. Using the framework of Web architecture, and loosely coupling the four functions, an accessible and inclusive ecosystem can be realised whereby users are able to use and switch between interoperable applications without interfering with existing data.

Technical solutions alone do not suffice of course, so this thesis also takes into account the need for a change in the traditional mode of thinking amongst scholars, and presents the Linked Research initiative as an ongoing effort toward researcher autonomy in a social system, and universal access to human- and machine-readable information. Outcomes of this outreach work so far include an increase in the number of individuals self-hosting their research artifacts, workshops publishing accessible proceedings on the Web, in-the-wild experiments with open and public peer-review, and semantic graphs of contributions to conference proceedings and journals (the Linked Open Research Cloud).

Some of the future challenges include: addressing the social implications of decentralised Web publishing, as well as the design of ethically grounded interoperable mechanisms; cultivating privacy aware information spaces; personal or community-controlled on-demand archiving services; and further design of decentralised applications that are aware of the core functions of scientific communication.

Discovery – GO FAIR

“The main purpose of the Discovery IN is to provide interfaces and other user-facing services for data discovery across disciplines. We explore new and innovative ways of enabling discovery, including visualizations, recommender systems, semantics, content mining, annotation, and responsible metrics. …”

Leveraging Concepts in Open Access Publications

Abstract: This paper addresses the integration of a Named Entity Recognition and Disambiguation (NERD) service within a group of open access (OA) publishing digital platforms and considers its potential impact on both research and scholarly publishing. The software powering this service, called entity-fishing, was initially developed by Inria in the context of the EU FP7 project CENDARI and provides automatic entity recognition and disambiguation using the Wikipedia and Wikidata data sets. The application is distributed with an open-source licence, and it has been deployed as a web service in DARIAH’s infrastructure hosted by the French HumaNum. In the paper, we focus on the specific issues related to its integration on five OA platforms specialized in the publication of scholarly monographs in the social sciences and humanities (SSH), as part of the work carried out within the EU H2020 project HIRMEOS (High Integration of Research Monographs in the European Open Science infrastructure). In the first section, we give a brief overview of the current status and evolution of OA publications, considering specifically the challenges that OA monographs are encountering. In the second part, we show how the HIRMEOS project aims to face these challenges by optimizing five OA digital platforms for the publication of monographs from the SSH and ensuring their interoperability. In sections three and four we give a comprehensive description of the entity-fishing service, focusing on its concrete applications in real use cases together with some further possible ideas on how to exploit the annotations generated. We show that entity-fishing annotations can improve both the research and publishing processes. In the last chapter, we briefly present further possible application scenarios that could be made available through infrastructural projects.

Open Knowledge Institutions: Reinventing Universities

“Can 13 authors, from the USA, Germany, Australia, China and South Africa, many previously unknown to one another, get together and, from scratch, write a 150-page book –– on a topic none of them has tackled before –– in 5 days? 

If the group in question is committed to the same goals as MIT’s PubPub platform, to “socialize the process of knowledge creation”; [1] and if the process they use is a Book Sprint, a professionally facilitated “collaborative process that captures the knowledge of a group of experts in a single book,” [2] then the answer is yes.

What drew our diverse group together is “open knowledge.” By this we mean not just the technical specifics of open access publishing or open source computing, and not just a general commitment to an open society, open government or open science, but a need to understand how these technical and social possibilities can be brought together in open knowledge institutions. 

Specifically, how can the most long-lasting, successful and expanding version of a knowledge institution –– the university –– face the mounting challenges of global, digital and contested knowledge systems, in order to transform universities into Open Knowledge Institutions?

We present the results of our work here to the wider community for annotation, commentary, constructive criticism and engagement, with a view to extending the collaborative spirit further. We want the book to gain further analytical richness and precision from crowd-sourced expertise. You are invited to join us as we work through some of the issues that may enable or stand in the way of socialising knowledge itself….”

Automating semantic publishing – IOS Press

Abstract: “Semantic Publishing involves the use of Web and Semantic Web technologies and standards for the semantic enhancement of a scholarly work so as to improve its discoverability, interactivity, openness and (re-)usability for both humans and machines. Recently, people have suggested that the semantic enhancements of a scholarly work should be undertaken by the authors of that scholarly work, and should be considered as integral parts of the contribution subjected to peer review. However, this requires that the authors should spend additional time and effort adding such semantic annotations, time that they usually do not have available. Thus, the most pragmatic way to facilitate this additional task is to use automated services that create the semantic annotation of authors’ scholarly articles by parsing the content that they have already written, thus reducing the additional time required of the authors to that for checking and validating these semantic annotations. In this article, I propose a generic approach called compositional and iterative semantic enhancement (CISE) that enables the automatic enhancement of scholarly papers with additional semantic annotations in a way that is independent of the markup used for storing scholarly articles and the natural language used for writing their content.”
