Aligning the Research Library to Organizational Strategy – Ithaka S+R

“Open access has matured significantly in recent years. The UK and EU countries have committed largely to a “gold” version of open access, driven largely by transformative agreements with the major incumbent publishing houses.[14] The US policy environment has been far more mixed, with a great deal of “green” open access incentivized by major scientific funders, although some individual universities pursued transformative agreements. Both Canadian and US libraries have benefitted from the expansion of free and open access in strengthening their position at the negotiating table with major publishers.[15]

Progress on open access has radically expanded public access to the research literature. It has also brought with it a number of second-order effects. Some of them are connected to the serious problems in research integrity and the growing crisis of trust in science.[16] Others can be seen in the impacts on the scholarly publishing marketplace and the platforms that support discovery and access.[17]

While open access has made scientific materials more widely available, it has not directly addressed the challenges in translating scholarship for public consumption. Looking ahead, it is likely that scholarly communication will experience further changes as a result of computers increasingly supplanting human readership. The form of the scientific output may decreasingly look like the traditional journal article as over time standardized data, methods, protocols, and other scientific artifacts become vital for computational consumption….”

The need for open access and natural language processing | PNAS

“In PNAS, Chu and Evans (1) argue that the rapidly rising number of publications in any given field actually hinders progress. The rationale is that, if too many papers are published, the really novel ideas have trouble finding traction, and more and more people tend to “go along with the majority.” Review papers are cited more and more instead of original research. We agree with Chu and Evans: Scientists simply cannot keep up. This is why we argue that we must bring the powers of artificial intelligence/machine learning (AI/ML) and open access to the forefront. AI/ML is a powerful tool and can be used to ingest and analyze large quantities of data in a short period of time. For example, some of us (2) have used AI/ML tools to ingest 500,000+ abstracts from online archives (relatively easy to do today) and categorize them for strategic planning purposes. This letter offers a short follow-on to Chu and Evans (hereafter CE) to point out a way to mitigate the problems they delineate….

In conclusion, we agree with CE (1) on the problems caused by the rapid rise in scientific publications, outpacing any individual’s ability to keep up. We propose that open access, combined with NLP, can help effectively organize the literature, and we encourage publishers to make papers open access, archives to make papers easily findable, and researchers to employ their own NLP as an important tool in their arsenal.”

Recommendations on the Transformation of Academic Publishing: Towards Open Access

“Three central arguments support this transformation: (1) Openly accessible publications can be read, reviewed and used more quickly and more widely by other researchers. This increases the quality of research and accelerates scientific progress. (2) OA makes scientific knowledge more widely available outside of the scientific community and lowers the threshold for various transfer activities. This increases the social effectiveness of (publicly funded) research. (3) Up to now, the business model of publishers has been based on rights of use. As they will no longer be granted exclusive rights under OA, publishers will become publication service providers and will compete with other providers. This may strengthen the negotiating position of scientific institutions vis-à-vis such service providers and improve the innovative capacity, cost transparency and cost efficiency of the publication system.

As far as the Council is concerned, the goal of the transformation is for academic publications to be made freely available immediately, permanently, at the original publication venue and in the citable, peer-reviewed and typeset version of record under an open licence (CC BY). This so-called gold route to OA (gold OA) is compatible with various business models…. 

For orientation in this market, the Council recommends that the Alliance of Science Organisations in Germany agree on common requirements for quality assurance of content (especially in terms of peer review processes) as well as for high-quality publication services. In the medium term, academic publications should not only be openly accessible, but also machine-readable through open, structured formats and semantic annotations….

“Gold OA” should not be equated with funding via article processing charges (APC)….

As the WR sees it, all third-party funders are obliged to fully finance the publication costs arising from publishing the results of the research they are funding….”


Educators Push For Amendment To Copyright Act | Pune News – Times of India

“Senior academicians and vice-chancellors of universities in the city have demanded inclusive digital education for which technology and infrastructural advances will have to be matched with changes in the copyright law enacted in 1967.

It is related specifically to open educational resources, digitisation of resource material and their sharing or lending, text and data mining, procurement and sharing of e-resources, digitally supported teaching activities, including distance learning.
In their research, the professors have stated that the amendments to the Copyright Act also need to ease operations of public libraries, institutional libraries, galleries and museums and archives in physical and digital frameworks including National Digital Library of India….”

Using pretraining and text mining methods to automatically extract the chemical scientific data | Emerald Insight

Abstract:  Purpose

In computational chemistry, the chemical bond energy (pKa) is essential, but most pKa-related data are submerged in scientific papers, with only a few data that have been extracted by domain experts manually. The loss of scientific data does not contribute to in-depth and innovative scientific data analysis. To address this problem, this study aims to utilize natural language processing methods to extract pKa-related scientific data in chemical papers.

Design/methodology/approach

Based on the previous Bert-CRF model combined with dictionaries and rules to resolve the problem of a large number of unknown words of professional vocabulary, in this paper, the authors proposed an end-to-end Bert-CRF model with inputting constructed domain wordpiece tokens using text mining methods. The authors use standard high-frequency string extraction techniques to construct domain wordpiece tokens for specific domains. And in the subsequent deep learning work, domain features are added to the input.

Findings

The experiments show that the end-to-end Bert-CRF model could have a relatively good result and can be easily transferred to other domains because it reduces the requirements for experts by using automatic high-frequency wordpiece tokens extraction techniques to construct the domain wordpiece tokenization rules and then input domain features to the Bert model.

Originality/value

By decomposing lots of unknown words with domain feature-based wordpiece tokens, the authors manage to resolve the problem of a large amount of professional vocabulary and achieve a relatively ideal extraction result compared to the baseline model. The end-to-end model explores low-cost migration for entity and relation extraction in professional fields, reducing the requirements for experts.
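
The abstract above mentions building domain wordpiece tokens from high-frequency strings. As a rough illustration only, and not the authors' implementation, the idea can be sketched with plain character n-gram counting followed by a greedy longest-match segmentation. The function names, the thresholds, and the sample chemical terms are all hypothetical; a real pipeline would feed the resulting pieces to a BERT tokenizer rather than stop here.

```python
from collections import Counter


def frequent_substrings(terms, min_len=4, max_len=6, min_count=3):
    """Return substrings that occur in at least `min_count` of the terms."""
    counts = Counter()
    for term in terms:
        t = term.lower()
        for n in range(min_len, max_len + 1):
            # A set counts each substring once per term (document frequency).
            counts.update({t[i:i + n] for i in range(len(t) - n + 1)})
    return {s for s, c in counts.items() if c >= min_count}


def greedy_tokenize(word, vocab):
    """Split a word by longest match against vocab, falling back to single characters."""
    word = word.lower()
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            # j == i + 1 is the single-character fallback, so the loop always advances.
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Hypothetical mini-corpus of domain terms (assumption, not from the paper).
terms = ["carboxylic", "carboxylate", "carbonyl"]
vocab = frequent_substrings(terms)
pieces = greedy_tokenize("carbonate", vocab)
print(sorted(vocab))
print(pieces)
```

Here an unseen word such as "carbonate" is decomposed around the frequent piece shared by the known terms, which is the general effect the abstract describes for unknown professional vocabulary.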

Text and Data Mining | NISO website

“Not so long ago, Text and Data Mining (TDM) — the automated detection of patterns and extraction of knowledge from machine-readable content or data — was a particular area of interest. So much so, that libraries and content providers developed licensing language and other resources to support researchers wanting to work with and manipulate this material, including a proliferation of LibGuides and APIs. But where are we now in identifying available resources and tools for TDM activities?

This virtual conference will provide an “explainer” for information professionals tasked with supporting researchers who are just beginning to engage with TDM, and wondering how to pull the data they need, how it is structured, and how they can expect to engage with it. Our speakers will cover essential technology, how it is deployed and used, the scope of support that the library may be asked to provide, and the spectrum of options for collaboration between information professionals and content and service providers.”

Representing COVID-19 information in collaborative knowledge graphs: a study of Wikidata | Zenodo

Abstract:  Information related to the COVID-19 pandemic ranges from biological to bibliographic and from geographical to genetic. Wikidata is a vast interdisciplinary, multilingual, open collaborative knowledge base of more than 88 million entities connected by well over a billion relationships and is consequently a web-scale platform for broader computer-supported cooperative work and linked open data. Here, we introduce four aspects of Wikidata that make it an ideal knowledge base for information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The structure of the raw data is highly complex, so converting it to meaningful insight requires extraction and visualization, the global crowdsourcing of which adds both additional challenges and opportunities. The created knowledge graph for COVID-19 in Wikidata can be visualized, explored and analyzed in near real time by specialists, automated tools and the public, for decision support as well as educational and scholarly research purposes via SPARQL, a semantic query language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format.
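
For readers unfamiliar with SPARQL, the following is a minimal sketch of how such a query against Wikidata might be assembled in Python. It is not taken from the paper. The endpoint is the public Wikidata Query Service; the identifiers Q84263196 (COVID-19) and P780 (symptoms and signs) are assumptions that should be checked on Wikidata before use, and the helper names and User-Agent string are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers: Q84263196 = COVID-19, P780 = symptoms and signs.
QUERY = """
SELECT ?symptom ?symptomLabel WHERE {
  wd:Q84263196 wdt:P780 ?symptom .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 20
"""


def build_request_url(query, endpoint=WDQS_ENDPOINT):
    """Encode the SPARQL text as a GET request URL asking for JSON results."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


def run_query(query, user_agent="example-sparql-sketch/0.1"):
    """Send the query and return the raw result bindings (needs network access)."""
    req = urllib.request.Request(
        build_request_url(query), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["results"]["bindings"]


# Building the URL does not touch the network; run_query(QUERY) would.
url = build_request_url(QUERY)
print(url)
```

Calling `run_query(QUERY)` would return one binding per matching statement, each carrying the item URI and its English label.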


Making Biomedical Sciences publications more accessible for machines | SpringerLink

Abstract:  With the rapidly expanding catalogue of scientific publications, especially within the Biomedical Sciences field, it is becoming increasingly difficult for researchers to search for, read or even interpret emerging scientific findings. PubMed, just one of the current biomedical data repositories, comprises over 33 million citations for biomedical research, and over 2500 publications are added each day. To further strengthen the impact of biomedical research, we suggest that there should be more synergy between publications and machines. By bringing machines into the realm of research and publication, we can greatly augment the assessment, investigation and cataloging of the biomedical literary corpus. The effective application of machine-based manuscript assessment and interpretation is now crucial, and potentially stands as the most effective way for researchers to comprehend and process the tsunami of biomedical data and literature. Many biomedical manuscripts are currently published online in poorly searchable document types, with figures and data presented in formats that are partially inaccessible to machine-based approaches. The structure and format of biomedical manuscripts should be adapted to facilitate machine-assisted interrogation of this important literary corpus. In this context, it is important to embrace the concept that biomedical scientists should also write manuscripts that can be read by machines. It is likely that an enhanced human–machine synergy in reading biomedical publications will greatly enhance biomedical data retrieval and reveal novel insights into complex datasets.


Borchard Foundation Grant Will Help Digitize Rare Art Exhibition Catalogs | UCSB Library

“The Albert and Elaine Borchard Foundation issued a $10,000 grant to the UC Santa Barbara Library for the digitization of a portion of the Marcel Nicolle Collection, which consists of more than 1,000 rare 19th-century exhibition catalogs in Western European languages, mostly French….


Digitization will help mitigate further damage to the objects by decreasing physical handling and will also help broaden research access to these frequently requested materials. It also provides opportunities for text mining and other forms of digital scholarship on the collection….”

Tracking Science: How Libraries can Protect Data and Scientific Freedom | ZBW MediaTalk

A modern expression states: If you are not paying for the product, you are the product yourself. How can libraries help to prevent tracking in science, thereby protecting the data of the researchers and, in an idealistic sense, scientific freedom? In an interview, Julia Reda reveals the starting points and pitfalls.

Rough Notes on ‘Surveillance Capitalism in our Libraries’ | Open Working

This is a very partial and personal list of sources and documents related to the topic of ‘Surveillance Capitalism in our Libraries’. I am sure there is much more to be added. It is shared in case anyone else finds it useful. A commentable version of the document is available here

Singapore starts making its copyright law fit for the digital world; others need to follow its example – Walled Culture

“Singapore’s previous copyright law provided a broad “fair dealing” right that allowed a range of general uses.  The title of this exception has now been changed from “fair dealing” to “fair use.” That might seem a trivial change, but it’s significant.  Fair dealing rights are, in general, more limited than fair use ones.  The adoption of the latter term is further confirmation that Singapore’s new Copyright Law is moving in the right direction, and aims to provide greater freedoms for the general public, rather than fewer, as has so often been the case in this sector.”