The need for open access and natural language processing | PNAS

“In PNAS, Chu and Evans (1) argue that the rapidly rising number of publications in any given field actually hinders progress. The rationale is that, if too many papers are published, the really novel ideas have trouble finding traction, and more and more people tend to ‘go along with the majority.’ Review papers are cited more and more instead of original research. We agree with Chu and Evans: Scientists simply cannot keep up. This is why we argue that we must bring the powers of artificial intelligence/machine learning (AI/ML) and open access to the forefront. AI/ML is a powerful tool and can be used to ingest and analyze large quantities of data in a short period of time. For example, some of us (2) have used AI/ML tools to ingest 500,000+ abstracts from online archives (relatively easy to do today) and categorize them for strategic planning purposes. This letter offers a short follow-on to Chu and Evans (hereafter CE) to point out a way to mitigate the problems they delineate….

In conclusion, we agree with CE (1) on the problems caused by the rapid rise in scientific publications, outpacing any individual’s ability to keep up. We propose that open access, combined with NLP, can help effectively organize the literature, and we encourage publishers to make papers open access, archives to make papers easily findable, and researchers to employ their own NLP as an important tool in their arsenal.”

NLP needs to be open. 500+ researchers are trying to make it happen | VentureBeat

“A group of more than 500 researchers from 45 different countries (from France, the US, and Japan to Indonesia, Ghana, and Ethiopia) has come together to work towards tackling some of these problems. The project, which the authors of this article are all involved in, is called BigScience, and our goal is to improve the scientific understanding of the capabilities and limitations of large-scale neural network models in NLP and to create a diverse and multilingual dataset and a large-scale language model as research artifacts, open to the scientific community.

BigScience was inspired by scientific creation schemes existing in other scientific fields, such as CERN and the LHC in particle physics, in which open scientific collaborations facilitate the creation of large-scale artifacts useful for the entire research community. So far, a broad range of institutions and disciplines have joined the project in its year-long effort that started in May 2021….

Our effort keeps evolving and growing, with more researchers joining every day, making it already the biggest open science contribution in artificial intelligence to date.

Much like the tensions between proprietary and open-source software in the early 2000s, AI is at a turning point where it can either go in a proprietary direction, where large-scale state-of-the-art models are increasingly developed internally in companies and kept private, or in an open, collaborative, community-oriented direction, marrying the best aspects of open-source and open-science. It’s essential that we make the most of this current opportunity to push AI onto that community-oriented path so that it can benefit society as a whole.”

ripeta – responsible science

“Ripeta is a credit review for scientific publications. Similar to a financial credit report, which reviews the fiscal health of a person, Ripeta assesses the responsible reporting of the scientific paper. The Ripeta suite identifies and extracts the key components of research reporting, thus drastically shortening and improving the publication process; furthermore, Ripeta’s ability to extract data makes these pieces of text easily discoverable for future use….

Researchers: Rapidly check your pre-print manuscripts to improve the transparency of reporting your research.

Publishers: Improve the reproducibility of the articles you publish with an automated tool that helps evidence-based science.

Funders: Evaluate your portfolio by checking your manuscripts for robust scientific reporting.”

Wellcome and Ripeta partner to assess dataset availability in funded research – Digital Science

“Ripeta and Wellcome are pleased to announce a collaborative effort to assess data and code availability in the manuscripts of funded research projects.

The project will analyze papers funded by Wellcome from the year prior to it establishing a dedicated Open Research team (2016) and from the most recent calendar year (2019). It supports Wellcome’s commitment to maximising the availability and re-use of results from its funded research.

Ripeta, a Digital Science portfolio company, aims to make better science easier by identifying and highlighting the important parts of research that should be transparently presented in a manuscript and other materials.

The collaboration will leverage Ripeta’s natural language processing (NLP) technology, which scans articles for reproducibility criteria. For both data availability and code availability, the NLP will produce a binary yes-no response for the presence of availability statements. Those with a “yes” response will then be categorized by the way that data or code are shared….”
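
The two-step output described above (a binary yes/no for the presence of an availability statement, then a category for how the material is shared) can be illustrated with a small keyword heuristic. This is only a sketch under stated assumptions: it is not Ripeta's NLP, and the function name `assess_availability`, the regular expressions, and the category labels are all hypothetical choices made for illustration.

```python
import re

# Hypothetical sharing categories and patterns; NOT Ripeta's actual criteria.
SHARING_CATEGORIES = [
    ("repository", re.compile(r"\b(repository|zenodo|figshare|dryad|github|osf)\b", re.I)),
    ("supplementary", re.compile(r"\bsupplement(ary|al)?\b|\bsupporting information\b", re.I)),
    ("on request", re.compile(r"\b(up)?on (reasonable )?request\b", re.I)),
]


def assess_availability(text, kind):
    """Return (has_statement, category) for `kind`, which is "data" or "code".

    Step 1 (binary): look for a sentence that mentions `kind` followed by an
    availability word. Step 2 (category): label how the material is shared.
    """
    statement = re.compile(
        rf"\b{re.escape(kind)}\b[^.]*\b(available|deposited|accessible)\b", re.I
    )
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if statement.search(sentence):
            for label, pattern in SHARING_CATEGORIES:
                if pattern.search(sentence):
                    return True, label
            return True, "unspecified"
    return False, None


manuscript = "Data are available on Zenodo. Code is available upon request."
data_result = assess_availability(manuscript, "data")
code_result = assess_availability(manuscript, "code")
```

A production system would need far more than keyword matching (section detection, negation handling, link checking), so the sketch only shows the shape of the yes/no-then-categorize output.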

NimbleMiner: A Novel Multi-Lingual Text Mining Application

Abstract: This demonstration showcase will present a novel open access text mining application called NimbleMiner. NimbleMiner’s architecture is language agnostic and can potentially be applied to multiple languages. The system has been applied in a series of recent studies in several languages, including English and Hebrew. It showed good text classification performance when compared to other natural language processing approaches.