“Just one week after Google’s DeepMind AI group finally described its biology efforts in detail, the company is releasing a paper that explains how it analyzed nearly every protein encoded in the human genome and predicted its likely three-dimensional structure—a structure that can be critical for understanding disease and designing treatments. In the very near future, all of these structures will be released under a Creative Commons license via the European Bioinformatics Institute, which already hosts a major database of protein structures.
In a press conference associated with the paper’s release, DeepMind’s Demis Hassabis made clear that the company isn’t stopping there. In addition to the work described in the paper, the company will release structural predictions for the genomes of 20 major research organisms, from yeast to fruit flies to mice. In total, the database launch will include roughly 350,000 protein structures….”
“Before we can build machines that make decisions based on common sense, the AI powering those machines must be capable of more than simply finding patterns in data. It must also consider the intentions, beliefs, and desires of others that people use to intuitively make decisions.
At the 2021 International Conference on Machine Learning (ICML), we are releasing a new dataset for benchmarking AI intuition, along with two machine learning models representing different approaches to the problem. The research has been done with our colleagues at MIT and Harvard University to accelerate the development of AI that exhibits common sense. These tools rely on testing techniques that psychologists use to study the behavior of infants….”
“A group of more than 500 researchers from 45 different countries — from France, the US, and Japan to Indonesia, Ghana, and Ethiopia — has come together to work towards tackling some of these problems. The project, which the authors of this article are all involved in, is called Big Science, and our goal is to improve the scientific understanding of the capabilities and limitations of large-scale neural network models in NLP and to create a diverse and multilingual dataset and a large-scale language model as research artifacts, open to the scientific community.
BigScience was inspired by scientific creation schemes existing in other scientific fields, such as CERN and the LHC in particle physics, in which open scientific collaborations facilitate the creation of large-scale artifacts useful for the entire research community. So far, a broad range of institutions and disciplines have joined the project in its year-long effort that started in May 2021….
Our effort keeps evolving and growing, with more researchers joining every day, making it already the biggest open science contribution in artificial intelligence to date.
Much like the tensions between proprietary and open-source software in the early 2000s, AI is at a turning point where it can either go in a proprietary direction, where large-scale state-of-the-art models are increasingly developed internally in companies and kept private, or in an open, collaborative, community-oriented direction, marrying the best aspects of open-source and open-science. It’s essential that we make the most of this current opportunity to push AI onto that community-oriented path so that it can benefit society as a whole.”
“A team of researchers from EleutherAI have open-sourced GPT-J, a six-billion parameter natural language processing (NLP) AI model based on GPT-3. The model was trained on an 800GB open-source text dataset and has performance comparable to a GPT-3 model of similar size.
Developer Aran Komatsuzaki announced the release on his blog. The model was trained on EleutherAI’s Pile dataset using Google Cloud’s v3-256 TPUs; training took approximately five weeks. On common NLP benchmark tasks, GPT-J achieves an accuracy similar to OpenAI’s published results for their 6.7B parameter version of GPT-3. EleutherAI’s release includes the model code, pre-trained weight files, Colab notebook, and a demo website. According to Komatsuzaki,…”
“INASP believes there is an opportunity to leverage new technologies in service of Southern knowledge systems, and we seek partners to work with us to identify possibilities and to test and build new tools.
We are inviting proposals from Africa, Asia and Latin America for small grants of approximately $3000 (£2,100) to enable groups to organise and host a series of discovery workshops to explore these ideas further….”
Abstract: The Microbiology Society will be launching an open research platform in October 2021. Developed using funding from the Wellcome Trust and the Howard Hughes Medical Institute (HHMI), the platform will combine our current sound-science journal, Access Microbiology, with artificial intelligence (AI) review tools and many of the elements of a preprint server. In an effort to improve the rigour, reproducibility and transparency of the academic record, the Access Microbiology platform will host both preprints of articles and their Version of Record (VOR) publications, as well as the reviewer reports, Editor’s decision, authors’ response to reviewers and the AI review reports. To ensure the platform meets the needs of our community, in February 2020 we conducted focus group meetings with various stakeholders. Using articles previously submitted to Access Microbiology, we undertook testing of a range of potential AI review tools and investigated the technical feasibility and utility of including these tools as part of the platform. In keeping with the open and transparent ethos of the platform, we present here a summary of the focus group feedback and AI review tool testing.
“In 2016, we introduced Open Images, a collaborative release of ~9 million images annotated with image labels spanning thousands of object categories and bounding box annotations for 600 classes. Since then, we have made several updates, including the release of crowdsourced data to the Open Images Extended collection to improve diversity of object annotations. While the labels provided with these datasets were expansive, they did not focus on sensitive attributes for people, which are critically important for many machine learning (ML) fairness tasks, such as fairness evaluations and bias mitigation. In fact, finding datasets that include thorough labeling of such sensitive attributes is difficult, particularly in the domain of computer vision.
Today, we introduce the More Inclusive Annotations for People (MIAP) dataset in the Open Images Extended collection. The collection contains more complete bounding box annotations for the person class hierarchy in 100k images containing people. Each annotation is also labeled with fairness-related attributes, including perceived gender presentation and perceived age range. With the increasing focus on reducing unfair bias as part of responsible AI research, we hope these annotations will encourage researchers already leveraging Open Images to incorporate fairness analysis in their research….”
“Project MUSE is partnering with UNSILO, a Cactus Communications (CACTUS) brand that develops artificial intelligence(AI)-powered solutions for publishers, to implement robust new AI-driven content recommendations throughout its massive collection of books and journals in the humanities and social sciences. UNSILO recently completed the initial indexing of the Project MUSE content collection and enhanced related content recommendations appear throughout the platform.
The UNSILO Recommender API automatically identifies links to relevant content from the MUSE content corpus for any selected document (book chapter or journal article). The indexing is updated every 24 hours as new content is added to MUSE. Links are delivered to the platform in real time, enriching the user experience and providing relevance-ranked discovery that augments the learning experience. Over 250 concepts are extracted from every document, and then matched by rank with related material. …”
This decade will see the tipping point reached for open research content between the [top down] expansion of OA initiatives from commercial publishers and the [bottom up] support for Open Science efforts from within the academy. Having more content freely available and more content on the same platforms enables large scale analyses. The economic models are shifting from the value of the content at the unit level to the deployment of tools to uncover intelligence in a large body of content….”
“In the race to harness the power of cloud computing, and further develop artificial intelligence, academics have a new concern: falling behind a fast-moving tech industry. In the US, 22 higher education institutions, including Stanford and Carnegie Mellon, have signed up to a National Research Cloud initiative seeking access to the computational power they need to keep up. It is one of several cloud projects being called for by academics globally, and is being explored by the US Congress, given the potential of the technology to deliver breakthroughs in healthcare and climate change….”
“To help scientists deal with the increasing volume of published scientific literature, a research team at the I School is designing ScholarPhi, an augmented reading interface that makes scientific papers more understandable and contextually rich.
The project is led by UC Berkeley School of Information Professor Marti Hearst, and includes UC Berkeley postdoctoral fellows Andrew Head and Dongyeop Kang, and collaborators Raymond Folk, Kyle Lo, Sam Sjonsberg, and Dan Weld from the Allen Institute for AI (AI2) and the University of Washington. It is funded in part by the Alfred P. Sloan Foundation and by AI2.
ScholarPhi broadens access to scientific literature by developing a new document reader user interface and natural language analysis algorithms for context-relevant explanations of technical terms and notation….”
“Semantic Reader Beta is an augmented reader with the potential to revolutionize scientific reading by making it more accessible and richly contextual.
Observations of scientists reading technical papers showed that readers frequently page back and forth looking for the definitions of terms and mathematical symbols as well as for the details of cited papers. This need to jump around through the paper breaks the flow of paper comprehension.
Semantic Reader provides this information directly in context by dimming unrelated text and providing details in tooltips, and soon will also provide corresponding term definitions. It uses artificial intelligence to understand a document’s structure. Usability studies show readers answered questions requiring deep understanding of paper concepts significantly more quickly with ScholarPhi than with a baseline PDF reader; furthermore, they viewed much less of the paper.
Based on the ScholarPhi research from the Semantic Scholar team at AI2, UC Berkeley and the University of Washington, and supported in part by the Alfred P. Sloan Foundation, the Semantic Reader is now available in beta for a select group of arXiv papers on semanticscholar.org with plans to add additional features and expand coverage soon….”
“This is our proposal for how we might create a radically new scholarly publishing system with the potential to disrupt the scholarly publishing industry. The proposed model is: (a) open, (b) objective, (c) crowd sourced and community-controlled, (d) decentralised, and (e) capable of generating prestige. Submitted articles are openly rated by researchers on multiple dimensions of interest (e.g., novelty, reliability, transparency) and ‘impact prediction algorithms’ are trained on these data to classify articles into journal ‘tiers’.
In time, with growing adoption, the highest impact tiers within such a system could develop sufficient prestige to rival even the most established of legacy journals (e.g., Nature). In return for their support, researchers would be rewarded with prestige, nuanced metrics, reduced fees, faster publication rates, and increased control over their outputs….”
Abstract: The number of scholarly publications grows steadily every year and it becomes harder to find, assess and compare scholarly knowledge effectively. Scholarly knowledge graphs have the potential to address these challenges. However, creating such graphs remains a complex task. We propose a method to crowdsource structured scholarly knowledge from paper authors with a web-based user interface supported by artificial intelligence. The interface enables authors to select key sentences for annotation. It integrates multiple machine learning algorithms to assist authors during the annotation, including class recommendation and key sentence highlighting. We envision that the interface is integrated in paper submission processes for which we define three main task requirements: The task has to be . We evaluated the interface with a user study in which participants were assigned the task to annotate one of their own articles. With the resulting data, we determined whether the participants were successfully able to perform the task. Furthermore, we evaluated the interface’s usability and the participant’s attitude towards the interface with a survey. The results suggest that sentence annotation is a feasible task for researchers and that they do not object to annotate their articles during the submission process.
“The archive’s catalog currently holds more than 120 million digital records, as well as “archival metadata and other types of records, including electronic databases.” However, the system has “an unsophisticated search” function, according to a request for information.
While NARA employees add metadata tags to digital records, “There is a delta between what NARA has been able to describe and the specific information that users want from our records,” the RFI states, asking, “Can AI fill the gap?”
During an informational day held in early April, NARA executives outlined some of the challenge, including a single search returning a flood of results from the same source—making it difficult to sift through to find multiple sources—and difficulty distinguishing between records with similar names, such as a search for “Truman” the president versus “Truman” the aircraft carrier.
The current search function also is not able to return accurate results if the search term input is not exactly the same as it exists in the metadata.
The RFI is seeking feedback on automated solutions that can analyze how users search the digital archives and associate those search terms with the appropriate record….”