Opinion: Pros and Cons of Google vs. Subscription Databases

“During my time overseeing the library services department of a large school district, we found our subscription databases were generally a well-kept secret. The lack of trained school librarians available to teach these resources was part of the issue. But Google was ubiquitous, as was Wikipedia, and they became de facto research sources for students, despite their limitations for such a role.

Google has its place for students and researchers (I used it for this article), as does Google Scholar (which I also used). But for students, subscription databases should also play a central research role, beginning with age-appropriate sources for elementary kids – like National Geographic – and moving up to “Gale in Context” for middle school students, and more scholarly articles for high schoolers from sources like ABC-CLIO….”


Book Review – Along Came Google: A History of Library Digitization – The Scholarly Kitchen

“Meanwhile, Google had only just gone public with an IPO in 2004. That year, at the Frankfurt Book Fair, Google announced its Publisher Program, which promised to support the same type of search functionality. Publishers willingly signed up, unaware that the Library Project would be announced two months later. The Library Project was ambitious, digitizing titles acquired for collections held at Harvard, Stanford, the University of Michigan, the Bodleian Library at Oxford University, and the New York Public Library. This was a breathtaking step farther than Amazon, and the information community was thunderstruck as it tried to process the implications of what such an expansion could mean. 

This is the story that is told in Along Came Google: A History of Library Digitization by Deana Marcum and Roger Schonfeld (full disclosure, Roger is a regular contributor to this blog). Note the subtitle. This book documents from a library perspective the implications and long-term impact of Google’s move to make a significant corpus of “offline content searchable online” through optimized means of scanning and digitization. The outcome of Google’s ambitious project would ultimately be diminished, due to constraints resulting from extended legal battles, but key library leadership has managed to create the infrastructure needed to sustain and carry on the massive digitization needed. There were significant barriers to that work, as the authors note, despite the fact that “in this story, there are many actors, all of good intentions. Inevitably, it is also a story of limitations and failures to collaborate.” …”

Google turns AlphaFold loose on the entire human genome | Ars Technica

“Just one week after Google’s DeepMind AI group finally described its biology efforts in detail, the company is releasing a paper that explains how it analyzed nearly every protein encoded in the human genome and predicted its likely three-dimensional structure—a structure that can be critical for understanding disease and designing treatments. In the very near future, all of these structures will be released under a Creative Commons license via the European Bioinformatics Institute, which already hosts a major database of protein structures.

In a press conference associated with the paper’s release, DeepMind’s Demis Hassabis made clear that the company isn’t stopping there. In addition to the work described in the paper, the company will release structural predictions for the genomes of 20 major research organisms, from yeast to fruit flies to mice. In total, the database launch will include roughly 350,000 protein structures….”

Google Dataset Search: Using open access tools during the research process – News – Illinois State

“We often discuss publications and publishing open access (OA) materials in these news items, but the OA movement can be a part of many other steps of the research process. Many researchers choose to make the datasets their research is based on open access as well. This can be done as part of a funding institution’s requirements, to increase transparency and reproducibility, or simply because they wish to make their data easily available to other researchers.

One way students and faculty can find these datasets is through Google Dataset Search. Out of beta in early 2020, Google Dataset Search can be used to find links to datasets that have been published on the web and described via the schema.org standard. The internet does not include all datasets, and not all are described using this standard, but Google does claim that over 25 million datasets are indexed for searching….”
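Dataset Search depends on publishers describing their data with the schema.org `Dataset` type, commonly embedded in the dataset's landing page as JSON-LD. The Python sketch below builds a description of that kind. The dataset name, URL, license, and creator are hypothetical placeholders, and only a few of the properties schema.org defines are included, so treat it as a starting point rather than a complete record.

```python
import json


def dataset_jsonld(name, description, url, license_url, creator):
    """Build a minimal schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "creator": {"@type": "Person", "name": creator},
    }


def as_script_tag(record):
    """Wrap the JSON-LD in the <script> element placed in a landing page's HTML."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(record, indent=2)
        + "\n</script>"
    )


# Hypothetical example values; replace with the real dataset's metadata.
record = dataset_jsonld(
    name="Example survey responses",
    description="Hypothetical anonymized survey responses collected for a course project.",
    url="https://example.org/datasets/survey",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creator="Jane Researcher",
)
print(as_script_tag(record))
```

Pasting the printed `<script>` element into the dataset's landing page is what makes the description machine-readable for crawlers.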

Google AI Blog: A Step Toward More Inclusive People Annotations in the Open Images Extended Dataset

“In 2016, we introduced Open Images, a collaborative release of ~9 million images annotated with image labels spanning thousands of object categories and bounding box annotations for 600 classes. Since then, we have made several updates, including the release of crowdsourced data to the Open Images Extended collection to improve diversity of object annotations. While the labels provided with these datasets were expansive, they did not focus on sensitive attributes for people, which are critically important for many machine learning (ML) fairness tasks, such as fairness evaluations and bias mitigation. In fact, finding datasets that include thorough labeling of such sensitive attributes is difficult, particularly in the domain of computer vision.

Today, we introduce the More Inclusive Annotations for People (MIAP) dataset in the Open Images Extended collection. The collection contains more complete bounding box annotations for the person class hierarchy in 100k images containing people. Each annotation is also labeled with fairness-related attributes, including perceived gender presentation and perceived age range. With the increasing focus on reducing unfair bias as part of responsible AI research, we hope these annotations will encourage researchers already leveraging Open Images to incorporate fairness analysis in their research….”
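A fairness evaluation of the kind the post mentions often begins by comparing a model's performance across annotated subgroups. The sketch below computes per-group recall for person boxes. The `PersonBox` record, its `age_range` values, and the assumption that each ground-truth box has already been matched (or not) to a detector prediction are hypothetical simplifications, not the MIAP file format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PersonBox:
    # Hypothetical record: one ground-truth "person" box, a perceived-age label,
    # and whether the detector under test produced a matching prediction.
    age_range: str
    detected: bool


def recall_by_group(boxes):
    """Fraction of ground-truth person boxes detected, broken down by age_range."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for box in boxes:
        totals[box.age_range] += 1
        hits[box.age_range] += int(box.detected)
    return {group: hits[group] / totals[group] for group in totals}


# Toy data: four boxes labeled "young", two labeled "older".
boxes = [
    PersonBox("young", True),
    PersonBox("young", True),
    PersonBox("young", False),
    PersonBox("young", True),
    PersonBox("older", True),
    PersonBox("older", False),
]
recalls = recall_by_group(boxes)
gap = max(recalls.values()) - min(recalls.values())
print(recalls, gap)  # {'young': 0.75, 'older': 0.5} 0.25
```

A large gap between groups is the kind of signal that would prompt a closer look at a model's training data or bias mitigation.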

Open search tools need sustainable funding – Research Professional News

“The Covid-19 pandemic has triggered an explosion of knowledge, with more than 200,000 papers published to date. At one point last year, scientific output on the topic was doubling every 20 days. This huge growth poses big challenges for researchers, many of whom have pivoted to coronavirus research without experience or preparation.

Mainstream academic search engines are not built for such a situation. Tools such as Google Scholar, Scopus and Web of Science provide long, unstructured lists of results with little context.

These work well if you know what you are looking for. But for anyone diving into an unknown field, it can take weeks, even months, to identify the most important topics, publication venues and authors. This is far too long in a public health emergency.

The result has been delays, duplicated work, and problems with identifying reliable findings. This lack of tools to provide a quick overview of research results and evaluate them correctly has created a crisis in discoverability itself. …

Building on these, meta-aggregators such as Base, Core and OpenAIRE have begun to rival and in some cases outperform the proprietary search engines. …”

Google Books: how to get the full text of public domain books

“While Google Books has digitised millions of books all over the world with the help of thousands of libraries as part of the Library Project, not all of those digitised books are freely available on the website. Books that are still in copyright cannot be consulted in full-text, even though you might see a snippet preview.

Sometimes, however, Google has not assessed the copyright correctly and the book is not publicly available, although Google has scanned it and it is out-of-copyright. That is the case with all books published before 1900 and some books published between 1900 and 1930.

When you know that Google Books has a scan of a book available, and you believe that the book should be in the public domain, you can ask Google to re-evaluate the copyright situation of that publication. The Google Books team will give you an answer in a couple of days….”

Cambridge University Library joins Google Arts and Culture

“Cambridge University Library (UL) is the first institution of the University of Cambridge to join the [Google Arts and Culture] platform and joins organisations such as the British Museum, Rijksmuseum and the White House, among many others, who share their collections freely, and openly, with the world….”

Campus Activated Subscriber Access (CASA) – Highwire Press

“HighWire and Google co-developed CASA (Campus Activated Subscriber Access) as an authentication enhancement that improves the authentication for off-campus users of Google Scholar. CASA is free and is automatically enabled for all HighWire-hosted Journals that are indexed in Google Scholar.

How does it work?

When a user is on-campus, they often connect to a University network. When connected, if they visit Google Scholar, Google automatically creates an affiliation between that user and their institution. This affiliation allows Google Scholar to record that the user has subscription privileges granted by that institution. With Google CASA, this same seamless authentication follows the user when they take their device to any off-campus location. Once the affiliation is created, it grants them immediate access to the articles and Journals that their institution subscribes to even when the user is off campus….”
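To make the described flow concrete, here is a toy Python model of an affiliation that is recorded on campus and remembered off campus. It is not how Google or HighWire implement CASA: the IP-range check, the per-device store, and every name in it are assumptions for illustration, and details such as how long an affiliation lasts are not covered by the excerpt.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical mapping from campus network ranges to institutions
# (192.0.2.0/24 is a documentation-only address range).
CAMPUS_NETWORKS = {
    ipaddress.ip_network("192.0.2.0/24"): "Example University",
}


@dataclass
class AffiliationStore:
    # Remembers the institution each device was last seen with on campus.
    by_device: dict = field(default_factory=dict)

    def observe(self, device_id, ip):
        """Record an affiliation when the request comes from a campus network;
        otherwise fall back to any affiliation remembered for this device."""
        addr = ipaddress.ip_address(ip)
        for network, institution in CAMPUS_NETWORKS.items():
            if addr in network:
                self.by_device[device_id] = institution
                return institution
        return self.by_device.get(device_id)


def can_read(store, device_id, ip, subscribers):
    """Grant access if the device's institution subscribes to the journal."""
    institution = store.observe(device_id, ip)
    return institution in subscribers


store = AffiliationStore()
subscribers = {"Example University"}
print(can_read(store, "laptop-1", "203.0.113.7", subscribers))  # off campus, never seen: False
print(can_read(store, "laptop-1", "192.0.2.15", subscribers))   # on campus, affiliation recorded: True
print(can_read(store, "laptop-1", "203.0.113.7", subscribers))  # off campus afterwards: True
```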

Google AI Blog: An NLU-Powered Tool to Explore COVID-19 Scientific Literature

“Due to the COVID-19 pandemic, scientists and researchers around the world are publishing an immense amount of new research in order to understand and combat the disease. While the volume of research is very encouraging, it can be difficult for scientists and researchers to keep up with the rapid pace of new publications. Traditional search engines can be excellent resources for finding real-time information on general COVID-19 questions like “How many COVID-19 cases are there in the United States?”, but can struggle with understanding the meaning behind research-driven queries. Furthermore, searching through the existing corpus of COVID-19 scientific literature with traditional keyword-based approaches can make it difficult to pinpoint relevant evidence for complex queries.

To help address this problem, we are launching the COVID-19 Research Explorer, a semantic search interface on top of the COVID-19 Open Research Dataset (CORD-19), which includes more than 50,000 journal articles and preprints. We have designed the tool with the goal of helping scientists and researchers efficiently pore through articles for answers or evidence to COVID-19-related questions….”
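The contrast between keyword matching and semantic search can be sketched as ranking documents by the cosine similarity of vector embeddings. In the example below, the three-dimensional vectors are hand-written placeholders for what a trained text encoder would produce, and the document ids are hypothetical. This is not the Research Explorer's actual model or pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    return sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )


# Placeholder embeddings; each dimension loosely stands for a topic
# (transmission, treatment, genomics). A real encoder would produce these.
docs = {
    "aerosol-spread-study": [0.9, 0.1, 0.0],
    "antiviral-trial": [0.1, 0.95, 0.05],
    "viral-genome-sequencing": [0.05, 0.1, 0.9],
}
# Placeholder embedding for a query such as "How does the virus move between people?"
query = [0.8, 0.2, 0.1]
print(rank(query, docs))
```

The top-ranked document need not share any words with the query, because similarity is measured in the embedding space rather than by term overlap. That property is what helps with paraphrased research questions.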

Google’s new AI-powered search tool helps researchers with coronavirus queries

“Google’s AI team has released a new tool to help researchers traverse through a trove of coronavirus papers, journals, and articles. The COVID-19 research explorer tool is a semantic search interface that sits on top of the COVID-19 Open Research Dataset (CORD-19). …”