Coyle’s InFormation: Digitization Wars, Redux

“From 2004 to 2016 the book world (authors, publishers, libraries, and booksellers) was involved in the complex and legally fraught activities around Google’s book digitization project. Once known as “Google Book Search,” the company claimed that it was digitizing books to be able to provide search services across the print corpus, much as it provides search capabilities over texts and other media that are hosted throughout the Internet. 

Both the US Authors Guild and the Association of American Publishers sued Google (both separately and together) for violation of copyright. These suits took a number of turns including proposals for settlements that were arcane in their complexity and that ultimately failed. Finally, in 2016 the legal question was decided: digitizing to create an index is fair use as long as only minor portions of the original text are shown to users in the form of context-specific snippets. 

We now have another question about book digitization: can books be digitized for the purpose of substituting remote lending in the place of the lending of a physical copy? This has been referred to as “Controlled Digital Lending (CDL),” a term developed by the Internet Archive for its online book lending services. The Archive has considerable experience with both digitization and providing online access to materials in various formats, and its Open Library site has been providing digital downloads of out of copyright books for more than a decade. Controlled digital lending applies solely to works that are presumed to be in copyright. …”

HathiTrust: A Digital Library Revolution Takes Flight

“The phrase “closed until further notice due to COVID-19” has become all too familiar. And, while we have started to grow accustomed to losing access to many resources that typically define our community existence, there’s one that’s particularly crucial to student and faculty researchers: libraries. For some, it may be easy to write off libraries as “nice-to-have.” But for scholars, they are essential. And as library doors began to shutter throughout California and much of the world, the potential impact on the academic community was profound.

Thankfully, the University of California has been preparing for this moment for decades. In 2008, the UC Libraries co-founded HathiTrust, and started contributing scanned copies of books and journals to the new organization. Based at the University of Michigan (U-M), HathiTrust is a large-scale repository of digital content collaboratively created by academic and research institutions. As researchers lost access to vital hard-copy materials, it initiated an Emergency Temporary Access Service (ETAS) to give UC researchers critical access to more than 13 million digital volumes. This revolution has been immediately impactful — and a profound advancement in sharing digital content….”

CSU Explores the Possibility of a Google Books Partnership – Cal schol.com

“Just heard yesterday that our CSU Council of Library Deans (COLD) approved a request I’d made to send records of our entire CSU print holdings to Google Books for evaluation. Google Books will run a comparison of their current digitized holdings against our holdings and evaluate on their end whether a digitization partnership makes sense. If it does, then the CSU will consider whether it might make sense for us as well….”

Why a National Emergency Library Would Have Been Unnecessary – Disruptive Competition Project

“Last week, in response to the COVID-19 pandemic, the Internet Archive announced the National Emergency Library (“NEL”), which expanded digital access to the books in its collection. The New Yorker welcomed it as “a gift to readers everywhere.” Predictably, the Authors Guild, the Copyright Alliance, and the Association of American Publishers condemned the move as infringing copyright. Overlooked in this controversy is that had the 2008 attempted settlement of the litigation over the Google Library Project been approved by the court, the NEL would likely have been unnecessary….”

B2fxxx: Carl Malamud at the Open University

“Without asking publishers’ permission, Malamud has put a lot of stuff online via a project at Jawaharlal Nehru University (JNU) in India – 125 million journal articles from many sources, from the mid 19th century up to the present.

The storage facility is air-gapped and not connected to the internet. Researchers who want access can bring their computers to the facility and text & data mine the materials there. Without having to read or download the articles which is not permitted, they can, nevertheless, draw scientific insights, thereby circumventing any potential copyright problems. The terms and conditions are modeled on those of the HathiTrust and the store specialises in bioinformatics. The access model is 3-tiered:

Tier 0 is air-gapped and pdfs of the articles

Tier 1 is extracted texts and is also air-gapped

Tier 2 is facts. As there is no copyright on facts, this can be made available openly to everyone….

In 2016 the US Supreme Court rejected the Authors Guild’s request to further appeal the decision, ending the more than a decade long litigation. The Authors Guild also tried suing the HathiTrust but were unsuccessful in that case too. The technicalities of the case were different.  One interesting angle was that the court made a point of noting the value of the HathiTrust approach to making the books available to print disabled and visually impaired.

The bottom line was that Google Books and the HathiTrust were given the ok by the US courts.

In the UK text and data mining is permitted only for non-commercial use. …”

Google Books 2020 Update | Communications

“What would you do if Google came to you and said: You have 1 million items that we would like to scan for you and make available to the world?

Over the past two years, a team from Access Services, Stacks Management, Library Technology Services, Information and Technical Services, Harvard Depository, and ReCAP have been attempting to do just that as part of a Harvard Library Digital Strategies and Innovation (DSI) initiative. This project began nearly a decade after our first partnership with Google Books, and it has been an opportunity to approach this work differently — to identify the challenges that we face at each step of the workflow and to look for creative, iterative ways to meet them….

Between 2004 and 2009, Google scanned 891,164 volumes from Harvard. Google has begun reprocessing those materials, enhancing and correcting the raw images and running them through updated OCR to create better, more searchable, machine-readable text.  

As part of this relationship, we are involved in the Google Library Partners group, an active community of our colleagues from peer institutions who also share their materials with Google. As a group we have been able to advocate for and contribute to reviews for handling of materials, quality assurance in scanning, and expanded treatments for items with foldouts or materials of non-traditional size. We have also led a review of how our peers provide access to materials and are actively partnering with HathiTrust to conduct more research into how users find and utilize these materials….”

4.5 Million UC Volumes Digitized & UC’s Most Popular Full View Books in HathiTrust for 2019 – California Digital Library

“The University of California Libraries recently contributed the 4,500,000th digitized book from their collections to HathiTrust Digital Library–a tremendous achievement resulting from 15 years of continuous digitization work. 

The vast majority of these millions of volumes were generated via the Google Books Library Project, which UC joined in 2006. That year the mass digitization of UC’s library collections began in earnest when the Northern Research Library Facility (NRLF) started sending books to the Google Books Library Project for scanning. UC’s work with the Google Books Library Project has never paused–by the time UC’s 3,000,000th volume was digitized in 2010, UC San Diego, UC Santa Cruz, and UCLA had all begun sending collections to Google for digitization. Since then, UC San Francisco, the Southern Research Library Facility (SRLF), UC Davis, UC Berkeley, UC Riverside, UC Irvine, and UC Santa Barbara have all participated, with UC Santa Barbara, UC Berkeley, UC San Diego, UC Riverside, UCLA, and NRLF continuing to do so….”

The Rebirth of Copyright As an Opt-In System? – The Media Institute

“For most of the history of Anglo-American copyright law, copyright was an opt-in system: Authors had to jump through certain regulatory hoops if they wanted to prevent others from copying their works without consent.  These threshold formalities included registering their works with a government agency, affixing a notice to published copies, depositing exemplars with a centralized library, and more.  A failure to comply with the requirements usually meant a diminution in the authors’ copyright entitlement – and in some cases a wholesale forfeiture, under which the works would pass immediately into the public domain.

After some 200 years, however, U.S. copyright abandoned its formal requirements.  Beginning in 1976 and culminating in 1989, Congress responded to complaints from authors (who had sometimes lost protection due to what they viewed as a technicality) and to pressure to join the international copyright community (which forbade most formalities).  Copyright law accordingly underwent a conversion from opt-in to opt-out.

As a result, copyright protection now arises by operation of law, without any action by the author.  As long as a work contains a modicum of originality and is fixed in some tangible form, copyright automatically protects it, and authors must affirmatively disclaim the entitlement if they don’t want its protection.  And these threshold requirements of originality and fixation are incredibly minimal, such that every reader of this essay is probably the owner of hundreds, and quite possibly thousands, of copyrights – in everything from diary entries to doodles….

Of course, any opt-in proposal would face a number of political obstacles, including the fact that predicating copyright protection on any formality (at least for foreign works) is inconsistent with the international copyright conventions to which the United States is a party.  But the Internet does not stop at the border; if opt-in makes sense here, it will make sense abroad as well.  When the United States and its trade partners are done figuring out what to do with Google Books, then, they should consider a return to copyright’s roots.  Make copyright opt-in once more….”

ASECS at 50: Interview with Robert Darnton

“Of the potential solutions, open research practices are among the most promising. The argument is that transparency acts as an implicit quality control process. If others are able to scrutinise our work—not just the final published output, but the underlying data, code, and so on—researchers will be incentivised to ensure these are high quality.

So, if we think that research could benefit from improved quality control, and if we think that open research might have a role to play in this, why aren’t we all doing it? In a word: incentives….”

Sapping Attention: How badly is Google Books search broken, and why?

“I periodically write about Google Books here, so I thought I’d point out something that I’ve noticed recently that should be concerning to anyone accustomed to treating it as the largest collection of books: it appears that when you use a year constraint on book search, the search index has dramatically constricted to the point of being, essentially, broken….

What’s going on? I don’t know. I guess I blame the lawyers: I suspect that the reasons have to do with the way the Google books project has become a sort of Herculaneum-on-the-Web, frozen in time at the moment that anti-Books lawsuits erupted in earnest 11 years ago. The site is still littered with pre-2012 branding and icons, and the still-live “project history” page ends with the words “stay tuned…” after describing their annual activity for 2007….”

The Foxfire of Fair Use: The Google Books Litigation and the Future of Copyright Law – infojustice

Abstract:  This article considers the dynamic evolution of copyright exceptions and limitations in the United States in light of new technological developments. There has been significant legal debate in the courts and in the United States Congress in respect of the scope of the defence of fair use. The copyright litigation over Google Books has been a landmark development in the modern history of copyright law. The victory by Google Inc. over The Authors Guild in the decade long copyright dispute is an important milestone on copyright law. The ruling of Leval J emphasizes the defence of fair use in the United States plays a critical role in promoting transformative creativity, freedom of speech, and innovation. The Supreme Court of the United States was decisive in its rejection of The Authors Guild’s efforts to challenge the decision of Leval J. There has been significant debate in the United States Copyright Office and United States Congress over the development of ‘the Next Great Copyright Act’. Hearings have taken place within the United States Congressional system about the history, nature, and future of the defence of fair use under United States copyright law. There remains much debate about the internationalisation of the defence of fair use, and the need for the trading partners of the United States to enjoy similar flexibilities in respect of copyright exceptions. There has been concern about the impact of mega-regional trade agreements – such as the Trans-Pacific Partnership – upon copyright exceptions – such as the defence of fair use.

Authors Guild v. Google, Part II: Fair Use Proceedings | Electronic Frontier Foundation

“The Authors Guild filed an appeal to the U.S. Court of Appeals for the Second Circuit, which was argued on December 3, 2014. On October 16, 2015, the Second Circuit affirmed the district court and agreed that Google Books was a fair use.  In April 2016, the U.S. Supreme Court turned down the Guild’s request that it review the case.”

Why I joined the Authors Alliance | DSPS Press

“Of all the absurdities associated with the Authors Guild suit against Google over the Google Books Project, perhaps the greatest was the Guild’s efforts to make it a class action, with the 8,000 members of the Guild speaking for all authors everywhere in the world.  Most academic authors realize that providing a keyword index to all published literature can only aid scholarship.  At the same time, by making it easier to identify works that might be of interest, Google Books can only increase readership and sales of the original works.  Yet at the time of the lawsuit, there was no organization that could speak for authors motivated by concerns that were not solely commercial. Now there is.  On 21 May, the Authors Alliance was formally launched in San Francisco….”