Celebrating UC San Diego’s Mass Contributions to HathiTrust

“This month, the University of California San Diego will send its final shipment of carts filled with library books to be digitized by Google as part of the Google Books Library Project. In total, well over half a million UC San Diego Library volumes have been sent to Google to be digitized and deposited in HathiTrust. This will be the third time the university has taken part in the project. 

UC San Diego joined as an early Google Books partner in 2008. From 2008 to 2011 the campus sent over 470,000 volumes to be digitized. Since rejoining the project in 2017, UC San Diego Library has sent over 111,000 books. The project paused in March 2020 due to pandemic shutdowns, but the campus resumed sending shipments to be digitized in November 2021.

Tens of thousands of volumes from UC San Diego’s International Relations, Pacific Studies, and East Asian Language collections were digitized in the first three years of the project. When UC San Diego rejoined the Google Library Project in 2017, the campus included large numbers of US federal government documents, dissertations, and special collections volumes in its shipments. But throughout all phases of participation in the project, hundreds of thousands of books from the general library collection were digitized….”

Strategic Visioning: HathiTrust in the Future | March 2023 | HathiTrust Digital Library

“The world has changed dramatically in the 15 years since HathiTrust’s creation and even more so in the 5 years since we adopted our 2019-2023 Strategic Directions. Despite the global disruption and changes of recent years — as well as 47% membership growth — we have followed the course laid out in that plan. We are now poised to draw on the strength of our accomplishments and prepare to serve the future needs of our membership. To do so, we are launching an in-depth process of exploration, discovery, and strategic visioning, to begin our next 15 years. Your participation is vital to creating a vision for our future services and programs that will benefit your library and communities, so we encourage you to participate wherever you can.

What We’re Doing

We’ve partnered with Athenaeum 21, a long-standing digital strategy and technology planning consultancy and collaborator in the library and cultural heritage community. They will guide the 3-part collaborative visioning process that will take place through the end of 2023. During this process, we will connect with people throughout our member libraries — subject librarians, deans, collection managers, directors — as well as with HathiTrust staff and industry peers….”

Fair Use Week 2023 (10th Anniversary): Day Two With Guest Expert Prof. Pia Hunter | Copyright at Harvard Library

“One question that has emerged frequently these past three years is how? How have libraries provided access to copyrighted materials for remote users? How were students able to access copyrighted materials at the height of the pandemic? When we think of a classroom, most of us consider a traditional space with walls and students together in one room. The logistics for students to access library materials from their homes seemed insurmountable to some because the copyright laws surrounding how students and teachers can gain remote access are complex. Section 110(1) sets a generous standard for how content may be used, but it only applies to face-to-face instruction. Section 110(2), the TEACH Act, allows the digital transmission of copyrighted materials, but only under limited circumstances, and the requirements are difficult for many educational institutions to achieve. With these competing sections of the Copyright Act, what was the solution?…

Although the IA had announced their intention to end the emergency access by June 30, 2020, they ended the program two weeks early when publishers Hachette, Penguin Random House, Wiley, and HarperCollins announced that they would sue the IA for copyright infringement. On June 1, 2020, the publishers and several authors filed a complaint in the United States District Court for the Southern District of New York. But this case, Hachette v. Internet Archive, is not about the expanded access IA provided during the pandemic. It is a challenge to how we can use materials in a digital age and how fair use supports our right to do so….”

HathiTrust Receives $1 Million Mellon Grant to Enhance Core Oper… | HathiTrust Digital Library

“HathiTrust, a member-based organization hosted by the University of Michigan, has received a 5-year, $1 million grant from the Mellon Foundation to fund a multi-year effort to strengthen its preservation and access mission. 

The funding will initially finance three new positions to develop an integrated program of assessment, analytics, and portfolio management for the HathiTrust organization. “With these new capabilities in place, we can better match our resources to high impact work,” says Mike Furlough, Executive Director. “We will be able to grow our team and modernize our tools and processes, and create a more nimble and disciplined organization to meet our community’s strategic needs.”

In March 2020, HathiTrust developed the Emergency Temporary Access Service (ETAS), permitting access to digitized materials for hundreds of academic and research libraries and their communities during the height of the Covid-19 pandemic lockdowns. “Emergency services increased demand for access, and confirmed the importance of large scale digitization and long-term digital preservation. From that experience, we learned that over the next several years we need to diversify the ways that libraries and users engage with HathiTrust. I’m grateful for the Mellon Foundation’s support, which will allow us to better respond to those needs,” Furlough says….”

Project LEND – UC Libraries

“In January 2023, the University of California libraries launched a landmark research project – Project LEND (Library Expansion of Networked Delivery) – to investigate the potential for expanded lawful use of digitized books held by academic and research libraries. The project seeks to analyze all aspects of a digital access program — including user needs, legal frameworks, technical requirements, and collection scope — in designing an expanded service or set of services for UC faculty, staff, and students.”

UC Libraries-research-expanding-use-digitized-books | UC Davis

“The University of California libraries — which comprise the largest university research library in the world — are launching a landmark research project to investigate the potential for expanded lawful use of digitized books held by academic and research libraries.

The Mellon Foundation is providing $1.1 million support for Project LEND (Library Expansion of Networked Delivery), a two-year project that the UC Davis Library will lead on behalf of the 10-campus UC system….

The project’s broad investigation aims to extend and strengthen the historical role of academic libraries in making information as broadly accessible as possible for use in research and education. Project teams will:

use focus groups and other methods to understand the needs of UC faculty and students for a range of research, education and clinical care scenarios
evaluate the legal frameworks under which libraries could provide expanded access to digitized books, including those still in copyright
review and analyze existing technology platforms and systems for sharing and interacting with digital books, and explore the possibilities for creating new systems and services
determine the optimal composition of a digital book collection to meet user needs; what digitized collections are currently available or where more digitization efforts may be required; and how best to manage both print and digitized collections.”

How one digital book led to an important COVID-19 discovery

“In early 2020, many scientists believed that particles containing COVID-19 were too large to be airborne. Medical canon held that only particles sized 5 microns or smaller could stay in the air long enough to be transmitted between people over 6 feet apart. But a team of scientists questioned the 5 micron figure. Katie Randall, then a graduate student at Virginia Tech, went to work investigating the origin of the number. “I was working on my dissertation when the pandemic hit, and I had to pause in-person research,” says Katie. “I was supposed to focus on revising my research plan, but when I got the email about this project, I knew I couldn’t say no — it was too important and too intriguing to ignore.”

In her research, Katie located an out-of-print book, Airborne Contagion and Air Hygiene: An Ecological Study of Droplet Infections, written by William Firth Wells in 1955. While she normally would have borrowed the book through an agreement between libraries to share items in their collection, pandemic closures meant that was not an option. Fortunately, she was able to locate a digital copy of the book in HathiTrust, a Google Books partner.

HathiTrust is a nonprofit collaborative of academic and research libraries which preserves digitized items — most of which come from partnerships with Google Books. “Early on, our partner libraries dedicated themselves to digital preservation,” says Mike Furlough, Executive Director of HathiTrust. “But even when preservation was the goal, our thoughts were always on providing access for research and scholarship.”

With the help of the digitized book, Katie discovered that the 5 micron threshold had no real scientific basis — in fact, the experiments detailed in Wells’ book showed the aerosolization of particles as big as 100 microns….”

2022 HathiTrust Community Week | www.hathitrust.org | HathiTrust Digital Library

“This July, join colleagues from around the world for HathiTrust Community Week, four days of member-led sessions on local projects, research, and workshops on topics from text and data mining to science fiction. All sessions are open to any interested party affiliated with a member library. You may register for as many sessions as you wish….”

HathiTrust Copyright Review Passes 1 Million Milestone | www.hat… | HathiTrust Digital Library

“The HathiTrust Copyright Review Program has met a milestone: the review of more than 1,000,000 books! The HathiTrust Copyright Review Program launched in 2008 with three consecutive IMLS National Leadership grants to responsibly ascertain copyright status of works in the HathiTrust collection. On June 2, HathiTrust reached the review of its 1 millionth HathiTrust item, bringing the total number of U.S. public domain determinations in the collection to 570,594….”

Digitization, open access and the internet aid UCLA’s return of books looted by Nazis | UCLA

“For two decades the Jewish Museum in Prague, or JMP, has undertaken a global search for lost publications from the city’s Jewish Community Library, which was looted and shuttered by Nazi occupiers during World War II. With the recent emphasis on digitization of collections by academic libraries, including UCLA’s, the museum’s work has become a lot easier and more fruitful. The JMP’s efforts to repatriate these stolen items have increased in intensity as anyone capable of using an online search tool can access these vast online repositories.

UCLA Library is one of the earliest and largest contributors to one such repository, the HathiTrust — a collaborative of academic research libraries that have thus far digitized 17 million volumes and made them full-text-searchable….”

Call for Proposals: 2022 Community Week, July 11-15 | www.hathit… | HathiTrust Digital Library

“HathiTrust Community Week is back! Members asked and we listened, so we’ve reserved the week of July 11-15 for members to go deep on all things HathiTrust — from what their library is doing with the services and collection to what users are finding that enables their teaching, learning, and scholarship. HathiTrust Community Week is dedicated to giving space for members of our wider community to share projects, research, and workshops (and anything else related to HathiTrust) with other members of the community. 

This year, we invite participants to step outside the webinar box and consider other ways to bring people into the HathiTrust world — whether it’s building a joint collection in Collection Builder in real time, teaching a research 101 course using HathiTrust, or inviting in students and faculty to help illustrate the role of HathiTrust in teaching and learning. …”

University Libraries’ Anne Conway reaches a copyright milestone – UNC-Chapel Hill Libraries

“Since 2018, preservation services supervisor Anne Conway has spent six hours each week researching the copyright status of online books. She has now completed an outstanding 50,000 assessments as a volunteer for HathiTrust’s Copyright Review Program.

HathiTrust is a not-for-profit collaborative of academic and research libraries—including the University Libraries at UNC-Chapel Hill—that preserves digital copies of more than 17 million books and other materials. When those texts are in the public domain, meaning they are free of copyright restrictions, then HathiTrust makes them accessible online for anyone to read….

It is meaningful work, but it can be complex. While all books first published in the United States before 1928 are in the public domain, reviewers like Conway must apply a rigorous review process to determine whether other texts can be made freely accessible.

That multi-step process includes assessing whether the book matches the project’s legal scope; determining whether its copyright has been renewed; and determining whether the book contains credits, permissions or acknowledgements indicating that the digital file might contain other copyrighted content.

This requires nuance and attention to detail. All copyright reviewers go through an extensive training program before they start evaluating texts, according to the HathiTrust website. Even then, each file has to be assessed by two independent reviewers who must agree on its status before it is made public….”

HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust

Abstract: We present a new dataset built on prior work consisting of 1,671,370 randomly sampled pages of English-language prose roughly divided between modes of fictional and non-fictional writing and published between the years 1800 and 2000. In addition to focusing on the “page” as the basic bibliographic unit, our work employs a single predictive model for the historical period under consideration in contrast to prior work. Besides publication metadata, we also provide an enriched feature set of 107 features including part-of-speech tags, sentiment scores, word supersenses and more. Our data is designed to give researchers in the digital humanities large yet portable random samples of historical writing across two foundational modes of English prose writing. We present initial insights into transformations of linguistic patterns across this historical period using our enriched features as possible pointers to future work. The data can be accessed at https://doi.org/10.7910/DVN/HAKKUA.