NASIG – Open Access Content Under Threat: Internet Archive and Portico

“With disruptions in print supply chains and cutbacks on server and staff support, the threat of unpreserved content disappearing is greater than ever. Join us for an informational overview and deep dive into how technology is being used to preserve websites and their underlying content. Stephanie Orphan, Director of Content Preservation for Portico, will discuss the preservation service’s efforts to preserve OA content and its successes in providing access to that content when it disappears. Jefferson Bailey, Director of Web Archiving and Data Services at Internet Archive, will provide an update on Internet Archive Scholar and its efforts to ensure at-risk content remains available.

In 2018, the Internet Archive undertook a large-scale project to build as complete a collection as possible of scholarly outputs published on the web, as well as to improve the discoverability and accessibility of scholarly works archived as part of these global web harvests. This project involved a number of areas of work: targeted archiving of known OA publications (especially at-risk “long tail” publications), extraction and augmentation of bibliographic metadata and full text, integration and preservation of related identifier, registry, and aggregation services and datastores, partnerships with affiliated initiatives and joint service developments, and creation of new tools and machine learning approaches for identifying archived scholarly work in existing global-scale born-digital and web collections. The project also identifies and archives associated research outputs such as blogs, datasets, code repos, and other secondary research objects. The alpha public interface, not yet officially announced, can be found at https://scholar-qa.archive.org/, while the testing catalog is temporarily hosted at https://fatcat.wiki/.

Portico has long been preserving OA content and is currently preserving more than 5,000 OA journals from 309 publishers. It currently provides access to 114 of these OA journals, which were otherwise no longer available online for use by researchers (these are referred to as triggered titles). Portico is actively exploring methods of preserving more of the most vulnerable scholarly content and is seeking input from the community on this topic. Whether you are a digital preservation expert or new to the scene, this session will offer something for you.”