Automated archiving: a case study | Community-led Open Publication Infrastructures for Monographs (COPIM)

by Ross Higman

In a previous post, I summarised initial investigations by COPIM’s archiving and preservation team into the possibilities for automated archiving. This followed on from earlier experiments with manual workflows, which highlighted the prohibitive time investment that would be necessary for small and scholar-led publishers to manage archiving in this way. Given the rich, well-structured metadata held in Thoth, and the options available for integrating the Thoth software with archiving platforms, we concluded that a basic level of automated ingest would be both worthwhile and eminently achievable. Three months later, we achieved a proof of concept: a bulk upload of over 600 Thoth works to the Internet Archive.
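To give a flavour of what the metadata side of an automated ingest involves, here is a minimal sketch that maps a single work record to Internet Archive item metadata. It is not the code used for the bulk upload. The input field names (`workId`, `fullTitle`, `publisherName`, `publicationDate`, `license`, `contributions`) are assumptions standing in for whatever Thoth's API actually returns, and the `thoth-` identifier prefix and `example-collection` name are hypothetical. Only the output keys (`mediatype`, `collection`, `title`, `creator`, `publisher`, `date`, `licenseurl`) are standard Internet Archive metadata fields.

```python
from __future__ import annotations

import re
from typing import Any


def ia_identifier(work_id: str, prefix: str = "thoth") -> str:
    """Build an Internet Archive item identifier from a work ID.

    IA identifiers may only contain letters, digits, '_', '-' and '.',
    so any other character is replaced with '-'. The 100-character cap
    is an assumption about IA's length limit.
    """
    raw = f"{prefix}-{work_id}"
    return re.sub(r"[^A-Za-z0-9._-]", "-", raw)[:100]


def thoth_work_to_ia_metadata(work: dict[str, Any], collection: str) -> dict[str, Any]:
    """Map a (hypothetical) Thoth work record to Internet Archive metadata.

    Input keys are illustrative assumptions, not a guaranteed Thoth schema.
    """
    metadata: dict[str, Any] = {
        "mediatype": "texts",  # IA media type for books and other texts
        "collection": collection,
        "title": work["fullTitle"],
    }
    # Optional fields are copied over only when present in the record.
    creators = [c["fullName"] for c in work.get("contributions", [])]
    if creators:
        metadata["creator"] = creators
    if work.get("publisherName"):
        metadata["publisher"] = work["publisherName"]
    if work.get("publicationDate"):
        metadata["date"] = work["publicationDate"]
    if work.get("license"):
        metadata["licenseurl"] = work["license"]
    return metadata


# Hypothetical example record.
example = {
    "workId": "00000000-0000-0000-0000-000000000001",
    "fullTitle": "An Example Monograph",
    "publisherName": "Example Press",
    "publicationDate": "2021-05-01",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "contributions": [{"fullName": "A. Author"}],
}
print(ia_identifier(example["workId"]))
print(thoth_work_to_ia_metadata(example, collection="example-collection"))
```

A dictionary like this is the kind of value that could be passed as the `metadata` argument of `upload()` in the `internetarchive` Python library, alongside the PDF or EPUB files for the work. Repeating the mapping over a list of works is what turns it into a bulk upload.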

This blog post will explore the steps taken to accomplish this, providing pointers for anyone looking into implementing a similar system themselves, as well as giving some background for publishers interested in joining the Thoth programme to take advantage of this feature. All code used in the process is available on GitHub under an open-source licence, as is standard for the COPIM project. The post will also outline our plans for building on this initial work as we start to develop the Thoth Archiving Network.




Thoth Archiving Network Workshop, November 2022

by Miranda Barnes

COPIM Work Package 7’s Thoth Archiving Network workshop was held virtually on Tuesday, 2nd November 2022. Around 30 people attended, and we thank everyone who participated and provided feedback.

Work Package 7 Lead Gareth Cole began the workshop with a presentation updating attendees on the activities of the COPIM Project, including Opening the Future (Work Package 3), the Open Book Collective (Work Package 4), the Thoth metadata management system (Work Package 5), Experimental Publishing (Work Package 6) and, of course, Archiving & Preservation (Work Package 7).

Gareth explained the overall values and goals of the COPIM Project and introduced the core objectives and activities of each work package. This led into an important discussion of the proposed Thoth Archiving Network, a collaboration between Work Packages 5 and 7 to create a simple dissemination system through which small publishers can archive their monographs in a network of participating institutional repositories. A proof of concept has been developed and tested, and several universities have already agreed to take part.

Small and scholar-led presses make up much of the “long tail” of publishers without an active preservation policy in place, putting their significant contributions to the scholarly record at risk. While large-scale publishers have existing agreements with digital preservation archives such as CLOCKSS and Portico, small presses often languish without financial or institutional support, and face additional challenges in technical expertise and staffing. The Thoth Archiving Network would not solve every issue, but it would be an initial step towards essential community infrastructure, allowing presses to use a push-button deposit option to archive their publications in multiple repository locations. This would help safeguard against the complete loss of their catalogue should they cease to operate.
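The push-button deposit described here amounts to sending the same work to several repositories and recording what happened at each. A minimal sketch of that fan-out follows, assuming each participating repository is wrapped in its own depositor function. The names `deposit_everywhere`, `DepositResult` and the example repositories are hypothetical illustrations, not part of Thoth. The key design choice is that a failure at one repository is recorded rather than aborting the rest, so a single outage does not stop the work being archived elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping

# A depositor sends one work to one repository and returns a
# repository-side reference (e.g. an item URL). Real implementations
# would wrap each repository's own ingest route; these are hypothetical.
Depositor = Callable[[str], str]


@dataclass
class DepositResult:
    repository: str
    ok: bool
    detail: str  # item reference on success, error message on failure


def deposit_everywhere(work_id: str, depositors: Mapping[str, Depositor]) -> list[DepositResult]:
    """Send one work to every participating repository.

    Errors are caught per repository so the remaining deposits still run.
    """
    results: list[DepositResult] = []
    for name, deposit in depositors.items():
        try:
            results.append(DepositResult(name, True, deposit(work_id)))
        except Exception as exc:  # record the failure and carry on
            results.append(DepositResult(name, False, str(exc)))
    return results


def flaky(work_id: str) -> str:
    # Simulates a repository that is temporarily unavailable.
    raise ConnectionError("repository unavailable")


results = deposit_everywhere(
    "work-123",
    {"repo-a": lambda w: f"https://repo-a.example/items/{w}", "repo-b": flaky},
)
for r in results:
    print(r.repository, r.ok, r.detail)
```

The returned list gives a publisher a per-repository record of which copies exist, which is the information needed to retry failed deposits later.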

For the second half of the workshop session, the attendees and COPIM colleagues were divided into three breakout rooms. The same two questions were posed for each group: ‘Would you be interested in joining the Thoth Archiving Network?’ and ‘What are the potential barriers for you joining the Thoth Archiving Network?’. 






Experimenting with repository workflows for archiving: Automated ingest

by Ross Higman

In a recent post, my colleague Miranda Barnes outlined the challenges of archiving and preservation for small and scholar-led open access publishers, and described the process of manually uploading books to the Loughborough University institutional repository. The conclusion of this manual ingest experiment was that while university repositories offer a potential route for open access archiving of publisher output, the manual workflow is prohibitively time- and resource-intensive, particularly for small and scholar-led presses who are often stretched in these respects.

Fortunately, many institutional repositories provide routes for uploading files and metadata that allow the process to be automated, as an alternative to the standard web browser interface. Different repositories offer different routes, but a large proportion of them are based on the same technologies. By experimenting with a handful of repositories, we were therefore able to investigate workflows that should also be applicable to a much broader range of institutions.
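One route shared by many repository platforms is the SWORD deposit protocol, which DSpace and EPrints both implement. Assuming a repository exposes a SWORD v2 collection endpoint that accepts zipped packages, a deposit request can be prepared with nothing but Python's standard library, as sketched here. The collection URL, credentials and filenames are hypothetical, HTTP Basic authentication is assumed, and the packaging formats a given repository accepts are advertised in its SWORD service document and vary between platforms.

```python
import base64
import hashlib
import io
import urllib.request
import zipfile

# SWORD v2 packaging URI for a plain zip of files.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def build_sword_deposit(collection_url: str, package: bytes, filename: str,
                        username: str, password: str,
                        in_progress: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) a SWORD v2 deposit of a zipped package.

    Assumes the repository accepts SimpleZip packages and HTTP Basic auth.
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        # Checksum lets the server verify the upload arrived intact.
        "Content-MD5": hashlib.md5(package).hexdigest(),
        "Packaging": SIMPLE_ZIP,
        # "false" tells the repository the deposit is complete.
        "In-Progress": "true" if in_progress else "false",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(collection_url, data=package,
                                  headers=headers, method="POST")


# Build a small in-memory package; a real one would hold the book files.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("book.pdf", b"%PDF-1.4 placeholder")

# Hypothetical repository endpoint and credentials.
req = build_sword_deposit(
    "https://repository.example.ac.uk/swordv2/collection/monographs",
    buf.getvalue(), "book.zip", "depositor", "secret",
)
print(req.get_method(), req.full_url)
```

Calling `urllib.request.urlopen(req)` would actually send the deposit; a successful SWORD v2 deposit returns a deposit receipt describing the new item. Because the same request shape works against any SWORD-compliant endpoint, only the collection URL and credentials need to change from one repository to the next.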