by Ross Higman
In a previous post, I summarised initial investigations by COPIM’s archiving and preservation team into the possibilities for automated archiving. This followed on from earlier experiments with manual workflows, which highlighted the prohibitive time investment that would be necessary for small and scholar-led publishers to manage archiving in this way. Due to the rich and well-structured nature of metadata within Thoth, and the options available for integrating the Thoth software with archiving platforms, we concluded that a basic level of automated ingest would be both worthwhile and eminently achievable. Three months later, we obtained proof of concept with a bulk upload of over 600 Thoth works to the Internet Archive.
This blog post will explore the steps taken to accomplish this, providing pointers for anyone looking into implementing a similar system themselves, as well as giving some background for publishers interested in joining the Thoth programme to take advantage of this feature. All code used in the process is available on GitHub under an open-source licence, as is standard for the COPIM project. The post will also outline our plans for building on this initial work as we start to develop the Thoth Archiving Network.