Digital History and the Politics of Digitization | Digital Scholarship in the Humanities | Oxford Academic

Abstract:  Much has been made in recent years of the transformative potential of digital resources and historical data for historical research. Historians seem to be flooded with retro-digitized and born-digital materials and tend to take these for granted, grateful for the opportunities they afford. In a research environment that increasingly privileges what is available online, the questions of why, where, and how we can access what we can access, and how it affects historical research have become ever more urgent. This article proposes a framework through which to contextualize the politics of (digital) heritage preservation, and a model to analyse its most important political dimensions, drawing upon literature from the digital humanities and history as well as archival, library, and information science. The first part will outline the global dimensions of the politics of digital cultural heritage, focusing on developments between and within the Global North and South, framed within the broader context of the politics of heritage and its preservation. The second part surveys the history and current state of digitization and offers a structured analysis of the process of digitization and its political dimensions. Choices and decisions about selection for digitization, how to catalogue, classify, and what metadata to add are all political in nature and have political consequences, and the same is true for access. The article concludes with several recommendations and a plea to acknowledge the importance of digital cataloguing in enabling access to the global human record.


UCLA Library to expand global preservation work thanks to largest grant in its history | UCLA

Key takeaways:

In four years, the Modern Endangered Archives Program has published content from 11 collections, featuring more than 12,000 objects from 11 countries.
The program has preserved audio recordings, political ephemera, photography, newspapers and financial ledgers.
The preserved collections are publicly accessible and digitally preserved, while the physical materials remain in their origin countries.


Experimenting with repository workflows for archiving: Automated ingest | Community-led Open Publication Infrastructures for Monographs (COPIM)

by Ross Higman

In a recent post, my colleague Miranda Barnes outlined the challenges of archiving and preservation for small and scholar-led open access publishers, and described the process of manually uploading books to the Loughborough University institutional repository. The conclusion of this manual ingest experiment was that while university repositories offer a potential route for open access archiving of publisher output, the manual workflow is prohibitively time- and resource-intensive, particularly for small and scholar-led presses who are often stretched in these respects.

Fortunately, many institutional repositories provide routes for uploading files and metadata which allow for the process to be automated, as an alternative to the standard web browser user interface. Different repositories offer different routes, but a large proportion of them are based on the same technologies. By experimenting with a handful of repositories, we were therefore able to investigate workflows which should also be applicable to a much broader spread of institutions.



Long-term availability of data associated with articles in PLOS ONE | PLOS ONE

Abstract:  The adoption of journal policies requiring authors to include a Data Availability Statement has helped to increase the availability of research data associated with research articles. However, having a Data Availability Statement is not a guarantee that readers will be able to locate the data; even if provided with an identifier like a uniform resource locator (URL) or a digital object identifier (DOI), the data may become unavailable due to link rot and content drift. To explore the long-term availability of resources including data, code, and other digital research objects associated with papers, this study extracted 8,503 URLs and DOIs from a corpus of nearly 50,000 Data Availability Statements from papers published in PLOS ONE between 2014 and 2016. These URLs and DOIs were used to attempt to retrieve the data through both automated and manual means. Overall, 80% of the resources could be retrieved automatically, compared to much lower retrieval rates of 10–40% found in previous papers that relied on contacting authors to locate data. Because a URL or DOI might be valid but still not point to the resource, a subset of 350 URLs and 350 DOIs were manually tested, with 78% and 98% of resources, respectively, successfully retrieved. Having a DOI and being shared in a repository were both positively associated with availability. Although resources associated with older papers were slightly less likely to be available, this difference was not statistically significant, suggesting that URLs and DOIs may be an effective means for accessing data over time. These findings point to the value of including URLs and DOIs in Data Availability Statements to ensure access to data on a long-term basis.
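The study's own extraction pipeline is not reproduced here, but its first step, pulling candidate URLs and DOIs out of free-text Data Availability Statements, can be sketched with regular expressions. The DOI pattern below is a common heuristic for modern DOIs, not the paper's actual code:

```python
import re

# Matches the common "10.NNNN/suffix" DOI shape (a heuristic, not exhaustive)
DOI_PATTERN = re.compile(r'\b(10\.\d{4,9}/[-._;()/:A-Za-z0-9]+)')
URL_PATTERN = re.compile(r'https?://[^\s"<>]+')


def extract_identifiers(statement: str) -> dict:
    """Extract candidate DOIs and URLs from a Data Availability Statement.

    Note that a DOI expressed as a resolver URL (https://doi.org/10...)
    will match both patterns, which mirrors how such links can be
    tested both as URLs and as DOIs.
    """
    dois = DOI_PATTERN.findall(statement)
    # Strip trailing sentence punctuation that the URL regex picks up
    urls = [u.rstrip('.,;)') for u in URL_PATTERN.findall(statement)]
    return {"dois": dois, "urls": urls}
```

Each extracted identifier would then be dereferenced (e.g. an HTTP GET against the URL or the doi.org resolver) to test whether the resource is still retrievable, which is the automated check the study performed at scale.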



MacDonald (2022) Imagining networked scholarly communication: self-archiving, academic labour, and the early internet

Corina MacDonald (2022) Imagining networked scholarly communication: self-archiving, academic labour, and the early internet, Internet Histories, DOI: 10.1080/24701475.2022.2103987


This essay explores the emergence of self-archiving practices in the 1990s as a form of academic labour that is intimately tied to the popularisation of the Internet. It argues that self-archiving is part of a sociotechnical imaginary of networked scholarly communication that has helped to shape understandings of digital scholarship and dissemination over the past three decades. Focussing on influential texts written by open access archivangelist Stevan Harnad in 1990 and 1994, the essay analyzes the language and discursive strategies used to promote self-archiving as a form of collective scholarly exchange. Through these writings, Harnad helped to articulate scholars to the Internet as a medium of publication, with impacts still seen today in policy discussions around open access and the public good that shape relations of knowledge production under contemporary forms of capitalism.


Information Retention in the Multi-platform Sharing of Science

Abstract:  The public interest in accurate scientific communication, underscored by recent public health crises, highlights how content often loses critical pieces of information as it spreads online. However, multi-platform analyses of this phenomenon remain limited due to challenges in data collection. Collecting mentions of research tracked by Altmetric LLC, we examine information retention in the over 4 million online posts referencing 9,765 of the most-mentioned scientific articles across blog sites, Facebook, news sites, Twitter, and Wikipedia. To do so, we present a burst-based framework for examining online discussions about science over time and across different platforms. To measure information retention we develop a keyword-based computational measure comparing an online post to the scientific article’s abstract. We evaluate our measure using ground truth data labeled by within field experts. We highlight three main findings: first, we find a strong tendency towards low levels of information retention, following a distinct trajectory of loss except when bursts of attention begin in social media. Second, platforms show significant differences in information retention. Third, sequences involving more platforms tend to be associated with higher information retention. These findings highlight a strong tendency towards information loss over time – posing a critical concern for researchers, policymakers, and citizens alike – but suggest that multi-platform discussions may improve information retention overall.
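The authors validate their measure against expert-labeled ground truth; purely as an illustration of the keyword-overlap idea (the tokenization and stopword list below are my own assumptions, not the paper's), a minimal retention score comparing a post to an abstract might look like:

```python
import re

# A tiny illustrative stopword list; real pipelines use much larger ones
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to", "is",
             "are", "for", "with", "that", "this", "we", "our", "by", "as"}


def keywords(text: str) -> set:
    """Lowercase alphabetic tokens, minus stopwords and very short words."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and len(w) > 2}


def retention(post: str, abstract: str) -> float:
    """Fraction of the abstract's keywords that survive in the online post."""
    abstract_kw = keywords(abstract)
    if not abstract_kw:
        return 0.0
    return len(abstract_kw & keywords(post)) / len(abstract_kw)
```

A post that restates the abstract scores 1.0, while a terse share ("New vaccine study published.") retains only the few keywords it mentions, which is the kind of information loss the study quantifies across platforms.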


New COPIM Scoping Report Published on Archiving and Preserving Open Access Monographs | Community-led Open Publication Infrastructures for Monographs (COPIM)

by Miranda Barnes

Work Package 7 of the COPIM Project has released its Scoping Report, identifying and examining the key challenges associated with archiving and preserving open access monographs, particularly those published by small and scholar-led presses.


ATG Interviews Alicia Wise, Executive Director of CLOCKSS – Charleston Hub

“ATG:  When you accepted the position, you remarked that CLOCKSS was “a profoundly important service.”  For those unfamiliar with CLOCKSS and its mission, can you tell us why it’s so important?  What essential services does CLOCKSS offer to those in the world of scholarly communications?  Is there anything unique about those services?

AW:  The mission of the CLOCKSS archive is to ensure the scholarly record remains available for humanity.  Scholars have worked so hard to advance knowledge, and their hard work is important to us all and especially to those scholars who will build on this foundation in the future.  Digital preservation is too big a job for any single organization, and even were it possible, it’s too important a job to entrust to any single organization, and so the community approach of CLOCKSS along with, and more broadly, LOCKSS is inspiring.

At CLOCKSS we focus on electronic publications.  Initially this meant books and journals, but now it means books, journals, and much more.  We are preserving all the rich resources that underpin articles and books (think data, protocols, software, visualizations), and entirely new forms of scholarship too (think scholar-led, interactive humanities resources published by academics or libraries).

CLOCKSS is a dark archive which means the content entrusted to us is made accessible only after the original or successor creators and publishers are no longer able to look after it.  When CLOCKSS provides access to the content, it becomes open access to everyone in perpetuity. …”

Experimenting with repository workflows for archiving: Manual ingest | Community-led Open Publication Infrastructures for Monographs (COPIM)

Barnes, M. (2022). Experimenting with repository workflows for archiving: Manual ingest. Community-Led Open Publication Infrastructures for Monographs (COPIM).

Over the course of the last year (2021-2022), colleagues in COPIM’s archiving and preservation team have been considering ways to solve the issues surrounding the archiving and preservation of open access scholarly monographs. Most large publishers and many university presses have existing relationships with digital preservation archives, but small and scholar-led publishers lag behind due to a lack of resources.

One of the potential solutions we have been considering is the university repository as an open access archive for some of these presses. COPIM includes a number of scholar-led presses, such as Mattering Press, meson press, Open Humanities Press, Open Book Publishers and punctum books. Partners on the project also include UCSB Library and Loughborough University Library. In cooperation with Loughborough University Library, we began to run some preliminary repository workflow experiments to see what might be possible, using books from one of the partner publishers.

Loughborough University employs Figshare as its primary institutional repository, so we began with this as a test bed for our experiments.



The Effectiveness and Durability of Digital Preservation and Curation Systems | Ithaka S+R

Oya Y. Rieger, Roger C. Schonfeld, Liam Sweeney (2022) The Effectiveness and Durability of Digital Preservation and Curation Systems.

Executive Summary

Our cultural, historic, and scientific heritage is increasingly being produced and shared in digital forms, whether born-digital or reformatted from physical materials. There are fundamentally two different types of approaches being taken to preservation. One is programmatic preservation, a series of cross-institutional efforts to curate and preserve specific content types or collections, usually based on the establishment of trusted repositories; examples of providers in this category include CLOCKSS, Internet Archive, HathiTrust, and Portico.[1] In addition, there are third-party preservation platforms, which are utilized by individual heritage organizations that undertake their own discrete efforts to provide curation, discovery, and long-term management of their institutional digital content and collections.[2]

In August 2020, with funding from the Institute of Museum and Library Services (IMLS), Ithaka S+R launched an 18-month research project to examine and assess the sustainability of these third-party digital preservation systems. In addition to a broad examination of the landscape, we more closely studied eight systems: APTrust, Archivematica, Arkivum, Islandora, LIBNOVA, MetaArchive, Samvera, and Preservica. Specifically, we assessed what works well and the challenges and risk factors these systems face in their ability to continue to successfully serve their mission and the needs of the market. In scoping this project and selecting these organizations, we intentionally included a combination of profit-seeking and not-for-profit initiatives, focusing on third-party preservation platforms rather than programmatic preservation.

Because so many heritage organizations pursue the preservation imperative for their collections with increasingly limited resources, we examine not only the sustainability of the providers but also the decision-making processes of heritage organizations and the challenges they face in working with the providers.

Our key findings include:

The term “preservation” has become devalued nearly to the point of having lost its meaning. Providers are marketing their offerings as “preservation systems” regardless of actual functionality or storage configurations. Many systems marketed as preservation systems usually address only some aspects of preservation work, such as providing workflow systems (and user interfaces) to streamline the process of moving content into and out of a storage layer.
Because no digital preservation system is truly turnkey, digital preservation cannot be fully outsourced. Digital preservation is a distributed and iterative activity that requires in-house expertise, adequate staffing, and access to different technologies and systems. While it is possible to outsource key components of the digital preservation process to a system provider, it is today neither feasible nor desirable for a heritage organization to outsource responsibility for its digital preservation program.
Heritage organizations select preservation systems within the context of marketplace competition. Many observers believe that heritage organizations should support not-for-profit solutions based on shared values and other common principles. But this has not always been the principal driver of organizational behavior. Providers compete within a marketplace that recognizes organizational values as one characteristic among many, such as the total cost of implementation and the feasibility of local implementation.
The not-for-profit preservation platforms are at risk. They tend to have limited capital and comparatively ponderous governance structures. As a result, many have not been able to innovate quickly enough to keep up with the needs of heritage organizations. Their business and governance models are often ill-suited to the demands of a competitive marketplace, even if growth is not their primary objective. It seems reasonable to forecast additional mergers or buyouts (if not outright failures) among this category of providers.
The growing reliance on profit-seeking providers carries risks. The profit-seekers tend to pursue a growth strategy, and by this measure they are succeeding. Private capital and a decision to scale across multiple sectors have enabled this category of providers to grow.

Lakota elders helped a white man preserve their language. Then he tried to sell it back to them.

“Ray Taken Alive had been fighting for this moment for two years: At his urging, the Standing Rock Sioux Tribal Council was about to take the rare and severe step of banishing a nonprofit organization from the tribe’s land.

The Lakota Language Consortium had promised to preserve the tribe’s native language and had spent years gathering recordings of elders, including Taken Alive’s grandmother, to create a new, standardized Lakota dictionary and textbooks. 


But when Taken Alive, 35, asked for copies, he was shocked to learn that the consortium, run by a white man, had copyrighted the language materials, which were based on generations of Lakota tradition. The traditional knowledge gathered from the tribe was now being sold back to it in the form of textbooks.  

“No matter how it was collected, where it was collected, when it was collected, our language belongs to us. Our stories belong to us. Our songs belong to us,” Taken Alive, who teaches Lakota to elementary school students, told the tribal council in April. …”