The Horrors of Good Intentions: Told through the story of a dark repository.

“The library manages a dark repository, named Dark Blue because I have no imagination, for material needing preservation but not public access such as preservation copies of digitized moving image and in-process born-digital material. You can read more about the implementation of this repository in this 2018 post. It is fair to say that Dark Blue had some growing pains over these past few years that include incorrect packaging of material and broken deposit and withdrawal workflows. While these sound like technical problems, the thesis of this post is that our troubles with Dark Blue are not based on bad systems or policies, but the limitation of people and time, and choosing to do the “nice” thing over the realistic thing….”


figshare plus

“Figshare has been helping researchers make their data openly available for more than 10 years. We want to offer a way to get more support for sharing larger datasets in a trusted generalist repository.

Figshare+ offers data deposit as a one-time Data Publishing Charge (DPC) to share the datasets and materials supporting a specific publication or project. Added features and expert guidance are also included for sharing your data FAIR-ly….”

When XML Marks the Spot: Machine-readable journal articles for discovery and preservation

“If you work with a campus-based journal program and you’re looking to expand the readership and reputation of the articles you publish, adding them to relevant archives and indexes (A&Is) presents a treasure trove of opportunities. A&Is serve as valuable content distribution networks, and inclusion in selective ones is a signal of research quality. You may have heard about XML, one of the primary machine-readable formats academic databases use to ingest content, and wonder if that’s something you need to reach your archiving and indexing goals.

This free webinar, co-hosted by Scholastica, UOregon Libraries, and the GWU Masters in Publishing program, will offer a crash course in the benefits of XML production and use cases, including:

What XML is and the different types required or preferred by academic indexes and archives (with an overview of JATS)
How producing metadata and/or full-text articles in XML can unlock discovery and archiving opportunities with examples
Additional benefits of XML for journal accessibility as well as publishing program and professional development
When XML is needed and when it may not be the best use of journal resources
Ways you can produce XML, including an overview of Scholastica’s production service…”

[2301.01189] On the long-term archiving of research data

Abstract: Accessing research data at any time is what FAIR (Findable Accessible Interoperable Reusable) data sharing aims to achieve at scale. Yet, we argue that it is not sustainable to keep accumulating and maintaining all datasets for rapid access, considering the monetary and ecological cost of maintaining repositories. Here, we address the issue of cold data storage: when to dispose of data for offline storage, how can this be done while maintaining FAIR principles and who should be responsible for cold archiving and long-term preservation.


Program Officer, Archiving and Data Services

“Interested in a mission-driven job ensuring open access to information for a global audience? Enjoy working to ensure a diverse, expansive archive of the digital historical record? Internet Archive is seeking a Program Officer for its Archiving & Data Services team. Internet Archive is a non-profit digital library, top 200 website at, and an archive of over 99 petabytes of digital information running in self-owned and operated data centers. Internet Archive provides mission-aligned services to thousands of organizations, working collaboratively to advance the goal of “Universal Access to All Knowledge.” …”

ITHAKA and JSTOR in 2023: A letter from Kevin Guthrie – ITHAKA

“In the coming months, we’ll be inviting broad community participation in a variety of initiatives to deliver on these aims.

We’re charting a path to open access for scholarly books in partnership with university presses and libraries to support publishing diverse voices and ideas
We’re fully integrating Artstor and JSTOR to deliver a high-quality, multi-content research and teaching experience
We’re launching hosting and preservation services to enable libraries to share their digital collections with millions of users around the world and to ensure their long-term sustainability
We’re taking steps to preserve emerging digital scholarship and collections of under-represented materials through experimentation and collaboration with publishers and archives
We’re rolling out an updated funding model to enable vastly increased access to the extensive journal archive and primary source collections the scholarly community has helped us to create
We’re gearing up for our next wave of growth for Constellate, our new teaching and learning platform for text analysis….”

A newspaper vanished from the internet. Did someone pay to kill it? – The Washington Post

“In many ways, the erasure of the alternative weekly, whose print and online journalism included matters such as nightlife listings as well as deep investigative work, isn’t unusual. Historians have long warned about the decay of digital news archives, which are increasingly falling victim to mishandling, indifference, bankruptcies and technical failures.

But some of the Hook’s founding journalists suspect the archive didn’t simply expire from natural causes. They think someone paid to kill it.

Their evidence, while circumstantial, is intriguing. There’s the mystery buyer who purchased the Hook archive from its longtime custodian a few months before it went dark. There’s the reluctance of people involved in that sale to say much about it….”

Building Blocks for a Scholarly Blog Archive

“If reading this post feels like it is 2006 – the year James Brown (used for the feature image of this post) died – again with talk about blogs, RSS, Markdown, Creative Commons, and related technologies (I for example didn’t mention Zotero, XML, or WordPress), you are right. This is intentional, these technologies are not as sexy as using artificial intelligence or cryptocurrencies to drive this, but I want the Science Blog archive to become a scholarly resource that is useful, open, and inclusive.”

Venkat Srinivasan: We need to push ahead for efforts in nurturing archives

“In 2009-2010, I was actively thinking about and doing scientific work during the day, conducting interviews after hours and over lunch, and doing my own writing over evenings and weekends. In hindsight, it was a formative period for me and led me to thinking critically about the form and content of archives going forward. I started to develop some basic ideas for the structure of an archive with interconnections to oral history interviews, publicly accessible information, and some methods to democratise description of historical objects. Of course, these are ideas that see parallels across generations, and one is merely building on others’ work. Again, I had luck on my side and was able to connect with the oral historian and archivist, Indira Chowdhury, in ~2012 and she was the one who very kindly connected me to NCBS, where I am now based. I was new to archiving then, and found the archiving community to be extraordinarily welcoming. I am so grateful that archivists from across the world shared ideas and material and really trained me in the past decade….”

UBC Library digitizes William Shakespeare’s First Folio – About UBC Library

“UBC Library has made its first edition of William Shakespeare’s Comedies, Histories, & Tragedies openly accessible to the public by publishing a digitized version of the volume online through Open Collections. The process to digitize the First Folio took more than a year to facilitate due to the Folio’s age and fragility….”

Digital Books wear out faster than Physical Books – Internet Archive Blogs

“Ever try to read a physical book passed down in your family from 100 years ago? Probably worked well. Ever try reading an ebook you paid for 10 years ago? Probably a different experience. From the leasing business model of mega publishers to physical device evolution to format obsolescence, digital books are fragile and threatened.

For those of us tending libraries of digitized and born-digital books, we know that they need constant maintenance—reprocessing, reformatting, re-invigorating or they will not be readable or read. Fortunately this is what libraries do (if they are not sued to stop it). Publishers try to introduce new ideas into the public sphere. Libraries acquire these and keep them alive for generations to come.

And, to serve users with print disabilities, we have to keep up with the ever-improving tools they use.

Mega-publishers are saying electronic books do not wear out, but this is not true at all. The Internet Archive processes and reprocesses the books it has digitized as new optical character recognition technologies come around, as new text understanding technologies open new analysis, as formats change from djvu to daisy to epub1 to epub2 to epub3 to pdf-a and on and on. This takes thousands of computer-months and programmer-years to do this work. This is what libraries have signed up for—our long-term custodial roles….”