Celebrating 25 Years of Preserving the Web – The Scholarly Kitchen

This week, the Internet Archive is celebrating the 25th anniversary of its first crawl of the World Wide Web and its first snapshot of our collective digital lives. Since that first crawl, the scale of the Internet Archive's collection has grown astoundingly: its regular snapshots of the internet now comprise some 588 billion web pages in total. So much flourishes on the internet, both the amazing and the frightening, the mundane and the radical, and the Internet Archive preserves it all. While most know of its work capturing and preserving snapshots of the internet, the Archive also preserves an astonishing amount of our broader digital heritage. It has digitized and archived more than 28 million books and texts, 14 million audio recordings, 6 million video recordings, 3.5 million images, and more than half a million software programs.

The Flickr Foundation

“The Flickr Commons program was launched in 2008 and has become a unique collection of historical photography shared with the Flickr community by 114 cultural institutions around the world.

This year, 13 years after it launched, we’ve taken time to evaluate the program and figure out how to reinvigorate it after a period of neglect. We have an opportunity to preserve the Flickr Commons collection resolutely and use techniques and tactics we develop to protect the longevity of the larger Flickr corpus….

We believe the establishment of a non-profit Flickr Foundation will combine with Flickr to properly preserve and care for the Flickr Commons archive, support Commons members to collaborate in a true 21st-century Commons, and plan for the very long-term health and longevity of the entire Flickr collection. We’re also in the early stages of imagining other educational and curatorial initiatives to highlight and share the power of photography for decades to come….”


Scoping future data services for the arts and humanities – UKRI

“Apply for funding to explore ways to archive arts and humanities research data….

Your proposal could focus on one of the following:


large or complex 3D objects
‘born-digital’ material and complex digital objects
practice research, including performance and visual arts….”

Scholarly journals should use “Archived on” instead of “Accessed on” | chem-bla-ics

“Publishing habits change very slowly, too slowly. The whole industry is incredibly inert, which can lead to severe frustration, as it did for me. But sometimes small changes can do so much.

Linkrot, the phenomenon that URLs are not persistent, has been studied repeatedly, including in scholarly settings (see 1998, 2000, 2003, 2006, 2008, 2014, 2015, 2000, 2021, and probably many more). Scholarly publishers responded by introducing the convention that URLs should be accompanied by an “accessed on” statement, and indeed you can find this in many bibliographic formatting standards.

This must change, and we have had a solution since 1996: the Internet Archive (though the archive's collections go back much further). I call on all publishers to change their “Accessed on” to “Archived on”. Two simple solutions that can complement each other:

Authors archive upon submission

This solution is introduced simply by updating author guidelines. It will surely take a bit of time for bibliography software to be updated, so for the time being we still write “Accessed on” until there is proper support for “Archived on”.

Journals archive upon acceptance…

BTW, projects like Wikipedia have automated the process of archiving URLs and I see no reason why publishers could not do this.”
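The "Archived on" workflow the post calls for is straightforward to automate. As a minimal sketch, assuming the Internet Archive's public Wayback Machine availability API at archive.org/wayback/available (which returns JSON describing the snapshot closest to an optional timestamp), a submission system or reference manager could check whether a cited URL has a snapshot and record its archival timestamp:

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"

def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build a Wayback availability query URL for a cited resource."""
    params = {"url": url}
    if timestamp:  # optional YYYYMMDD hint: find the closest snapshot
        params["timestamp"] = timestamp
    return f"{WAYBACK_API}?{urlencode(params)}"

def archived_on(payload: dict) -> Optional[str]:
    """Extract an 'Archived on' timestamp from the API's JSON payload."""
    snapshot = payload.get("archived_snapshots", {}).get("closest")
    if snapshot and snapshot.get("available"):
        return snapshot["timestamp"]  # 14-digit string, e.g. "20210101000000"
    return None

if __name__ == "__main__":
    query = availability_query("https://example.com", "20210101")
    with urlopen(query) as response:  # live network call to the Archive
        print(archived_on(json.load(response)))
```

The helper functions are hypothetical names for illustration; the endpoint and the `archived_snapshots.closest` response shape are the Archive's documented availability API.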

The Andrew W. Mellon Foundation Awards NYU $502,400 For Libraries Project to Expand Capabilities For Preserving Digital Scholarship | Research Guides at New York University

With a major new grant of $502,400 to NYU from The Andrew W. Mellon Foundation, the NYU Division of Libraries and its project partners will deepen their exploration and analysis of digital preservation methods and the extent to which they can preserve complex scholarly publications. The goal is to support publishers in making design choices that result in publications, including very complex ones, that can be preserved at scale without sacrificing functionality. 

The three-year project, Embedding Preservability, will be conducted by the NYU Libraries’ Digital Library Technology Services (DLTS) unit and led by Assistant Dean and DLTS Director David Millman. The work follows a recently completed, two-year project, also funded by the Mellon Foundation, in which DLTS, working with digital preservation practitioners and academic presses, developed an extensive set of digital-publishing guidelines for ensuring effective preservability. The Embedding Preservability project will refine, expand, and operationalize these guidelines.

[…]

Vacancy for Systems Administrator at the Open Preservation Foundation – Open Preservation Foundation

“The Open Preservation Foundation is looking for an experienced, motivated and versatile Systems Administrator to join our team in a new part-time role.

This is an opportunity for the right candidate to develop new skills, contribute to open source software projects and work with some of the leading minds in digital preservation….”

Where Did the Web Archive Go?

Abstract: To perform a longitudinal investigation of web archives and to detect variations and changes in replaying individual archived pages, or mementos, we created a sample of 16,627 mementos from 17 public web archives. Over the course of our 14-month study (November 2017 – January 2019), we found that four web archives changed their base URIs without leaving a machine-readable method of locating the new base URIs, necessitating manual rediscovery. Of the 1,981 mementos in our sample from these four web archives, 537 were impacted: 517 were rediscovered but with changes in their time of archiving (or Memento-Datetime), HTTP status code, or the string comprising their original URI (or URI-R), and 20 could not be found at all.
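Two of the fields the study tracked are encoded directly in many archives' memento URIs. As an illustrative sketch, assuming the common Wayback-style layout `https://<archive>/web/<14-digit-datetime>/<URI-R>` (individual archives vary, which is part of the rediscovery problem the paper describes), the Memento-Datetime and URI-R can be recovered like this:

```python
import re
from datetime import datetime

# Wayback-style memento URI: .../web/<14-digit datetime>/<original URI (URI-R)>
MEMENTO_PATTERN = re.compile(r"^https?://[^/]+/web/(\d{14})/(.+)$")

def parse_memento(uri: str):
    """Return (Memento-Datetime, URI-R) for a Wayback-style memento URI,
    or None if the URI does not follow that layout."""
    match = MEMENTO_PATTERN.match(uri)
    if not match:
        return None
    stamp, uri_r = match.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), uri_r
```

A change of base URI, as observed for four archives in the study, breaks exactly this kind of parsing unless the archive advertises the move in a machine-readable way.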


NYU Wins Major Grant From Alfred P. Sloan Foundation To Expand Capabilities For Archiving Digital Scholarship

“The Alfred P. Sloan Foundation has awarded New York University a grant of $520,503 to enable libraries and other institutions to reliably archive digital scholarship, with a focus on research code, for long-term accessibility. Vicky Rampin, NYU’s Research Data Management and Reproducibility Librarian, designed the project with her co-principal investigator, Martin Klein, Research Scientist at the Los Alamos National Laboratory (LANL).

The project follows Investigating and Archiving the Scholarly Git Experience (IASGE), an extensive NYU Libraries study also funded by the Sloan Foundation and led by Rampin, examining the landscape of current research software archiving efforts and the behavior of academics using Git and Git Hosting Platforms for scholarly reasons. The findings of both facets of IASGE underscore the vulnerability of scholarship on these platforms, from lack of holistic archival practices for research code to gaps in the research software management landscape that make long-term access more difficult. As Rampin and Klein wrote in their most recent proposal: “These factors leave us with little hope for long-term access to and availability of our scholarly artifacts on the Web.” …”

“Optional Data Curation Feature Use by Harvard Dataverse Repository Users” by Ceilyn Boyd

Abstract:  Objective: Investigate how different groups of depositors vary in their use of optional data curation features that provide support for FAIR research data in the Harvard Dataverse repository.

Methods: A numerical score based upon the presence or absence of characteristics associated with the use of optional features was assigned to each of the 29,295 datasets deposited in Harvard Dataverse between 2007 and 2019. Statistical analyses were performed to investigate patterns of optional feature use amongst different groups of depositors and their relationship to other dataset characteristics.

Results: Members of groups make greater use of Harvard Dataverse’s optional features than individual researchers. Datasets that undergo a data curation review before submission to Harvard Dataverse, are associated with a publication, or contain restricted files also make greater use of optional features.

Conclusions: Individual researchers might benefit from increased outreach and improved documentation about the benefits and use of optional features to improve their datasets’ level of curation beyond the FAIR-informed support that the Harvard Dataverse repository provides by default. Platform designers, developers, and managers may also use the numerical scoring approach to explore how different user groups use optional application features.

Are repositories the key to institutional resilience? | Research Information

“I think it’s fair to say that the purpose of a repository has fundamentally evolved to become a far more encompassing and essential tool for institutions across the world since Covid-19 was declared a global pandemic.

The repository (sometimes referred to as an institutional repository) has expanded in its use and purpose with many more stakeholders realising the additional value it can reward them with, notably by using it for open access to share knowledge and materials and increase collaboration….

The repository has evolved from being predominantly used as a storage solution for research data, to become a hub for learning and collaboration. According to Universities UK Open Access (OA) Coordination Group, institutional repositories are ‘now meeting a broad national need in support of OA and in so doing form an essential component of national research infrastructure.’…”

Reflections as the Internet Archive turns 25 – Internet Archive Blogs

“As a young man, I wanted to help make a new medium that would be a step forward from Gutenberg’s invention hundreds of years before. 

By building a Library of Everything in the digital age, I thought the opportunity was not just to make it available to everybody in the world, but to make it better–smarter than paper. By using computers, we could make the Library not just searchable, but organizable; make it so that you could navigate your way through millions, and maybe eventually billions of web pages.

The first step was to make computers that worked for large collections of rich media. The next was to create a network that could tap into computers all over the world: the Arpanet that became the Internet. Next came augmented intelligence, which came to be called search engines. I then helped build WAIS–Wide Area Information Server–that helped publishers get online to anchor this new and open system, which came to be enveloped by the World Wide Web.  

By 1996, it was time to start building the library….”