“The Scholarly Orphans project explores an institution-driven approach to discover, capture, and archive scholarly artifacts that researchers deposit in productivity web portals as a means to collaborate and communicate with their peers. The project is funded by the Andrew W. Mellon Foundation and is a collaboration between the Prototyping Team of the Los Alamos National Laboratory and the Web Science and Digital Library Research Group at Old Dominion University.
myresearch.institute and scholarlyorphans.org are components in a limited-term experiment conducted as part of the Scholarly Orphans project. The experiment is set up as an automated pipeline that is coordinated by an institutional orchestrator process, as depicted below. It was started on August 1, 2018 and will be terminated on March 31, 2020.
The modules in the pipeline are as follows:
Discovery of new artifacts deposited by a researcher in a portal is achieved by a Tracker that recurrently polls the portal’s API using the identity of the researcher in each portal as an access key. If a new artifact is discovered, its URI is passed on to the capture process.
Capturing an artifact is achieved by using web archiving techniques that pay special attention to generating representative, high-fidelity captures. A major project finding in this realm is the use of Traces that abstractly describe how a web crawler should capture a certain class of web resources. A Trace is recorded by a curator through interaction with a web resource that is an instance of that class. The result of capturing a new artifact is a WARC file in an institutional archive. The file encompasses all web resources that are an essential part of the artifact, according to the curator who recorded the Trace that was used to guide the capture process.
Archiving is achieved by ingesting WARC files from various institutions into a cross-institutional web archive that supports the Memento “Time Travel for the Web” protocol. As such, the Mementos in this web archive integrate seamlessly with those in other web archives….”
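To make the discovery step described above more concrete, here is a minimal Python sketch of a Tracker that polls a portal on behalf of one researcher and hands newly seen artifact URIs to a capture queue. It is an illustration under stated assumptions, not the project's actual implementation: the names `Tracker`, `fetch_artifact_uris`, `run_once`, and `capture_queue` are hypothetical, and the portal API is stood in for by a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set, Tuple


@dataclass
class Tracker:
    """Polls one portal for one researcher and reports unseen artifact URIs.

    All names here are hypothetical; a real Tracker would wrap a specific
    portal API client instead of a generic callable.
    """
    portal: str
    researcher_id: str  # the researcher's identity in this portal
    fetch_artifact_uris: Callable[[str], Iterable[str]]  # assumed portal API stand-in
    seen: Set[str] = field(default_factory=set)

    def poll(self) -> List[str]:
        """Return URIs that were not present in any earlier poll."""
        new_uris = []
        for uri in self.fetch_artifact_uris(self.researcher_id):
            if uri not in self.seen:
                self.seen.add(uri)
                new_uris.append(uri)
        return new_uris


def run_once(trackers: Iterable[Tracker],
             capture_queue: List[Tuple[str, str]]) -> None:
    """One orchestrator pass: poll every tracker, queue new URIs for capture."""
    for tracker in trackers:
        for uri in tracker.poll():
            capture_queue.append((tracker.portal, uri))
```

An orchestrator would call `run_once` on a schedule. Keeping the set of seen URIs inside each Tracker means that repeated polls only forward artifacts that have not been queued before. Persistence of that state, API rate limits, and the capture process itself are out of scope for this sketch.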