Arxiv’s Funding Pains May Be A Wake-Up Call: Distributed Versus Central Archiving

Comments on:

Ginsparg, Paul (2011) Arxiv at 20. Nature 476: 145–147 doi:10.1038/476145a

&

Fischman, Josh (2011) The First Free Research-Sharing Site, arXiv, Turns 20 With an Uncertain Future. Chronicle of Higher Education August 10, 2011

Anonymous FTP archives. Arxiv (1991) was an invaluable milestone on the road to Open Access. But it was not the first free research-sharing site: free research-sharing began in the 1970s with the internet itself, when authors made their papers freely accessible to all users net-wide by self-archiving them in their own local institutional “anonymous FTP archives.”

Distributed local websites. With the creation of the World Wide Web in 1990, HTTP sites began replacing FTP sites for the self-archiving of papers on authors’ institutional websites. FTP and HTTP sites were mostly local and distributed, but freely accessible to all, webwide. Arxiv was the first important central HTTP site for research self-archiving, with physicists webwide all depositing their papers in one central locus (first hosted at Los Alamos). Arxiv’s remarkable growth and success were due both to its timeliness and to the fact that it had emerged from a practice, widespread among high energy physicists, that already predated the web: sharing hard copies of their papers before publication by mailing them to central preprint distribution sites such as SLAC and CERN.

Central harvesting and search. While physicists were taking to central self-archiving, distributed self-archiving continued to grow in other disciplines (particularly computer science). Later web developments, notably Google and other webwide harvesting and search engines, made distributed self-archiving more and more powerful and attractive. Meanwhile, under the stimulus of Arxiv itself, the Open Archives Initiative (OAI) was created in 1999. Its metadata-harvesting protocol made all distributed OAI-compliant websites interoperable, as if their distributed local contents were all in one global, searchable archive.
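The harvesting idea can be sketched in a few lines of Python. This is only an illustration, assuming a repository that answers the OAI-PMH `ListRecords` verb with `oai_dc` (Dublin Core) metadata; the base URL `https://repo.example.edu/oai`, the sample record, and the helper names `list_records_url` and `parse_records` are hypothetical, and the response is an inline string rather than a live network call.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request from one repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Return (identifier, title) pairs from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    return records


# Hypothetical response from one institutional repository (not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repo.example.edu/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A central service would repeat this request against every repository it knows about and merge the resulting pairs into one index, which is what makes the distributed contents searchable as if they were a single archive. A real harvester would also need to follow resumption tokens for large result sets, which this sketch omits.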

No need for direct central deposit in Google! Together, Google and OAI probably marked the end of the need for central archives. The cost and effort can instead be distributed across institutions, with all the essential search and retrieval functionality provided by automated central “overlay” services for harvesting, indexing, search and retrieval (e.g., OAIster, Scirus, BASE and Google Scholar). Arxiv continues to flourish, because two decades of invaluable service to the physics community have left several generations of users deeply committed to it. But no other dedicated central archive has arisen since. Like computer scientists, whose local, distributed self-archiving is harvested centrally by CiteSeerX, economists, for example, self-archive institutionally, with central harvesting by RePEc.

Mandating self-archiving. In biomedicine, PubMed Central looks to be an exception, with direct central depositing rather than local. But PubMed Central was not a direct author initiative, like anonymous FTP, author websites or Arxiv. It was designed by NLM, deposit was mandated by NIH, and deposit is done not only by authors but also by publishers.

Institutions are the universal research providers. Open Access is still growing far more slowly than it might, and one of the factors holding it back might be notional conflicts between institutional and central archiving. It is clear that Open Access self-archiving will have to be universally mandated, if all disciplines are to enjoy its benefits (maximized research access, uptake, usage and impact, minimized costs). The universal providers of all research paper output, funded and unfunded, are the world’s universities and research institutions, distributed globally across all scholarly and scientific disciplines, all languages, and all national boundaries.

Deposit institutionally, harvest centrally. Hence funder self-archiving mandates like NIH’s and institutional self-archiving mandates like Harvard’s need to join forces and reinforce one another rather than compete for the same papers. The most natural, efficient and economical way to do this is for both institutions and funders to mandate that all self-archiving should be done locally, in the author’s institutional OAI-compliant repository. The contents of the institutional repositories can then be harvested automatically by central OAI-compliant repositories such as PubMed Central (as well as by Google and other central harvesters) for global indexing and search.

Distribute the archiving, rather than the cost. In this light, Arxiv’s self-funding pains may be a wake-up call: Why should Cornell University (or a “wealthy donor”) subsidize a cost that institutions can best “sponsor” by each doing (and mandating) their own distributed archiving locally (thereby reducing total cost, to boot)? After all, no one deposits directly in Google.

Stevan Harnad
EnablingOpenScholarship


How to Integrate University and Funder Open Access Mandates

SUMMARY: Research funder open access mandates (such as NIH’s) and university open access mandates (such as Harvard’s) are complementary. There is a simple way to integrate them to make them synergistic and mutually reinforcing:
      Universities’ own Institutional Repositories (IRs) are the natural locus for the direct deposit of their own research output: Universities (and research institutions) are the universal research providers of all research (funded and unfunded, in all fields) and have a direct interest in archiving, monitoring, measuring, evaluating, and showcasing their own research assets — as well as in maximizing their uptake, usage and impact.
      Both universities and funders should accordingly mandate deposit of all peer-reviewed final drafts (postprints), in each author’s own university IR, immediately upon acceptance for publication, for institutional and funder record-keeping purposes. Access to that immediate postprint deposit in the author’s university IR may be set immediately as Open Access if copyright conditions allow; otherwise access can be set as Closed Access, pending copyright negotiations or embargoes. All the rest of the conditions described by universities and funders should accordingly apply only to the timing and copyright conditions for setting open access to those deposits, not to the depositing itself, its locus or its timing.
      As a result, (1) there will be a common deposit locus for all research output worldwide; (2) university mandates will reinforce and monitor compliance with funder mandates; (3) funder mandates will reinforce university mandates; (4) legal details concerning open access provision, copyright and embargoes will be applied independently of deposit itself, on a case-by-case basis, according to the conditions of each mandate; (5) opt-outs will apply only to copyright negotiations, not to deposit itself, nor its timing; and (6) any central OA repositories can then harvest the postprints from the authors’ IRs under the agreed conditions at the agreed time, if they wish.
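
The separation between deposit and access-setting proposed above can be illustrated with a small Python sketch. It is a reading of the summary, not any funder’s or university’s actual rule: the `Deposit` record, its field names, and the assumption that access becomes open once an embargo date has passed are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deposit:
    """A postprint deposited in the author's institutional repository."""
    deposited_on: date
    copyright_allows_oa: bool            # assumption: known at deposit time
    embargo_ends: Optional[date] = None  # assumption: None means no agreed date yet


def access_setting(deposit: Deposit, today: date) -> str:
    """Decide only the access setting; the deposit itself is never in question."""
    if deposit.copyright_allows_oa:
        return "open"
    # Assumption: an expired embargo lifts the restriction.
    if deposit.embargo_ends is not None and today >= deposit.embargo_ends:
        return "open"
    return "closed"


if __name__ == "__main__":
    d = Deposit(date(2011, 8, 10), copyright_allows_oa=False,
                embargo_ends=date(2012, 2, 10))
    print(access_setting(d, date(2011, 9, 1)))   # during the embargo
    print(access_setting(d, date(2012, 3, 1)))   # after the embargo
```

The point of the design is that the function never returns "not deposited": copyright conditions and embargoes change only the access setting, while the deposit is always made in the institutional repository.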