Arxiv Arcana

Nat Gustafson-Sundell wrote:

NGS: “I don’t expect local repositories to ever offer quality control.”

Of course not. They are merely offering a locus for authors to provide free access to their preprint drafts before submitting them to journals for peer review, and to their final drafts (postprints) after they have been peer-reviewed and accepted for publication by a journal.

Individual institutions cannot peer-review their own research output (that would be in-house vanity-publishing).

And global repositories like arxiv or pubmedcentral or citeseerx or google scholar cannot assume the peer-review functions of the thousands and thousands of journals that are actually doing the peer review today. That would add billions to their costs, making each into one monstrous (generic?) megajournal: near impossible in practice, even if it were not also totally unnecessary, and irrelevant to OA and its costs.

NGS: “Also, users have said again and again that they prefer discovery by subject, which will be possible for semantic docs in local repositories or better indexes (probably built through better collaborations), but not now.”

Search should of course be central and subject-tagged, over a harvested central collection from the distributed local IRs, not local, IR by IR.

(My point was that central deposit is no longer necessary or desirable, either for content-provision or for search. The optimal system is institutional deposit (mandated by institutions as well as funders) and then central harvesting for search.)
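
To make that division of labour concrete, here is a minimal sketch, not a description of any existing harvester. The repository names, records, and the functions `harvest` and `search` are all hypothetical; a real service would pull metadata over a harvesting protocol rather than from an in-memory dictionary. The point is only that each paper is deposited once, locally, and that subject search runs over the harvested central index.

```python
from collections import defaultdict

# Hypothetical data: each institution deposits its own output once, locally.
LOCAL_REPOSITORIES = {
    "univ-a": [
        {"id": "a1", "title": "Quark confinement revisited", "subjects": ["physics"]},
        {"id": "a2", "title": "Peer review and open access", "subjects": ["information science"]},
    ],
    "univ-b": [
        {"id": "b1", "title": "Lattice gauge methods", "subjects": ["physics", "computer science"]},
    ],
}


def harvest(repositories):
    """Build a central subject index from the metadata of every local repository."""
    index = defaultdict(list)
    for institution, records in repositories.items():
        for record in records:
            for subject in record["subjects"]:
                index[subject].append((institution, record["id"], record["title"]))
    return index


def search(index, subject):
    """Search at the harvester level, across institutions, not IR by IR."""
    return sorted(index.get(subject, []))


central_index = harvest(LOCAL_REPOSITORIES)
physics_hits = search(central_index, "physics")
for institution, record_id, title in physics_hits:
    print(institution, record_id, title)
```

A subject query returns records from both institutions even though neither author deposited anywhere but at home.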

NGS: “I agree that it would be great if local repositories were more used, and eventually, the systems will be in place to make it possible, but every study I’ve seen still shows local repository use to remain disappointingly low, although some universities are doing better than others.”

“Use” is ambiguous, as it can refer both to author use (for deposit) and user use (for search and retrieval). We agree that the latter makes no sense: users search at the harvester level, not the IR level.

But for the former (low author “use,” i.e., low levels of deposit), the solution is already known: Unmandated IRs (i.e., most of the existing c. 1500 IRs) are near empty (of OA’s target content, which is preprints and postprints of peer-reviewed journal articles) whereas mandated IRs (c. 150, i.e., 1%!) are capturing (or on the way to capturing) their full annual postprint output.

So the solution is mandates. And the locus of deposit for both institutional and funder mandates should be institutional, not central, so the two kinds of mandates converge rather than compete (requiring multiple deposit of the same paper).

For the special case of arxiv, with its long history of unmandated deposit, a university’s IR could import its own remote arxiv deposits (or export its local deposits to arxiv) with software like SWORD, but eventually it is clear that institution-external deposit makes no sense:

Institutions are the universal providers of all peer-reviewed research, funded and unfunded, across all fields. One-stop/one-step local deposit (followed by automatic import, export, and harvesting to/from whatever central services are needed) is the only sensible, scalable and sustainable system, and also the one that is most conducive to the growth of universal OA deposit mandates from institutions, reinforced by funder mandates likewise requiring institutional deposit, rather than discouraged by gratuitously requiring institution-external deposit.

NGS: “Inter-institutional repositories by subject area (however broadly defined) simply work better, such as arXiv or even the Princeton-Stanford repository for working papers in the classics.”

“Work better” for what? Deposit or search? You are conflating the locus of search (which should, of course, be cross-institutional) with the locus of deposit, which should be institutional, in order to accelerate institutional deposit mandates and in order to prevent discouraging adoption and compliance because of the prospect of having to deposit the same paper in more than one place.

(Yes, automatic import/export/harvesting software is indifferent to whether it is transferring from local IRs to central CRs or from central CRs to local IRs, but the logistics and pragmatics of deposit and deposit mandates, given that the institution is always the source of the content, make it obvious that one-time deposit institutionally fits all output, systematically and tractably, whereas willy-nilly IR/CR deposit, depending on fields’ prior deposit habits or funder preferences, is a recipe for many more years of the confusion, inaction, absence of mandates, and near-absence of OA content that we have now.)

NGS: “Currently, universities are paying external middlemen an outsized fee for validation and packaging services. These services can and should be brought “in-house” (at least as an ideal/goal to develop toward whenever the opportunities can be seized) except in cases where prices align with value, which occurs still with some society and commercial publications.”

I completely agree that along with hosting their own peer-reviewed research output, and mandating its deposit in their own IRs, institutions can also use their IRs (along with specially developed software for this purpose) to showcase, manage, monitor, and measure their own research output. That is what OA metrics (local and global) will make possible.

But not till the problem of getting the content into OA IRs is solved. And the solution is institutional and funder mandates — for institutional (not institution-external) deposit.

NGS: “To the extent that an arXiv or the inter-institutional repository for humanities research which will be showing up in 3-7 years moves toward offering these services, they are clearly preferable to old fashioned subscription models (since the financial support is for actual services) and current local repositories which do not offer everything needed in the value chain (as listed in Van de Sompel et al. 2004).”

(1) The reason 99% of IRs offer no value is that 99% of IRs are at least 85% empty. Only the 1% that are mandated are providing the full institutional OA content — funded and unfunded, across all disciplines — that all this depends on.

(2) The central collections, as noted, are indispensable for the services they provide, but that does not include locus of deposit and hosting: There, central deposit is counterproductive, a disservice.

(3) With local hosting of all their research output, plus central harvesting services, institutions can get all they need by way of search and metrics, partly through local statistics, partly from central ones.

NGS: “I remember when I first read an article quoting a researcher in an arXiv covered field who essentially said that journals in his field were just for vanity and advancement, since all the “action” was in arXiv (Ober et al. 2007 quoting Manuel 2001 quoting McGinty 1999) — now think about the value of a repository that doesn’t just store content and offer access.”

This familiar slogan, often voiced by longstanding arxiv users, that “Journals are obsolete: They’re only for tenure committees. We [researchers] only use the arxiv” is as false, empirically, as it is incoherent, logically. It is just another instance of the “Simon Says” phenomenon. (Pay attention to what Simon actually does, not to what he says.)

Although it is perfectly true that most arxiv users don’t bother to consult journals any more — using the OA version in arxiv only, and referring to the journal’s canonical version-of-record only in citing — it is equally (and far more relevantly) true that they all continue to submit all those papers to peer-reviewed journals, and to revise them according to the feedback from the referees, until they are accepted and published.

That is precisely the same thing that all other researchers are doing, including the vast majority that do not self-archive their peer-reviewed postprints (or, even more rarely, their unrefereed preprints) at all.

So journals are not just for vanity and advancement; they are for peer review. And arxiv users are just as dependent on that as all other researchers. (No one has ever done the experiment of trying to base all research usage on nothing but unrefereed preprints and spontaneous user feedback.)

So the only thing that is true in what “Simon says” is that when all papers are available, OA, as peer-reviewed final drafts (and sometimes also supplemented earlier by the prerefereeing drafts) there is no longer any need for users or authors to consult the journal’s proprietary version of record. (They can just cite it, sight unseen.)

But what follows from that is that journals will eventually have to scale down to becoming just peer-review service-providers and certifiers (rather than continuing also to be access-providers or document producers, either on-paper or online).

Nothing follows from that about the value of repositories, except that they are useless if they do not contain the target content (at least after peer review, and, where possible and desired by authors, also before peer review).

Harnad, S. (1998/2000/2004) The invisible hand of peer review. Nature [online] (5 Nov. 1998); Exploit Interactive 5 (2000); and in Shatz, D. (2004) (ed.) Peer Review: A Critical Inquiry. Rowman & Littlefield. Pp. 235-242.

NGS: “Do I think the financial backing will remain in place? It depends on the services actually offered and to what extent subject repositories could replace a patchwork system of single titles offered by a patchwork of publishers.”

At the moment the issue is whether arxiv, such as it is (a central locus for institution-external deposit of institutional research content in some fields, mostly physics, plus a search and alerting service), can be sustained by voluntary sub-sidy/scription, not whether, if arxiv also somehow “took over” the function of journals (peer review), that too could be paid for by voluntary sub-sidy/scription.

NGS: “Universities could save a great deal by refusing to pay the same overhead over and over again to maintain complete collections in single subject areas (not to mention paying for other people’s profits).”

I can’t quite follow this: You mean universities can cancel journal subscriptions? How do those universities’ users then get access to those cancelled journals’ contents, unless they are all being systematically made OA? Apart from those areas of physics where it has already been happening since 1991, that isn’t going to happen in most other fields till OA is mandated by the universal providers of that content, the universities (reinforced by mandates from their funders).

Then (but only then) can universities cancel their journal subscriptions and use (part of) their windfall saving to pay (journals!) for the peer-review of their own research output, article by article (instead of buying in other universities’ output, journal by journal).

NGS: “More importantly, more could be done to make articles useful and discoverable in a collaborative environment, from metadata to preservation, so that the value chain is extended and improved (my sci-fi includes semantic docs, not just cataloged texts, and improved, or multi-stage, peer review, or peer review on top of a working papers repository).”

All fine, and desirable — but not until all the OA content is being provided, and (outside of physics), it isn’t being provided — except when mandated…

So let’s not build castles in Spain before we have their contents safely in hand.

NGS: “I think there’s been plenty of ‘chatter’ to indicate that the basic assumptions in conversations between universities are changing (see recent conference agendas), so that we can expect to see more and more practical plans to collaborate on metadata, preservation, and, yes, publications.”

I’ll believe the “chatter” when it has been cashed into action (deposit mandates). Till then it’s just distraction and time-wasting.

NGS: “My head spins to think of the amount of money to be saved on the development of more shared platforms, although, the money will only be saved if other expenditures are slowly turned off.”

All this talk about money, while the target content — which could be provided at no cost — is still not being provided (or mandated)…

NGS: “Sandy mentioned in another post that she [he] would hope for arXiv like support for university monographs…”

Monographs (not even a clearcut case, like peer-reviewed articles, which are all, already, author give-aways, written only for usage and impact) are moot, while not even peer-reviewed articles are being deposited, or mandated…

NGS: “Open access and NFP publications which do offer the full value chain have been proven to have much lower production costs per page than FP publishers and they do not suffer any impact disadvantages — and these are still operated on a largely stand-alone basis, without the advantages that can be gained by sharing overhead.”

Cash castles in Spain again, while the free content is not yet being provided or mandated…

NGS: “Maybe local repositories really are the way to go, since then each institution has more control over its own contribution, but the collaboration and the support will still need to occur to support discovery (implying metadata, both in production and development of standards and tools) and preservation.”

No, search and preservation are not the problem: content is.

NGS: “I suppose another problem with local repositories, however, is that a consensus is far less likely to unite around local repositories as a practical option at this juncture — the case can’t just be made with words, you need the numbers and arXiv has them — and while I am interested to see strong local repositories emerge, there is greater sense in supporting what can be achieved, since we need more steps in the right direction.”

“The numbers” say the following:

Physicists have been depositing their preprints and postprints spontaneously (unmandated) in arxiv since 1991, but in the ensuing 20 years this commendable practice has not been taken up by other disciplines. The numbers, in other words, are static, and stagnant. The only cases in which they have grown are those where deposit was mandated (by institutions and funders).

And for that, it no longer makes sense (indeed it goes contrary to sense) to deposit papers institution-externally, instead of mandating institutional deposit and then harvesting centrally.

And the virtue of that is that it distributes the costs of managing deposits sustainably, by offloading them onto each institution, for its own output, instead of depending on voluntary institutional sub-sidy/scription for obsolete and unnecessary central deposit.

(See also the “denominator fallacy,” which arises when you compare the size of central repositories with the size of institutional repositories: The world’s 25,000 peer-reviewed journals publish about 2.5 million articles annually, across all fields. A repository’s success rate is the proportion of its annual target contents that are being deposited annually. For an institution, the denominator is its own total annual peer-reviewed journal article output across all fields. For a central repository, it is the total annual article output, in the field(s) it covers, from all the institutions in the world. Of course the central repository’s numerator is greater than any single institutional repository’s numerator. But its denominator is far greater still. Arxiv has famously been doing extremely well for certain areas of physics, unmandated, for two decades. But in other areas arxiv is not doing so well, relative to the field’s true denominator; and most other central repositories are likewise not doing well. In fact, it is pretty certain that, apart from physics, with its 2-decade tradition of deposit, plus a few other fields such as economics (preprints) and computer science, unmandated central repositories are doing exactly as badly as unmandated institutional repositories are doing, namely, about 15%.)
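
The denominator fallacy can be checked with simple arithmetic. What follows is a minimal sketch in which every count is purely illustrative (hypothetical numbers chosen to land on the roughly 15% figure above, not measurements of any real repository), and `success_rate` is a made-up helper name.

```python
def success_rate(deposited, annual_target_output):
    """Proportion of a repository's annual target content that is actually deposited."""
    return deposited / annual_target_output


# Hypothetical institutional repository: denominator is the institution's own
# annual peer-reviewed article output, across all fields.
ir_deposited, ir_output = 300, 2_000

# Hypothetical central repository: denominator is the worldwide annual article
# output, from all institutions, in the field(s) it covers.
cr_deposited, cr_output = 15_000, 100_000

ir_rate = success_rate(ir_deposited, ir_output)
cr_rate = success_rate(cr_deposited, cr_output)

print(f"IR: {ir_deposited} deposits, success rate {ir_rate:.0%}")
print(f"CR: {cr_deposited} deposits, success rate {cr_rate:.0%}")
```

With these illustrative counts the central repository holds fifty times as many papers, yet the two success rates are the same: comparing numerators alone says nothing about how much of the target content is being captured.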

Stevan Harnad
American Scientist Open Access Forum