6. Joining things up

A national portal designed to raise the profile of Open Access research in Ireland.

The Irish-African Partnership for Research Capacity Building: digital repository for the participating Irish and African universities.
Niamh Brennan, Programme Manager, Research Information Systems & Services, Trinity College Library Dublin, Ireland

OA McMemberships, Dismemberment and MC Escher

Gold OA institutional “membership” is incoherent and does not scale. It only gives the illusion of making sense if you think of it locally, and myopically. Annual institutional subscriptions to journals containing the annual outgoing refereed research of all other institutions do not morph into annual institutional memberships for publishing each institution’s own outgoing refereed research. There are 25,000 journals and 10,000 institutions! Is every single institution to commit and contract in advance to pay for its authors’ (potential) fraction of annual submissions to every single journal? Is that a “membership” or a distributed dismemberment? And is every journal to commit and contract in advance to accept every institution’s annual fraction of submissions? (Is that peer review?) This is a global oligopolistic illusion that would fit publishers just about as well as it would fit McDonald’s, except that there are at least 25,000 different journals to “join”, and institutions each have thousands of author-consumers with diverse dietary needs, varying day to day and year to year.

Part of the illusion of coherence comes from thinking in terms of journal-fleet publishers instead of individual journal article submissions. But this is merely another variant of the “Big Deal” strategy, which has done nothing to solve either the accessibility or the affordability problem. The reality is that (Gold) OA publishing is premature today, except as a proof of principle. What is needed first is for institutions and funders to adopt universal (Green) OA self-archiving mandates. That will provide universal (Green) OA, which may eventually generate cancellation pressure. That pressure will in turn induce journals to cut obsolete costs and products/services by downsizing to just providing peer review, paid for by individual institutions on a per-outgoing-article basis out of a fraction of their annual windfall savings from cancelled institutional subscriptions. To buy into “memberships” with fleet publishers now, pre-emptively, and at current prices, while the money is still tied up in subscriptions (which cannot, of course, be cancelled in advance, before OA), is both penny- and pound-foolish, and downright absurd if a “member” institution has not even first mandated Green OA self-archiving for all of its own refereed research output…

Stevan Harnad
American Scientist Open Access Forum

Open access roundup

OA data: recent discussion and announcements

Optimism for OA book study in Uganda

National Book Trust of Uganda, Commercial Publishers experiment with Open Access, press release, November 17, 2009.

… The difficulty in accessing learning materials for cash-strapped Ugandan students is the subject of a research investigation by NABOTU. The research is exploring ways through which content providers such as commercial publishers can make available online some of their content under a flexible license such as Creative Commons. The research is also looking at what business models would guarantee income streams for the publishers. …

NABOTU set up a publishing experiment at the beginning of 2009, attracting one commercial publisher and an NGO. The two have now published some of their books on the Internet, available for free downloading, sharing and reading. Fountain Publishers Ltd has three titles under a Creative Commons license: Genocide by Denial, Handbook on Decentralization in Uganda, and Funding and Implementing Universal Access. FEMRITE has two fiction titles: The Invisible Weevil and Farming Ashes.

Early reports from the two companies show that the books have been well received in Uganda and abroad. The books have been downloaded many times in different countries including Uganda. The companies are optimistic about the potential of the internet for business expansion. NABOTU is currently tracking the impact of the free downloads on sales figures for each of the titles to ascertain the viability of a free access business model. …

SciELO adopts CC licenses

SciELO Brazil adopts Creative Commons attribution of access and use, Virtual Health Library Newsletter, November 16, 2009.

Scientific Electronic Library Online (SciELO) has become the most important collection of scientific periodicals from developing countries, in line with the international open access movement. In its eleven years of operation, SciELO has progressively improved its online publication methodologies and technologies, keeping up with the international state of the art for open access.

After a long process of analysis and consultation with experts, scientific editors and members of the Advisory Committee of the SciELO Brasil collection, Creative Commons (CC) licensing was formally adopted for all content in the SciELO collection, with “Attribution – Non-commercial Use” (CC-BY-NC) as the minimum standard and the option for editors to adopt the less restrictive attribution-only (CC-BY) license.

The decision has been implemented in the Brazilian collection and should be extended progressively to all the SciELO Network of national and thematic collections of open access scientific periodicals. The management of intellectual property rights for the SciELO collection content started formally in September 2009, when Creative Commons was adopted. …

In order to implement the Creative Commons license, SciELO Brasil editors received a letter on the adoption of the standard CC-BY-NC license for all the periodicals that are indexed in the collection, with an option to adopt the CC-BY license which is less restrictive and more in line with the open access movement.

Of the 197 editors, ten accepted SciELO’s suggestion and adopted the CC-BY license: …

Once the Creative Commons license has been fully implemented in the SciELO Brasil collection, the license will be extended in the coming months to the other SciELO-certified collections with the support of the network coordinators.

The implementation of the Creative Commons license requires the adaptation of the procedures adopted by SciELO Brasil to the scenario of each country. The idea is to finalize the license implementation process in all certified collections by the end of 2010. …

National Academies: data and method should be public

Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age, National Academies Press, November 2009. A report of the National Academies’ Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age, published in book form last week. From the summary:

… Advances in knowledge depend on the open flow of information. Only if data and research results are shared can other researchers check the accuracy of the data, verify analyses and conclusions, and build on previous work. Furthermore, openness enables the results of research to be incorporated into socially beneficial goods and services and into public policies, improving the quality of life and the welfare of society.

Despite the many benefits arising from the open availability of research data and results, many data are not publicly accessible, or their release is delayed, for a variety of reasons. …

Legitimate reasons may exist for keeping some data private or delaying their release, but the default assumption should be that research data, methods (including the techniques, procedures, and tools that have been used to collect, generate, or analyze data, such as models, computer code, and input data), and other information integral to a publicly reported result will be publicly accessible when results are reported, at no more than the cost of fulfilling a user request. This assumption underlies the following principle of accessibility:

Data Access and Sharing Principle: Research data, methods, and other information integral to publicly reported results should be publicly accessible.

Although this principle applies throughout research, in some cases the open dissemination of research data may not be possible or advisable. … Nevertheless, the main objective of the research enterprise must be to implement policies and promote practices that allow this principle to be realized as fully as possible.

This principle has important implications for researchers.

Recommendation 5: All researchers should make research data, methods, and other information integral to their publicly reported results publicly accessible in a timely manner to allow verification of published findings and to enable other researchers to build on published results, except in unusual cases in which there are compelling reasons for not releasing data. In these cases, researchers should explain in a publicly accessible manner why the data are being withheld from release.

Recommendation 6: In research fields that currently lack standards for sharing research data, such standards should be developed through a process that involves researchers, research institutions, research sponsors, professional societies, journals, representatives of other research fields, and representatives of public interest organizations, as appropriate for each particular field.

Recommendation 7: Research institutions, research sponsors, professional societies, and journals should promote the sharing of research data through such means as publication policies, public recognition of outstanding data-sharing efforts, and funding.

Recommendation 8: Research institutions should establish clear policies regarding the management of and access to research data and ensure that these policies are communicated to researchers. Institutional policies should cover the mutual responsibilities of researchers and the institution in cases in which access to data is requested or demanded by outside organizations or individuals.

€5 million project on OA repositories: OpenAIRE

Danielle Venton, OpenAIRE: archive access anytime, anywhere, International Science Grid This Week, November 25, 2009.

… Formally embracing the open access ethic, the European Commission has decided to require that results from research it funds in some fields — such as health, energy, environment, information and communication technologies, research infrastructures, social sciences and humanities — become freely available. Authors will deposit a copy of their articles in a “digital repository,” a kind of electronic library accessible through the Web.

While many institutions or subjects have their own, pre-existing repositories for published documents, these are not comprehensively linked and searchable. And some institutions hosting EC-funded researchers are without digital libraries for keeping research papers.

Stepping in to provide this open access e-infrastructure is the OpenAIRE project, which will be launched on the first of December, 2009. The project will run for three years in its first phase. OpenAIRE’s proposal, with a budget of about €5 million, was approved in September after the EC put out a call for a project that would create the e-Infrastructure to disseminate scientific results to anyone, anywhere, at any time.

Researchers approaching OpenAIRE with a document will first be directed to the repository of their home institute, if one exists. If the researcher is in a discipline which has a repository structure for the entire discipline (the high energy physics community, for example, frequently uses arXiv.org) they will be directed there. If the document is still without a home, the researcher will use an “orphan” repository, hosted at CERN, which will provide everyone a chance to submit their results — which would otherwise be lost.
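The routing just described is a simple fallback chain, sketched below. This is only an illustration of the order of preference, not OpenAIRE's software: the lookup tables, the function name `route_deposit` and the repository labels are hypothetical.

```python
# Hypothetical directories of repositories, for illustration only; OpenAIRE's
# real registry of institutional and disciplinary repositories is not modelled.
INSTITUTIONAL_REPOS = {"Example University": "Example University repository"}
DISCIPLINARY_REPOS = {"high energy physics": "arXiv.org"}
ORPHAN_REPO = "OpenAIRE orphan repository (CERN)"


def route_deposit(institution: str, discipline: str) -> str:
    """Pick a deposit locus in the order described for OpenAIRE:
    home institution first, then a discipline-wide repository,
    and the orphan repository only as a last resort."""
    if institution in INSTITUTIONAL_REPOS:
        return INSTITUTIONAL_REPOS[institution]
    if discipline in DISCIPLINARY_REPOS:
        return DISCIPLINARY_REPOS[discipline]
    return ORPHAN_REPO


print(route_deposit("Example University", "high energy physics"))
print(route_deposit("Another Institute", "high energy physics"))
print(route_deposit("Another Institute", "medieval history"))
```

The order of the checks is the whole design: a disciplinary repository is used only when there is no institutional one, and the orphan repository only when neither exists.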

OpenAIRE is based on two technologies: DNET, developed by the DRIVER consortium, will connect the existing repositories, while the orphan repository will be based on Invenio, the digital library software developed over the past 15 years by the CERN Document Server team in CERN’s IT department, which serves as the basis for CDS. Other partners, about 35 in total, will provide help and support to users. OpenAIRE will therefore be not just a technical infrastructure but a human one as well.

“Ideally, each researcher will have a help desk in their own member state,” says Salvatore Mele, Open Access Project Leader at CERN, also working for OpenAIRE. …

Institutional vs. Central Repositories: 2 (of 2)

Simeon Warner (Arxiv, Cornell) wrote in JISC-REPOSITORIES:

SW: “Lots of money is being spent on institutional repositories and, so far, the return on that investment is quite low.”

Low compared to what? It is undeniable that most of the thousands of institutional repositories are languishing near empty. The only exceptions are the fewer than a hundred mandated ones.

But that’s the point. What’s needed is more mandates, not more “investment.” Mandates are what will bring the return on the investment.

And there is another crucial point, constantly overlooked: Most central repositories are languishing near-empty too! The only reason it looks otherwise is that usually a subject repository has more content than an institutional repository. But the reason for that is quite simple:

The annual worldwide output of an entire field is incomparably bigger than the annual output of any single institution. So when an institutional repository contains no more than the usual low baseline for unmandated self-archiving (c. 15% of the institution’s total annual research output), it receives a much smaller absolute number of annual deposits than a central repository does, even though the central repository too contains only the very same low baseline of 15% of the annual output of the field as a whole, across all institutions worldwide. (This is the “denominator fallacy.”)
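The denominator fallacy is easy to see with numbers. The sketch below keeps the 15% baseline from the text but uses made-up annual output figures (hypothetical, not taken from any survey): the absolute deposit counts differ a hundredfold while the fraction of the relevant output that is deposited is exactly the same.

```python
# The 15% baseline comes from the text; the output figures are hypothetical.
BASELINE_RATE = 0.15

institution_output = 2_000   # refereed articles per year from one institution
field_output = 200_000       # refereed articles per year in one field, worldwide


def expected_deposits(annual_output: int, rate: float = BASELINE_RATE) -> int:
    """Annual deposits expected if authors self-archive at `rate` unmandated."""
    return round(annual_output * rate)


ir_deposits = expected_deposits(institution_output)
central_deposits = expected_deposits(field_output)

# Absolute counts differ 100-fold ...
print(f"Institutional repository: {ir_deposits} deposits per year")
print(f"Central repository: {central_deposits} deposits per year")
# ... but the fraction of the relevant output captured is identical.
print(f"Fill rates: {ir_deposits / institution_output:.0%} vs {central_deposits / field_output:.0%}")
```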

Yes, I know the physics Arxiv is an exception (with an incomparably higher unmandated central deposit rate for several of its subfields). But that’s the point: Arxiv is, and has been, an exception for nearly 20 years now. No point continuing to hold our breath and hope that the longstanding spontaneous (unmandated) self-archiving practices of (some fields of) physics will be adopted by other fields. It’s not happening, and 20 years is an awfully long time.

PubMedCentral (PMC) might — and I say might, because no one has actually done the calculation — possibly be doing better than the 15% default baseline, but that’s because PMC deposit is mandatory (by NIH and other funders), not because PMC is central!

(Indeed, my whole point is that the NIH and kindred biomedical self-archiving mandates would get incomparably more bang for the buck if they mandated institutional deposit, and then just harvested/imported to PMC, rather than needlessly insisting on direct central (PMC) deposit. For if NIH mandated institutional deposit, it would help stir the Slumbering Giant, namely the world’s universities and research institutes, the universal providers of all research, funded and unfunded, in all fields, into mandating deposit for all the rest of their annual research output too.)

SW: “I am still optimistic that institutional repositories will become more useful, but for that to happen there need to be useful worldwide (not just UK- or Europe-focused, because that doesn’t match research communities) disciplinary services and portals built on top of them. The Catch-22 here is that disciplinary services have exactly the same funding and sustainability issues that disciplinary repositories have.”

What institutional repositories need is deposit mandates, so they can have content that is worth building services on top of. It is not the potential (or the funding) for services that is missing; it is the content (the missing 85%). And to get that content deposited, we need (convergent) institutional and funder deposit mandates.

SW: “My group manages both Cornell’s eCommons institutional repository and the arXiv.org disciplinary repository. The effective cost per item [footnote 1] submitted is more than 10 times higher for the institutional repository than the disciplinary repository and the benefit/utility/visibility is lower. However, I know exactly who should and will fund eCommons (Cornell), and that nicely matches the vested interest (Cornell). The community benefit from arXiv.org is enormous and the effective cost per new item very low (<$7/item), but given 60k new items per year that is a significant cost and sustainability is a challenge.”

The cost-per-item stats are funny-money. Cornell’s problem is not that it costs too much per item to deposit, it’s that the deposits are not being done, because Cornell has no mandate. That makes the ratio of IR costs to IR items unsatisfying, of course, but you are missing the real cause!

Moreover, if all institutions had mandates, the (equally small) cost per deposited item would be distributed across the planet’s 10K institutions, instead of concentrated on a few central repositories (most near-empty, just like Cornell’s institutional one, plus a [very] few serendipitously overstocked central ones, like Arxiv).

SW: “I think the best example of a disciplinary service over institutional repositories is RePEc in economics. This predates OAI and our current conception of IRs but fits the model: institutions (typically economics departments [footnote 2]) host articles and expose metadata/data via a standard interface. The institutionally held content is genuinely useful to the economics community because of the disciplinary services.”

All true. (And note that your “best example” is a central service over distributed institutional repositories, not a central repository in which authors deposit directly! CiteSeer is another excellent example, in computer science, a field that has been self-archiving even longer than physics and economics.)

But here again, we have a community that has been self-archiving (spontaneously, and institutionally) unmandated for almost as long as Arxiv users. And again, this admirable practice has not generalized to other fields.

What physicists and economists (and computer scientists) seem to have in common is that they find the practice of publicly disseminating working papers — unrefereed preprints — useful and productive. That is splendid. I do too. But the majority of fields — and hence of researchers — do not find publicly disseminating their unrefereed drafts useful. And you certainly cannot mandate making authors’ unrefereed drafts public; in some biomedical fields that might even be dangerous.

But you can mandate making refereed final drafts (published or accepted for publication) public: they are already being made public, since they’re being published. So all you need to do is make it mandatory that they also be made freely accessible online (OA), so that not only subscribers can access and use them but all potential users can.

And that is what OA is about.

SW: “At the end of the day, researchers want and will use disciplinary services (look at usage stats for arXiv, ADS, SPIRES, RePEc, PMC, SSRN vs IRs). They probably don’t care whether the items themselves are stored centrally or institutionally.”

Correct, for users. But users do care whether the items are accessible at all. And that’s what deposit mandates (and OA itself) are for.

And authors do care about whether they need to do multiple deposits; and institutions do care about whether they host their own research output.

So it does matter whether deposit is mandated institutionally or centrally, by both institutions and funders.

The difference is not in functionality, but in content. And you have no functionality if you have no content!

SW: “Some of Stevan’s arguments miss key points:”

SH: “(1) Institutions are the universal providers of all research output — funded and unfunded, across all subjects, all institutions, and all nations.”

SW: “Not true, researchers are the universal providers of research output. They often work in teams that span multiple institutions and their first allegiance is often to their discipline rather than their institution.”

That is (sometimes) true, but trivial. Researchers are answerable to their own institutions (employers) when it comes to the tallying of their research output for research performance assessment. (You may be more loyal to “Physics” than to Cornell U, but it is Cornell, not “Physics,” that hires you, pays your salary, and evaluates your productivity; it is “for” Cornell that you “publish or perish” even if your heart belongs to “Physics.”)

SH: “(3) OAI-compliant repositories are all interoperable.
“(7) The metadata and/or full-text deposits of any OAI-compliant repository can be harvested, exported or imported to any OAI-compliant repository.”

SW: “Interoperable to a point, and I say that as one of the creators of OAI-PMH. There is plenty of experience showing how hard it is to maintain large harvested collections and merge varying metadata (e.g. OAIster, NSDL). Institutional repositories are often managed with scant attention to maintaining interoperability: managers change the OAI-PMH base URL on a whim or do not monitor for errors. Full-text often has copyright/license issues preventing import into other repositories.”

All extremely minor (and readily remediable) points, compared to the real problem of institutional repositories, which is not that they are errorful but that they are EMPTY. (No point even fixing the errors while content is so impoverished. And once content is rich enough, there’s the requisite motivation to clean up errors and maximize interoperability — and services.)
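Points (3) and (7) rest on the OAI Protocol for Metadata Harvesting (OAI-PMH). A harvester sends HTTP requests such as `verb=ListRecords&metadataPrefix=oai_dc` to a repository's base URL and follows each `resumptionToken` until the list is exhausted. The sketch below uses only the Python standard library to build those request URLs and to parse one response page; the base URL and the sample record are hypothetical, and network fetching, retries and the error monitoring Warner mentions are deliberately left out.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and by unqualified Dublin Core (oai_dc).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request URL. In OAI-PMH a resumptionToken is an
    exclusive argument, so metadataPrefix is sent only on the first request."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Extract (identifier, title) pairs from one ListRecords response and
    return them with the resumptionToken for the next page (None when done)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty (or absent) resumptionToken means the list is complete.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return records, token or None


# A hand-written response page, for illustration (not from a real repository).
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example refereed article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_PAGE)
print(records)
print(list_records_url("https://repository.example.org/oai", token))
```

A full harvester would loop: request a page, store its records, and request the next page with the returned token until `parse_list_records` yields `None`.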

SH: “(11) The solution is to fix the funder locus-of-deposit specs, not to switch to central locus of deposit.”

SW: “The solution is to build disciplinary services (either on disciplinary repositories or over harvested content) that are sufficiently useful to motivate researchers to submit of their own free will.”

The solution to what problem? The problem I am addressing (lo, these nigh on 20 years) is the absence of the target content over which the putative services are built. Arxiv does not suffer from this problem (and saints be praised for that), but that doesn’t help the rest of us!

Yes, all kinds of powerful new services would be more than welcome (and will come) — but they are useless in the absence of the content on which they are meant to operate.

And it is not researchers as users that are the problem. It is researchers as authors — hence content-providers, depositors — that are the problem. The reason they are failing to deposit is not — let me save you the trouble of waiting more years to find out that this is so — because the user-services (or even the author-services) are not spiffy enough yet.

They are failing to deposit because their fingers are “paralyzed” (for at least 34 reasons):

Harnad, S. (2006) Opening Access by Overcoming Zeno’s Paralysis. In: Jacobs, N. (Ed.) Open Access: Key Strategic, Technical and Economic Aspects. Chandos.

And the cure for that paralysis is deposit mandates: “keystroke mandates” from their institutions and funders.

And one of the (many) things holding up the universal adoption of those keystroke mandates is that funders needlessly compete with institutions for their researchers’ reluctant keystrokes by mandating central deposit. This stokes, instead of soothing, paralyzed authors’ (rightful) resistance to the prospect of having to make divergent multiple deposits at central sites instead of a convergent one-time local deposit in their own institutional repository.

SW: “(footnote 1) I think effective cost per new item is a good measure of repository cost because almost all effort beyond relatively fixed costs of keeping the system going tends to be dealing with new items. I calculate as operating budget over some period divided by number of new items in that period.”

But surely you also see that the cost per item deposited depends on the overall number of items deposited!
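Warner's footnote defines the measure precisely enough to write down, which makes the objection concrete. The sketch below assumes, hypothetically, that a repository's operating budget is a roughly fixed cost of keeping the system going plus a per-item handling cost (Warner's own description of where the effort goes); all figures are invented for illustration. Under that assumption the effective cost per item falls toward the per-item cost as deposits rise, so a near-empty repository looks expensive per item mainly because its denominator is small.

```python
def annual_operating_budget(fixed_cost: float, cost_per_new_item: float, new_items: int) -> float:
    """Hypothetical cost model: a roughly fixed cost of keeping the system going
    plus an amount of effort per new item handled."""
    return fixed_cost + cost_per_new_item * new_items


def effective_cost_per_item(operating_budget: float, new_items: int) -> float:
    """Warner's measure: operating budget for a period divided by the number
    of new items deposited in that period."""
    if new_items <= 0:
        raise ValueError("no new items in the period, so cost per item is undefined")
    return operating_budget / new_items


# Illustrative figures only; not Cornell's (or anyone's) actual budget.
FIXED, MARGINAL = 40_000.0, 5.0

for items in (500, 5_000, 50_000):
    budget = annual_operating_budget(FIXED, MARGINAL, items)
    print(f"{items:>6} new items: {effective_cost_per_item(budget, items):.2f} per item")
```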

SW: “(footnote 2) I’m pleased to say that the section of arXiv that overlaps with RePEc — Quantitative Finance (q-fin) — is also included in RePEc (http://ideas.repec.org/s/arx/papers.html).”

Splendid. And I wish both Arxiv and RePEc all the best in taking their very useful place among (many) central collections and service-providers.

But let the one-time locus of deposit be where it belongs, and needs to be: in the researcher’s own local institutional repository. And let that be the designated convergent locus of deposit for both institutional and funder mandates.


Stevan Harnad
American Scientist Open Access Forum