On Not Conflating Open Data (OD) With Open Access (OA)

Anon: “I hope you don’t mind my asking you for guidance; I follow the IR list and you are obviously expert in this area. I am having a debate with a colleague who argues that forcing researchers to give up their data to archives and repositories breaches their autonomy and control over intellectual property. He goes so far as to position the entire open access movement in the camp of the neoliberal agenda of commodifying knowledge for capitalist dominated state authority (at the expense of researchers, often very junior team members, who actually create the data).”

It is important to distinguish OA (Open Access to refereed research journal articles) from Open Data (Open Access to research data, OD).

All researchers, without exception, want to maximise access to their refereed research findings as soon as they are accepted for publication by a refereed journal, in order to maximise their uptake, usage and impact. Otherwise they would not be providing access to them, by publishing them. The impact of their research findings is what their careers, as well as research progress, are all about.

But raw data are not research findings until they have been data-mined and analysed. Hence, by the same token, researchers (with rare exceptions) are not merely data-gatherers, collecting data so that others can go on to do the data-mining and analysis: In science especially, their data-collection is driven by their theories, and their attempts to test and validate them. In the humanities too, the intellectual contributions are rarely databases themselves; the scholarly contributions are the authors’ analysis and interpretation of their data. These are often reported in books (long in the writing), which are not part of OA’s primary target content, because books are definitely not all or mostly giveaway content, written solely to maximise their uptake, usage and impact (at least not yet). [See Figure, below.]

In short, with good reason, OD is not immediate, exception-free author give-away content, whereas OA is. It may be reasonable, when data-gathering is funded, that the funders stipulate how long the data may be held for exclusive data-analysis by the fundee, before it must be made openly accessible. But, in general, primary research data — just like books, software, audio, video, and unrefereed research — are not amenable to OA mandates because there may be good reasons why their creators do not wish to make them OA, at least not immediately. Indeed, that is the reason that all OA mandates, whether by funders or universities, are very specifically restricted to refereed research journal publications.

In the new world of OA mandates, which is merely a PostGutenberg successor to the Gutenberg world of “publish-or-perish” mandates, it is critically important to distinguish carefully what is required (and why) from what is merely recommended (and why).

Anon: “I agree there is a risk of misuse and appropriation of the open access agenda, but that is true for any technology, or any social change more generally.”

Researchers’ unwillingness to make their laboriously gathered data immediately OA is not just out of fear of misuse and misappropriation. It is much closer to the reason that a sculptor does not do the hard work of mining rock for a sculpture only in order to put the raw rock on craigslist for anyone to buy and sculpt for themselves, let alone putting it on the street corner for anyone to take home and sculpt for themselves. That just isn’t what sculpture is about. And the same is true of research (apart from some rare exceptions, like the Human Genome Project, where the research itself is the data-gathering, and the research findings are the data).

Anon: “And I believe researchers generally have more to gain than lose from sharing data, but hard evidence on this point (again for data, not outputs) is almost non-existent so far. If you can direct me to any articles or arguments, I would be grateful.”

There is no hard evidence on this because, except in rare cases, it is simply not true. The work of science and scholarship does not end with data-gathering; it begins with data-gathering, and it is what motivates the data-gathering. If funders and universities mandated away the motivation to gather the data, they would not be left with an obedient set of data-gatherers, duly continuing to gather data so that anyone and everyone could then go ahead and data-mine it immediately. They would simply be mandating away much of the incentive to gather the data in the first place.

To put it another way: The embargo on making refereed research articles immediately OA — the access delay that publishers seek in order to protect their revenue — is the tail wagging the dog: Research progress and researchers’ careers do not exist in the service of publishers’ revenues, but vice versa. In stark contrast to this, however, the “embargo” on making primary research data OD is necessary and justified (in most cases) if researchers are to have any incentive for gathering data (and doing research) at all.

The length of the embargo is another matter, and can and should be negotiated by research funders on a field by field or even a case by case basis.

So although it is crucial not to conflate OA and OD (thereby needlessly eliciting author resistance to OA when all they really want to resist is immediate OD), there is indeed a connection between OA and OD, and universal OA will undoubtedly encourage more OD to be provided, sooner, than the current status quo does.

Anon: “An important point in addition is that the archives I work with, while aspiring to openness, cannot adopt full and unqualified open access. Issues of sensitive and confidential data, and consent terms from human research subjects, have to be respected. We strive to make data as open and free as possible, subject to these limits. Typically, agreeing to a licence specifying legal and ethical use is all that is required. So in fact, researchers do retain control, to some extent, over the terms and conditions of reuse when they deposit their data for sharing in data archives.”

Yes, of course even OD will need to have some access restrictions, but that is not the point, and that is not why researchers in general have good reason not to be favorably disposed to immediate mandatory OD, whereas they have no reason at all not to be favorably disposed to immediate mandatory OA.

It is also important to bear in mind that the fundamental motivation for OA is research access and progress, not research archiving and preservation (although those are of course important too). Data must of course be archived and preserved as well, but that, again, is not OD. Closed Access data-archiving would serve that purpose — and to the extent that researchers store digital data in any form, closed access digital archiving is what all researchers do already. Proposing to help them with data-preservation is not the same thing as proposing that they make their data immediately OD.

Stevan Harnad
American Scientist Open Access Forum