On Eggs and Citations

Failing to observe a platypus laying eggs is not a demonstration that the platypus does not lay eggs. You have to actually observe the provenance, ab ovo, of the little newborn platypuses, if you want to demonstrate that they are not engendered by egg-laying.

Failing to observe a significant OA citation Advantage after a year (or a year and a half — or longer, as the case may be) with randomized OA does not demonstrate that the many studies that do observe a significant OA citation Advantage with nonrandomized OA are simply reporting self-selection artifacts (i.e., selective provision of OA for the more highly citable articles).

You first have to replicate the OA citation Advantage with nonrandomized OA (on the same or comparable sample) and then demonstrate that randomized OA (on the same or comparable sample) eliminates the OA citation Advantage (on the same or comparable sample).

Otherwise, you are simply comparing apples and oranges (or eggs and expectations, as the case may be) in reporting a failure to observe a significant OA citation Advantage in a one-year (or 1.5-year) sample with randomized OA — along with a failure to observe a significant OA citation Advantage for nonrandomized OA for the same sample either (because the nonrandomized OA subsample was too small):

The many reports of the nonrandomized OA Citation Advantage are based on samples that were sufficiently large, and on a sufficiently long time-scale (almost never as short as a year) to detect a significant OA Citation Advantage.

A failure to observe a significant effect with small samples on short time-scales — whether randomized or nonrandomized — is simply that: a failure to observe a significant effect. Keep testing till the size and duration of your sample of randomized and nonrandomized OA is big enough to test your self-selection hypothesis (i.e., comparable with the other studies that have detected the effect).

Meanwhile, note that (as other studies have likewise reported), although a year is too short to observe a significant OA citation Advantage, it was long enough to observe a significant OA download Advantage — and other studies have also reported that early download advantages correlate significantly with later significant citation advantages.

Just as mating more is likely to lead to more progeny for platypuses (by whatever route) than mating less, so accessing and downloading more is likely to lead to more citations than accessing and downloading less.

Stevan Harnad
American Scientist Open Access Forum

Confirmation Bias and the Open Access Advantage: Some Methodological Suggestions for Davis’s Citation Study

SUMMARY: Davis (2008) analyzes citations from 2004-2007 in 11 biomedical journals. For 1,600 of the 11,000 articles (15%), their authors paid the publisher to make them Open Access (OA). The outcome, confirming previous studies (on both paid and unpaid OA), is a significant OA citation Advantage, but a small one (21%, 4% of it correlated with other article variables such as number of authors, references and pages). The author infers that the size of the OA advantage in this biomedical sample has been shrinking annually from 2004-2007, but the data suggest the opposite. In order to draw valid conclusions from these data, the following five further analyses are necessary:

    (1) The current analysis is based only on author-choice (paid) OA. Free OA self-archiving needs to be taken into account too, for the same journals and years, rather than being counted as non-OA, as in the current analysis.
    (2) The proportion of OA articles per journal per year needs to be reported and taken into account.
    (3) Estimates of journal and article quality and citability in the form of the Journal Impact Factor and the relation between the size of the OA Advantage and journal as well as article “citation-bracket” need to be taken into account.
    (4) The sample-size for the highest-impact, largest-sample journal analyzed, PNAS, is restricted and is excluded from some of the analyses. An analysis of the full PNAS dataset is needed, for the entire 2004-2007 period.
    (5) The analysis of the interaction between OA and time, 2004-2007, is based on retrospective data from a June 2008 total cumulative citation count. The analysis needs to be redone taking into account the dates of both the cited articles and the citing articles, otherwise article-age effects and any other real-time effects from 2004-2008 are confounded.

The author proposes that an author self-selection bias for providing OA to higher-quality articles (the Quality Bias, QB) is the primary cause of the observed OA Advantage, but this study does not test or show anything at all about the causal role of QB (or of any of the other potential causal factors, such as Accessibility Advantage, AA, Competitive Advantage, CA, Download Advantage, DA, Early Advantage, EA, and Quality Advantage, QA). The author also suggests that paid OA is not worth the cost, per extra citation. This is probably true, but with OA self-archiving, both the OA and the extra citations are free.


Comments on: Davis, P.M. (2008) Author-choice open access publishing in the biological and medical literature: a citation analysis. Journal of the American Society for Information Science and Technology (JASIST) (in press) http://arxiv.org/pdf/0808.2428v1

The Davis (2008) preprint is an analysis of the citations from years c. 2004-2007 in 11 biomedical journals: c. 11,000 articles, of which c. 1,600 (15%) were made Open Access (OA) through “Author Choice”: AC-OA (author chooses to pay publisher for OA). Author self-archiving (SA-OA) was not measured.

The result was a significant OA citation advantage (21%) over time, of which 4% is correlated with variables other than OA and time (number of authors, pages, references; whether article is a Review and has a US co-author).

This result confirms the findings of numerous previous studies (some of them based on far larger samples of fields, journals, articles and time-intervals) of an OA citation advantage (ranging from 25%-250%) in all fields, across a 10-year range (Hitchcock 2008).

The preprint also states that the size of the OA advantage in this biomedical sample diminishes annually from 2004-2007. But the data seem to show the opposite: that as an article gets older, and its cumulative citations grow, its absolute and relative OA advantage grow too.

The preprint concludes, based on its estimate of the size of the OA citation Advantage, that AC-OA is not worth the cost, per extra citation. This is probably true, but with SA-OA the OA and the extra citations can be had at no cost at all.

The paper is accepted for publication in JASIST. It is not clear whether the linked text is the unrefereed preprint, or the refereed, revised postprint. On the assumption that it is the unrefereed preprint, what follows is an extended peer commentary with recommendations on what should be done in revising it for publication.

(It is very possible, however, that some or all of these revisions were also recommended by the JASIST referees and that some of the changes have already been made in the published version.)

As it stands currently, this study (i) confirms a significant OA citation Advantage, (ii) shows that it grows cumulatively with article age and (iii) shows that it is correlated with several other variables that are correlated with citation counts.

Although the author proposes that an author self-selection bias for providing OA to higher-quality articles (the Quality Bias, QB) is the primary causal factor underlying the observed OA Advantage, in fact this study does not test or show anything at all about the causal role of QB (or of any of the other potential causal factors underlying the OA Advantage, such as Accessibility Advantage, AA, Competitive Advantage, CA, Download Advantage, DA, Early Advantage, EA, and Quality Advantage, QA; Hajjem & Harnad 2007b).

The following 5 further analyses of the data are necessary. The size and pattern of the observed results, as well as their interpretations, could all be significantly altered (as well as deepened) by their outcome:

(1) The current analysis is based only on author-choice (paid) OA. Free author self-archiving OA needs to be taken into account too, for the same journals and years, rather than being counted as non-OA, as in the current analysis.
 
(2) The proportion of OA articles per journal per year needs to be reported and taken into account.
 
(3) Estimates of journal and article quality and citability in the form of the Journal Impact Factor (journal?s average citations) and the relation between the size of the OA Advantage and journal and article ?citation-bracket? need to be taken into account.
 
(4) The sample-size for the highest-impact, largest-sample journal, PNAS, is restricted and is excluded from some of the analyses. A full analysis of the full PNAS dataset is needed, for the entire 2004-2007 period.
 
(5) The analysis of the interaction between OA and time, 2004-2007, is based on retrospective data from a June 2008 total cumulative citation count. The analysis needs to be redone taking into account the dates of both the cited articles and the citing articles, otherwise article-age effects and any other real-time effects from 2004-2008 are confounded.

Commentary on the text of the preprint:

“ABSTRACT… there is strong evidence to suggest that the open access advantage is declining by about 7% per year, from 32% in 2004 to 11% in 2007”

It is not clearly explained how these figures and their interpretation are derived, nor is it reported how many OA articles there were in each of these years. The figures appear to be based on a statistical interaction between OA and article-age in a multiple regression analysis for 9 of the 11 journals in the sample. (a) The data from PNAS, the largest and highest-impact journal, are excluded from this analysis. (b) The many variables included in the (full) multiple regression equation omit one of the most obvious ones: journal impact factor. (c) OA articles that are self-archived rather than paid author-choice are not identified and included as OA, hence their citations are counted as being non-OA. (d) The OA/age interaction is not based on yearly citations after a fixed interval for each year, but on cumulative retrospective citations in June 2008.

The natural interpretation of Figure 1 accordingly seems to be the exact opposite of the one the author makes: Not that the size of the OA Advantage shrinks from 2004-2007, but that the size of the OA Advantage grows from 2007-2004 (as articles get older and their citations grow). Not only do cumulative citations grow for both OA and non-OA articles from year 2007 articles to year 2004 articles, but the cumulative OA advantage increases (by about 7% per year, even on the basis of this study’s rather slim and selective data and analyses).

This is quite natural, as not only do citations grow with time, but the OA Advantage — barely detectable in the first year, being then based on the smallest sample and the fewest citations — emerges with time.

“See Craig et al. [2007] for a critical review of the literature [on the OA citation advantage]”

Craig et al.’s rather slanted 2007 review (Harnad 2007a) is the only reference to previous findings on the OA Advantage cited by the Davis preprint. Craig et al. had attempted to reinterpret the multiply reported positive finding of an OA citation advantage on the basis of 4 negative findings (Davis & Fromerth, 2007; Kurtz et al., 2005; Kurtz & Henneken, 2007; Moed, 2007) in mathematics, astronomy (the two Kurtz studies) and condensed matter physics. Apart from Davis’s own prior study, these studies were based mainly on preprints self-archived well before publication. The observed OA advantage consisted mostly of an Early Access Advantage for the OA prepublication preprint, plus an inferred Quality Bias (QB) on the part of authors toward preferentially providing OA to higher quality preprints (Harnad 2007b).

The Davis preprint does not cite any of the considerably larger number of studies that have reported large and consistent OA advantages for postprints, based on many more fields, and based on far larger samples and longer time intervals (Hitchcock 2008). Instead, Davis focuses rather single-mindedly on the hypothesis that most or all of the OA Advantage is the result of the self-selection bias (QB) toward preferentially making higher-quality (hence more citeable) articles OA:

“authors selectively choose which articles to promote freely… [and] highly cited authors disproportionately choose open access venues”

It is undoubtedly true that better authors are more likely to make their articles OA, and that authors in general are more likely to make their better articles OA. This Quality or “Self-Selection” Bias (QB) is one of the probable causes of the OA Advantage.

However, no study has shown that QB is the only cause of the OA Advantage, nor even that it is the biggest cause. Three of the studies cited (Kurtz et al., 2005; Kurtz & Henneken, 2007; Moed, 2007) showed that another causal factor is Early Access (EA: providing OA earlier results in more citations).

There are several other candidate causal factors in the OA Advantage, besides QB and EA (Hajjem & Harnad 2007b): There is the Download (or Usage) Advantage (DA): OA articles are downloaded significantly more, and this early DA has also been shown to be predictive of a later citation advantage in Physics (Brody et al. 2006).

There is a Competitive Advantage (CA): OA articles are in competition with non-OA articles, and to the extent that OA articles are relatively more accessible than non-OA articles, they can be used and cited more. Both QB and CA, however, are temporary components of the OA advantage that will necessarily shrink to zero and disappear once all research is OA. EA and DA, in contrast, will continue to contribute to the OA advantage even after universal OA is reached, when all postprints are being made OA immediately upon publication, compared to pre-OA days (as Kurtz has shown for Astronomy, which has already reached universal post-publication OA).

There is an Accessibility Advantage (AA) for those users whose institutions do not have subscription access to the journal in which the article appeared. AA too (unlike CA) persists even after universal OA is reached: all users then have its benefit.

And there is at least one more important causal component in the OA Advantage, apart from AA, CA, DA and QB, and that is a Quality Advantage (QA), which has often been erroneously conflated with QB (Quality Bias):

Ever since Lawrence’s original study in 2001, the OA Advantage can be estimated in two different ways: (1) by comparing the average citations for OA and non-OA articles (log citation ratios within the same journal and year, or regression analyses like Davis’s) and (2) by comparing the proportion of OA articles in different “citation brackets” (0, 1, 2, 3-4, 5-8, 9-16, 17+ citations).

In method (2), the OA Advantage is observed in the form of an increase in the proportion of OA articles in the higher citation brackets. But this correlation can be explained in two ways. One is QB, which is that authors are more likely to make higher-quality articles OA. But it is also at least as plausible that higher-quality articles benefit more from OA! It is already known that the top c. 10-20% of articles receive c. 80-90% of all citations (Seglen’s 1992 “skewness of science”). It stands to reason, then, that when all articles are made OA, it is the top 20% of articles that are most likely to be cited more: Not all OA articles benefit from OA equally, because not all articles are of equally citable quality.
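
To make the two estimation methods concrete, here is a minimal Python sketch on synthetic data; the 15% OA share, the citation range, and the sample size are arbitrary illustration values (only the bracket cut-points come from the list above):

```python
# Minimal sketch of the two ways of estimating an OA citation Advantage
# described above. Synthetic data: all numbers are illustrative only.
import math
import random

random.seed(0)
# Hypothetical within-journal/year sample: (citations, is_oa) pairs.
articles = [(random.randint(0, 30), random.random() < 0.15) for _ in range(2000)]

# Method (1): ratio of mean log(1 + citations), OA vs non-OA.
oa = [c for c, is_oa in articles if is_oa]
non_oa = [c for c, is_oa in articles if not is_oa]

def mean_log(cites):
    return sum(math.log(1 + c) for c in cites) / len(cites)

print("log-citation ratio (OA/non-OA):", mean_log(oa) / mean_log(non_oa))

# Method (2): proportion of OA articles within each citation bracket.
for lo, hi in [(0, 0), (1, 1), (2, 2), (3, 4), (5, 8), (9, 16), (17, None)]:
    in_bracket = [is_oa for c, is_oa in articles
                  if c >= lo and (hi is None or c <= hi)]
    if in_bracket:
        label = f"{lo}+" if hi is None else f"{lo}-{hi}"
        print(f"bracket {label}: {sum(in_bracket) / len(in_bracket):.1%} OA")
```

On independent synthetic data like this, both methods should show roughly no OA advantage; on real data, a QA effect would surface as a rising OA share in the higher brackets.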

Hence both QB and QA are likely to be causal components in the OA Advantage, and the only way to tease them apart and estimate their individual contributions is to control for the QB effect by imposing the OA instead of allowing it to be determined by self-selection. We (Gargouri, Hajjem, Gingras, Carr & Harnad, in prep.) are completing such a study now, comparing mandated and unmandated OA; and Davis et al. (2008) have just published another study on randomized OA for 11 journals:

“In the first controlled trial of open access publishing where articles were randomly assigned to either open access or subscription-access status, we recently reported that no citation advantage could be attributed to access status (Davis, Lewenstein, Simon, Booth, & Connolly, 2008)”

This randomized OA study by Davis et al. was very welcome and timely, but it had originally been announced to cover a 4-year period, from 2007-2010, whereas it was instead prematurely published in 2008, after only one year. No OA advantage at all was observed in that 1-year interval, and this too agrees with the many existing studies on the OA Advantage, some based on far larger samples of journals, articles and fields: Most of those studies (none of them randomized) likewise detected no OA citation advantage at all in the first year: It is simply too early. In most fields, citations take longer than a year to be made, published, ISI-indexed and measured, and to make any further differentials (such as the OA Advantage) measurable. (This is evident in Davis’s present preprint too, where the OA advantage is barely visible in the first year (2007).)

The only way the absence of a significant OA advantage in a sample with randomized OA can be used to demonstrate that the OA Advantage is only or mostly just a self-selection bias (QB) is by also demonstrating the presence of a significant OA advantage in the same (or comparable) sample with nonrandomized (i.e., self-selected) OA.

But Davis et al. did not do this control comparison (Harnad 2008). Finding no OA Advantage with randomized OA after one year merely confirms the (widely observed) finding that one year is usually too early to detect any OA Advantage; but it shows nothing whatsoever about self-selection QB.

“we examine the citation performance of author-choice open access”

It is quite useful and interesting to examine citations for OA and non-OA articles where the OA is provided through (self-selected) “Author-Choice” (i.e., authors paying the publisher to make the article OA on the publisher’s website).

Most prior studies of the OA citation Advantage, however, are based on free self-archiving by authors on their personal, institutional or central websites. In the bigger studies, a robot trawls the web using ISI bibliographic metadata to find which articles are freely available on the web (Hajjem et al. 2005).

Hence a natural (indeed essential) control test that has been omitted from Davis’s current author-choice study — a test very much like the control test omitted from the Davis et al. randomized OA study — is to identify the articles in the same sample that were made OA through author self-archiving. If those articles are identified and counted, that not only provides an estimate of the relative uptake of author-choice OA vs OA self-archiving in the same sample interval, but it allows a comparison of their respective OA Advantages. More important, it corrects the estimate of an OA Advantage based on author-choice OA alone: For, as Davis has currently done the analysis, any OA Advantage from OA self-archiving in this sample would in fact reduce the estimate of the OA Advantage based on author-choice OA (mistakenly counting as non-OA the articles and citation-counts for self-archived OA articles).

“METHODS… The uptake of the open access author-choice programs for these [11] journals ranged from 5% to 22% over the dates analyzed”

Davis’s preprint does not seem to give the data — either for individual journals or for the combined totals — on the percentage of author-choice OA (henceforth AC-OA) by year, nor on the relation between the proportion uptake of AC-OA and the size of the OA Advantage, by year.

As Davis has been careful to do multiple regression analyses on all the article-variables that might correlate with OA (article age, number of authors, number of references, etc.), it seems odd not to take into account the relation between the size of the AC-OA Advantage and the degree of uptake of AC-OA, by year. The other missing information is the corresponding data for self-archiving OA (henceforth SA-OA).

“[For] All of the journals… all articles roll into free access after an initial period [restricted to subscription access only for 12 months (8 journals), 6 months (2 journals) or 24 months (1 journal)]”

(This is important in relation to the Early Access (EA) Advantage, which is the biggest contributor to the OA Advantage in the two cited studies by Kurtz on Astronomy. Astronomy has free access to the postprints of all articles in all astronomy journals immediately upon publication. Hence Astronomy has scope for an OA Advantage only through an EA Advantage, arising from the early posting of preprints before publication. The size of the OA Advantage in other fields — in which (unlike in Astronomy) access to the postprint is restricted to subscribers-only for 6, 12, or 24 months — would then be the equivalent of an estimate of an “EA Advantage” for those potential users who lack subscription access — i.e., the Accessibility Advantage, AA.)

“Cumulative article citations were retrieved on June 1, 2008. The age of the articles ranged from 18 to 57 months”

Most of the 11 journals were sampled till December 2007. That would mean that the 2007 OA Advantage was based on even less than one year from publication.

“STATISTICAL ANALYSIS… Because citation distributions are known to be heavily skewed (Seglen, 1992) and because some of the articles were not yet cited in our dataset, we followed the common practice of adding one citation to every article and then taking the natural log”

(How well did that correct the skewness? If it still was not normal, then citations might have to be dichotomized as a 0/1 variable, comparing, by citation-bracket slices, (1) 0 citations vs 1 or more citations, (2) 0 or 1 vs more than 1, (3) 2 or fewer vs. more than 2, (4) 3 or fewer vs. more than 3… etc.)
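
For concreteness, here is a small Python sketch of the log(1 + citations) transform and the suggested dichotomization fallback, applied to synthetic heavy-tailed counts; the data and the choice of distribution are illustrative only:

```python
# Sketch: add 1 to every citation count, take the natural log, check how
# much skewness remains, and dichotomize at successive cut-points if the
# logged counts are still far from normal. Synthetic data only.
import math
import random

random.seed(1)
citations = [int(random.paretovariate(1.5)) - 1 for _ in range(5000)]  # heavy-tailed
logged = [math.log(1 + c) for c in citations]

def skewness(xs):
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

print("skewness, raw counts:   ", round(skewness(citations), 2))
print("skewness, log(1+cites): ", round(skewness(logged), 2))

# Fallback: dichotomize as 0/1 variables at successive bracket cut-points
# (0 vs 1+, <=1 vs 2+, <=2 vs 3+, ...), as suggested above.
for cut in (0, 1, 2, 3):
    above = sum(c > cut for c in citations)
    print(f"share with more than {cut} citations: {above / len(citations):.1%}")
```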

“For each journal, we ran a reduced [2 predictor] model [article age and OA] and a full [7 predictor] regression model [age, OA; log no. of authors, references, pages; Review; US author]”

Both analyses are, of course, a good idea to do, but why was Journal Impact Factor (JIF) not tested as one of the predictor variables in the cross-journal analyses (Hajjem & Harnad 2007a)? Surely JIF, too, correlates with citations: Indeed, the Davis study assumes as much, as it later uses JIF as the multiplier factor in calculating the cost per extra citation for author-choice OA (see below).
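
By way of illustration, here is a hedged sketch of what testing JIF as an additional predictor in such a full regression model could look like, using the statsmodels formula API on synthetic data; the column names, the 15% OA rate, and the data-generating assumptions are all mine, not Davis's:

```python
# Hedged sketch: add Journal Impact Factor (JIF) as a predictor to a
# full-model regression of the kind described above. The DataFrame and
# its columns are hypothetical; this is not Davis's model or data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({
    "age_months": rng.integers(18, 58, n),
    "oa": rng.random(n) < 0.15,
    "log_authors": np.log1p(rng.integers(1, 12, n)),
    "log_pages": np.log1p(rng.integers(2, 30, n)),
    "log_refs": rng.normal(3.5, 0.5, n),
    "jif": rng.choice([2.5, 4.0, 6.5, 9.6], n),  # per-journal impact factor
})
# Synthetic outcome: citations grow with age and JIF, slightly with OA.
lam = 0.05 * df["age_months"] * df["jif"] * np.where(df["oa"], 1.2, 1.0)
df["log_cites"] = np.log1p(rng.poisson(lam))

full = smf.ols(
    "log_cites ~ age_months + oa + log_authors + log_pages + log_refs + jif",
    data=df,
).fit()
print(full.params[["oa[T.True]", "jif"]])  # OA effect after controlling for JIF
```

The quantity of interest would be whether the OA coefficient shrinks once `jif` enters the equation.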

Analyses by journal JIF citation-bracket, for example, can test QA (Quality Advantage) if the OA Advantage is bigger in the higher journal citation-brackets. (Davis’s study is preoccupied with the self-selection QB bias, which it does not and cannot test, but it fails to test other candidate contributors to the OA Advantage that it can test.)

(A logical point should also be noted about the correlates of citations and the direction of causation: The many predictor variables in the multiple regression equations predict not only the OA citation Advantage; they also predict citation counts themselves. It does not necessarily follow from the fact that, say, longer articles are more likely to be cited that article length is an artifact that must be factored out of citation counts in order to get a more valid estimate of how accurately citations measure quality. One possibility is that length is indeed an artifact; but another possibility is that length is a valid factor in quality! If length is indeed an artifact, then longer articles are being cited more just because they are longer, rather than because they are better, and this length bias needs to be subtracted out of citation counts as measures of quality. But if the extra length is a causal contributor to what makes the better articles better, then subtracting out the length effect is making citation counts a blunter, not a sharper instrument for measuring quality. The same reasoning applies to some of the other correlates of citation counts, as well as their relation to the OA citation Advantage.)

“Because we may lack the statistical power to detect small significant differences for individual journals, we also analyze our data on an aggregate level”

It is a reasonable, valid strategy to analyze across journals. Yet this study still persists in drawing individual-journal level conclusions, despite having indicated (correctly) that its sample may be too small to have the power to detect individual-journal level differences (see below).

(On the other hand, it is not clear whether all the OA/non-OA citation comparisons were always within-journal, within-year, as they ought to be; no data are presented for the percentage of OA articles per year, per journal. OA/non-OA comparisons must always be within-journal/year comparisons, to be sure to compare like with like.)

“The first model includes all 11 journals, and the second omits the Proceedings of the National Academy of Sciences (PNAS), considering that it contributed nearly one-third (32%) of all articles in our dataset”

Is this a justification for excluding PNAS? Not only was the analysis done both with and without PNAS but, unlike all the other journals, whose data were included for the entire time-span, PNAS data were included only for the first and last six months.

Why? PNAS is a very high impact factor journal, with highly cited articles. A study of PNAS alone, with its much bigger sample size, would be instructive in itself — and would almost certainly yield a bigger OA Advantage than the one derived from averaging across all 11 journals (and reducing the PNAS sample size, or excluding PNAS altogether).

There can be a QB difference between PNAS and non-PNAS articles (and authors), to be sure, because PNAS publishes articles of higher quality. But a within-PNAS year-by-year comparison of OA and non-OA that yielded a bigger OA Advantage than a within-journal OA/non-OA comparison for lower-quality journals would also reflect the contribution of QA. (With these data in hand, the author should not be so focused on confirming his hypotheses: take the opportunity to falsify them too!)

“we are able to control for variables that are well-known to predict future citations [but] we cannot control for the quality of an article”

This is correct. One cannot control for the quality of an article; but in comparing within a journal/year, one can compare the size of the OA Advantage for higher and lower impact journals; if the advantage is higher for higher-impact journals, that favors QA over QB.

One can also take target OA and non-OA articles (within each citation bracket), and match the title words of each target article with other articles (in the same journal/year):

If one examines N-citation OA articles and N-citation non-OA articles, are their title-word-matched (non-OA) control articles equally likely to have N or more citations? Or are the word-matched control articles for N-citation OA articles less likely to have N or more citations than the controls for N-citation non-OA articles (which would imply that the OA has raised the OA article’s citation bracket)? And would this effect be greater in the higher citation brackets than in the lower ones (from N = 1 up to N > 16)?

If one is resourceful, there are ways to control for, or at least triangulate on, quality indirectly.
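
As one concrete (hypothetical) way of implementing the title-word-matching control just described, consider this Python sketch; the data structures, the stop-length filter, and the matching threshold are all assumptions for illustration, not anything from Davis's study:

```python
# Rough sketch of the title-word-matching control described above: for each
# target article, find same-journal/year non-OA articles sharing title words,
# then compare how often the controls for OA targets vs non-OA targets reach
# the target's citation level. All structures here are hypothetical.
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    journal: str
    year: int
    citations: int
    is_oa: bool

def title_words(a):
    return {w.lower() for w in a.title.split() if len(w) > 3}

def matched_controls(target, pool, min_shared=2):
    return [a for a in pool
            if a is not target and not a.is_oa
            and a.journal == target.journal and a.year == target.year
            and len(title_words(a) & title_words(target)) >= min_shared]

def share_reaching_target(targets, pool):
    hits = total = 0
    for t in targets:
        for c in matched_controls(t, pool):
            total += 1
            hits += c.citations >= t.citations
    return hits / total if total else float("nan")

# Usage (with a hypothetical list `articles` and citation level N):
#   oa_targets  = [a for a in articles if a.is_oa and a.citations == N]
#   non_targets = [a for a in articles if not a.is_oa and a.citations == N]
#   compare share_reaching_target(oa_targets, articles)
#   with    share_reaching_target(non_targets, articles)
```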

“spending a fee to make one’s article freely available from a publisher’s website may indicate there is something qualitatively different [about that article]”

Yes, but one could probably tell a Just-So story either way about the direction of that difference: paying for OA because it’s better, or paying for OA because it’s worse! Moreover, this is AC-OA, which costs money; the stakes are different with SA-OA, which only costs a few keystrokes. But this analysis omitted to identify or measure SA-OA.

“RESULTS… The difference in citations between open access and subscription-based articles is small and non-significant for the majority of the journals under investigation”

(1) Compare the above with what is stated earlier: “Because we may lack the statistical power to detect small significant differences for individual journals, we also analyze our data on an aggregate level.”

(2) Davis found an OA Advantage across the entire sample of 11 journals, whereas the individual journal samples were too small. Why state this as if it were some sort of an empirical effect?

“where only time and open access status are the model predictors, five of the eleven journals show positive and significant open access effects.”

(That does not sound too bad, considering that the individual journal samples were small and hence lacked the statistical power to detect small significant differences, and that the PNAS sample was made deliberately small!)

“Analyzing all journals together, we report a small but significant increase in article citations of 21%.”

Whether that OA Advantage is small or big remains to be seen. The bigger published OA Advantages have been reported on the basis of bigger samples.

“Much of this citation increase can be explained by the influence of one journal, PNAS. When this journal is removed from the analysis, the citation difference reduces to 14%.”

This reasoning can appeal only if one has a confirmation bias: PNAS is also the journal with the biggest sample (of which only a fraction was used); and it is also the highest impact journal of the 11 sampled, hence the most likely to show benefits from a Quality Advantage (QA) that generates more citations for higher citation-bracket articles. If the objective had not been to demonstrate that there is little or no OA Advantage (and what little there is is just due to QB), PNAS would have been analyzed more closely and fully, rather than being minimized and excluded.

“When other explanatory predictors of citations (number of authors, pages, section, etc.) are included in the full model, only two of the eleven journals show positive and significant open access effects. Analyzing all journals together, we estimate a 17% citation advantage, which reduces to 11% if we exclude PNAS.”

In other words, in this sample, adding 5 more predictor variables reduces the uncorrelated OA Advantage by 4%. And excluding the biggest, highest-quality journal’s data reduces it still further.

If there were not this strong confirmation bent on the author?s part, the data would be treated in a rather different way: The fact that a journal with a bigger sample enhances the OA Advantage would be treated as a plus rather than a minus, suggesting that still bigger samples might have the power to detect still bigger OA Advantages. And the fact that PNAS is a higher quality journal would also be the basis for looking more closely at the role of the Quality Advantage (QA). (With less of a confirmation bent, OA Self-archiving, too, would have been controlled for, instead of being credited to non-OA.)

Instead, the awkward persistence of a significant OA Advantage even after partialling out the effects of so many correlated variables, despite restricting the size of the PNAS sample, and even after removing PNAS entirely from the analysis, has to be further explained away:

“The modest citation advantage for author-choice open access articles also appears to weaken over time. Figure 1 plots the predicted number of citations for the average article in our dataset. This difference is most pronounced for articles published in 2004 (a 32% advantage), and decreases by about 7% per year (Supplementary Table S2) until 2007 where we estimate only an 11% citation advantage.”

(The methodology is not clearly described. We are not shown the percent OA per journal per year, nor what the dates of the citing articles were, for each cited-article year. What is certain is that a 1-year-old 2007 article differs from a 4-year-old 2004 article not just in its total cumulative citations in June 2008, but in that the estimate of its citations per year is based on a much smaller sample, again reducing the power of the statistic: This analysis is not based on 2005 citations to 2004 articles, plus 2006 citations to 2005 articles, plus 2007 citations to 2006 articles, etc. It is based on cumulative 2004-2008 citations to 2004, 2005, 2006 etc. articles, reckoned in June 2008. 2007 articles are not only younger: they are also more recent. Hence it is not clear what the Age/OA interaction in Table S2 really means: Has (1) the OA advantage for articles really been shrinking across those 4 years, or are citation rates for younger articles simply noisier, because based on smaller citation spans, hence (2) the OA Advantage grows more detectable as articles get older?)

From what is described and depicted in Figure 1, the natural interpretation of the Age/OA interaction seems to be the latter: As we move from one-year-old articles (2007) toward four-year-old articles, three things are happening: non-OA citations are growing with time, OA citations are growing with time, and the OA/non-OA Advantage is emerging with time.

“[To] calculate… the estimated cost per citation [$400 – $9000]… we multiply the open access citation advantage for each journal (a multiplicative effect) by the impact factor of the journal… Considering [the] strong evidence of a decline of the citation advantage over time, the cost… would be much higher…”

Although these costs are probably overestimated (because the OA Advantage is underestimated, and there is no decline but rather an increase), the thrust of these figures is reasonable: It is not worth paying for AC-OA for the sake of the OA Advantage: It makes far more sense to get the OA Advantage for free, through OA Self-Archiving.
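
As a back-of-envelope check on the quoted calculation (extra citations per article are roughly the OA advantage times the JIF, so the cost per extra citation is the author-choice fee divided by that), here is the arithmetic in Python; the fee and JIF figures below are hypothetical, not the paper's:

```python
# Back-of-envelope version of the cost-per-citation calculation quoted above.
# The fee and JIF values are illustrative assumptions only.
def cost_per_extra_citation(fee_usd, oa_advantage, jif):
    extra_citations = oa_advantage * jif  # e.g. a 0.21 advantage on a JIF-5 journal
    return fee_usd / extra_citations

print(cost_per_extra_citation(fee_usd=2500, oa_advantage=0.21, jif=5.0))  # ~$2381
print(cost_per_extra_citation(fee_usd=2500, oa_advantage=0.11, jif=2.0))  # ~$11364
```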

Note, however, that the potentially informative journal impact factor (JIF) was omitted from the full-model multiple regression equation across journals. It should be tested. So should the percentage OA for each journal/year. And after that the analysis should be redone separately for, say, the four successive JIF quartiles. If adding the JIF to the equation reduces the OA Advantage further, whereas without JIF the OA Advantage increases in each successive quartile, then that implies that a big factor in the OA Advantage is the Quality Advantage (QA).

“that we were able to explain some of the citation advantage by controlling for differences in article characteristics… strengthens the evidence that self-selection — not access — is the explanation for the citation advantage… more citable articles have a higher probability of being made freely accessible”

Self-selection (QB) is undoubtedly one of the factors in the OA Advantage, but this analysis has not estimated the size of its contribution, relative to many other factors (AA, CA, DA, EA, QA). It has simply shown that some of the same factors that influence citation counts, influence the OA citation Advantage too.

By failing to test and control for the Quality Advantage in particular (by not testing JIFs in the full regression equation, by not taking percentage OA per journal/year into account, by restricting the sample-size for the highest impact, largest-sample journal, PNAS, by overlooking OA self-archiving and crediting it to non-OA, by not testing citation-brackets of JIF quartiles), the article arbitrarily misses the opportunity to analyze the factors contributing to the OA Advantage far more rigorously.

“earlier studies [on the OA Advantage] may be showing an early-adopter effect…”

This is probably true. And early adopters also have a Competitive Advantage (CA). But with only about 20% OA being provided overall today, the CA is still there, unless it can be demonstrated — as Davis certainly has not demonstrated — that the c. 20% of articles that are being made OA today correspond sufficiently closely to that top 20% of articles that receive 80% of all citations. (Then the OA Advantage would indeed be largely QB.)

“authors who deposited their manuscripts in the arXiv tended to be more highly-cited than those who did not”

There is some circularity in this, but it is correct to say that this correlation is compatible with both QB and QA, and probably both are contributing factors. But none of the prior studies nor this one actually estimate their relative contributions (nor those of AA, CA, DA and EA).

“any relative citation advantage that was enjoyed by early adopters would disappear over time”

It is not that CA (Competitive Advantage) disappears simply because time elapses; CA only disappears if the competitors provide OA too. The same is true of QB (Quality Bias), which also disappears once everyone is providing OA. But at 20%, we are nowhere near 100% OA yet.

“If a citation advantage is the key motivation of authors to pay open access fees, then the cost/benefit of this decision can be quite expensive for some journals.”

This is certainly true, and would be true even if the OA citation Advantage were astronomically big — but the reason it is true is that authors need not pay AC-OA fees for OA at all: they can self-archive for free (and indeed are being increasingly mandated by their funders and institutions to do so).

“Randomized controlled trials provide a more rigorous methodology for measuring the effect of access independently of other confounding effects (Davis et al., 2008)… the differences we report in our study… have more likely explained the effect of self-selection (or self-promotion) than of open access per se.”

The syntax here makes it a little difficult to interpret, but if what is meant is that Davis et al.’s prior study has shown that the OA Advantage found in the present study was more likely to be a result of QB than of QA, AA, CA, DA, or EA, then it has to be replied that that prior study showed nothing of the sort (Harnad 2008). All it showed was that one cannot detect a significant OA Advantage at all one year after publication when OA is randomized. (The same is true when OA is not randomized.)

However, the prior Davis et al. study did find a significant DA (Download Advantage) for OA articles in the first year. And other studies have reported a significant correlation between early downloads and later citations (Brody et al. 2006).

So the prior Davis et al. study (1) confirmed the familiar failure to detect the OA Advantage in the first year, and (2) found a significant DA in the first year (probably predictive of a later OA citation Advantage). The present Davis study found (i) a significant OA Advantage, (ii) smallest in the first year (2007), much bigger in the fourth (2004).

“Retrospective analysis… our analysis is based on cumulative citations to articles taken at one point in time. Had we tracked the performance of our articles over time — a prospective approach — we would have stronger evidence to bolster our claim that the citation advantage is in decline. Still, we feel that cumulative citation data provides us with adequate inference.”

Actually, it would be possible, with a fuller analysis using the ISI database, to calculate not only the citation counts for each article, but the dates of the citing articles. So a “prospective” analysis can be done in retrospect. Without performing that analysis, however, the present study does not provide evidence of a decline in the OA Advantage with time, just evidence of an improved signal/noise ratio for measuring the OA Advantage with time. A “prospective” analysis, taking citing dates as well as cited dates into account, would be welcome (and is far more likely to show that the size of the OA Advantage is, if anything, growing, rather than confirming the author’s interpretation, unwarranted on the present data, that it is shrinking).
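
A minimal sketch of what such a retrospectively “prospective” analysis could look like, assuming one can extract (cited-year, citing-year) pairs from the ISI database; the input format and the toy records are hypothetical:

```python
# Sketch of a "prospective analysis done in retrospect": instead of one
# cumulative count, tally each citation under the year of the citing
# article, then compare cohorts over equal windows (e.g. citations
# received within 12 months of publication). Hypothetical input format.
from collections import defaultdict

# (cited_article_id, cited_year, citing_year, is_oa) tuples, e.g. from ISI data.
citation_records = [
    ("a1", 2004, 2005, True), ("a1", 2004, 2007, True),
    ("a2", 2004, 2006, False), ("a3", 2006, 2007, True),
]

def citations_within(records, window_years):
    """Citations received within `window_years` of the cited article's year."""
    counts = defaultdict(int)
    for art_id, cited_y, citing_y, is_oa in records:
        if 0 <= citing_y - cited_y <= window_years:
            counts[(cited_y, is_oa)] += 1
    return dict(counts)

# Equal one-year windows: 2005 citations to 2004 articles, 2007 to 2006, etc.
print(citations_within(citation_records, window_years=1))
```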

“all of the journals under investigation make their articles freely available after an initial period of time [hence] any [OA Advantage] would be during these initial months in which there exists an access differential between open access and subscription-access articles. We would expect therefore that the effect of open access would be strongest in the earlier years of the life of the article and decline over time. In other words, we would expect our trend (Figure 1) to operate in the reverse direction.”

The reasoning here is a bit hard to follow, but the Kurtz studies that Davis cites show that in Astronomy, making preprints OA in the year or so before publication (after which all Astronomy postprints are OA) results in both “a strong EA effect and a strong [QB] effect.” But even in a fast-moving field like Astronomy, the effect is not immediate! There is no way to predict from the data for Astronomy how quickly an EA effect for nonsubscribers during the embargo year in Biomedicine should make itself felt in citations, but it is a safe bet that, as with citation latency itself, and the latency of the OA citation Advantage, the “EmA” (“Embargo Access”) counterpart of the EA effect in access-embargoed Biomedical journals will need a latency of a few years to become detectable. And since Davis’s age/OA interaction, based on static, cumulative, retrospective data, is just as readily interpretable as indicating that OA Advantages require time and sample-size growth in order to occur and be detected, the two patterns are perfectly compatible.

“we are at a loss to come up with alternative explanations to explain the monotonic decline in the citation advantage”

There is no monotonic decline to explain. Just (a) low power in initial years, (b) cumulative data not analyzed to equate citing/cited year spans, (c) the failure to test for QA citation-bracket effects, and (d) the failure to reckon self-archiving OA into the OA Advantage (treating it instead as non-OA).

If this had been a JASIST referee report, I would have recommended several further analyses taking into account:

(1) self-archiving OA
(2) percentage OA per journal per year
(3) JIFs and citation-brackets
(4) the full PNAS dataset
(5) citing-article-date vs cited-article-date

and recommended making the interpretation of the resultant findings more even-handed, rather than slanting toward the author’s preferred hypothesis that the OA Advantage is due solely or mostly to QB.

References

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 57(8): 1060-1072

Craig, I. D., Plume, A. M., McVeigh, M. E., Pringle, J., & Amin, M. (2007). Do Open Access Articles Have Greater Citation Impact? A critical review of the literature. Journal of Informetrics 1(3): 239-248

Davis, P.M. (2008) Author-choice open access publishing in the biological and medical literature: a citation analysis. Journal of the American Society for Information Science and Technology (JASIST) (in press) http://arxiv.org/pdf/0808.2428v1

Davis, P. M., & Fromerth, M. J. (2007). Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles? Scientometrics 71(2): 203-215.

Davis, P. M., Lewenstein, B. V., Simon, D. H., Booth, J. G., & Connolly, M. J. L. (2008). Open access publishing, article downloads and citations: randomised trial. British Medical Journal 337: a568

Hajjem, C. and Harnad, S. (2007a) Citation Advantage For OA Self-Archiving Is Independent of Journal Impact Factor, Article Age, and Number of Co-Authors. Technical Report, Electronics and Computer Science, University of Southampton.

Hajjem, C. and Harnad, S. (2007b) The Open Access Citation Advantage: Quality Advantage Or Quality Bias? Technical Report, Electronics and Computer Science, University of Southampton

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) 39-47.

Harnad, S. (2007a) Craig et al.’s Review of Studies on the OA Citation Advantage. Open Access Archivangelism 248.

Harnad, S. (2007b) Where There’s No Access Problem There’s No Open Access Advantage Open Access Archivangelism 389

Harnad, S. (2008) Davis et al’s 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion. British Medical Journal: Rapid Responses 337 (a568): 199775

Hitchcock, S. (2008) The effect of open access and downloads (‘hits’) on citation impact: a bibliography of studies

Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Demleitner, M., Henneken, E., et al. (2005). The effect  of use and access on citations. Information Processing and Management 41: 1395-1402

Kurtz, M. J., & Henneken, E. A. (2007). Open Access does not increase citations for research articles from The Astrophysical Journal. Harvard-Smithsonian Center for Astrophysics.

Lawrence, S. (2001) Free online availability substantially increases a paper’s impact. Nature, 31 May 2001.

Moed, H. F. (2007). The effect of ‘Open Access’ upon citation impact: An analysis of ArXiv’s Condensed Matter Section. Journal of the American Society for Information Science and Technology 58(13): 2047-2054

Seglen, P. O. (1992). The Skewness of Science. Journal of the American Society for Information Science 43(9): 628-638

Stevan Harnad
American Scientist Open Access Forum

Max Planck Society Pays for Gold OA and Still Fails to Mandate Green OA

One can only leave it to posterity to judge the wisdom of the Max Planck Society in being prepared to divert “central” funds toward funding the publication of (some) MPS research in (some) Gold OA journals (PLoS) without first mandating Green OA self-archiving for all MPS research output.

It is not as if MPS does not have an Institutional Repository (IR): It has EDOC, containing 108,933 records (although it is not clear how many of those are peer-reviewed research articles, how many of them are OA, and what percentage of MPS’s current annual research output is deposited and OA).

But, despite being a long-time friend of OA, MPS has no Green OA self-archiving mandate. I have been told, repeatedly, that “in Germany one cannot mandate self-archiving,” but I do not believe it, not for a moment. This is pure lack of reflection and ingenuity:

At the very least, Closed Access deposit in EDOC can certainly be mandated for all MPS published research output as a purely administrative requirement, for internal record-keeping and performance-assessment. This is called the “Immediate Deposit, Optional Access” (IDOA) Mandate.

And then the “email eprint request” Button can be added to EDOC to provide almost-OA to all those deposits that the authors don’t immediately make OA of their own accord (95% of journals already endorse immediate OA in some form).

Then the MPS can go ahead and spend any spare money it may have to fund publication instead of research.


This should not be construed as any sort of critique of PLoS, a superb Gold OA publisher, producing superb journals. Nor is it a critique of paying for Gold OA, for those who have the funds.

It is a critique of paying for Gold OA without first having mandated Green OA.

(For that is rather like an institution offering to pay for its employees’ medical insurance for car accidents without first having mandated seat-belts; or, more luridly, offering to pay for the treatment of its employees’ secondary-smoke-induced illnesses without first having mandated that the workplace must be smoke-free.)

Stevan Harnad
American Scientist Open Access Forum

51st Green OA Self-Archiving Mandate: European Union’s 7th Framework

The European Commission has now mandated Green OA self-archiving for 20% of its 7th Framework Funding. This is the 51st Green OA Mandate worldwide (and the 26th funder mandate: the European Research Council (ERC), another European research funder, had earlier likewise mandated Green OA).

See ROARMAP: Institution’s/Department’s OA Self-Archiving Policy

The pilot covers approximately 20% of the FP7 budget and will apply to specific areas of research under the 7th Research Framework Programme (FP7): Health, Energy, Environment, Information and Communication Technologies (Cognitive Systems, Interaction, Robotics), Research Infrastructures (e-Infrastructures), Socio-economic Sciences and Humanities, Science in Society.

New grant agreements in the areas covered by the pilot will contain a clause requiring grant recipients to deposit peer reviewed research articles or final manuscripts resulting from their FP7 projects into their institutional or, if unavailable, a subject-based repository… within six or twelve months after publication, depending on the research area.

Use And Misuse Of Bibliometric Indices In Scholarly Performance Evaluation

Ethics In Science And Environmental Politics (ESEP)

ESEP Theme Section: The Use And Misuse Of Bibliometric Indices In Evaluating Scholarly Performance + accompanying Discussion Forum

Editors: Howard I. Browman, Konstantinos I. Stergiou

Quantifying the relative performance of individual scholars, groups of scholars, departments, institutions, provinces/states/regions and countries has become an integral part of decision-making over research policy, funding allocations, awarding of grants, faculty hirings, and claims for promotion and tenure. Bibliometric indices (based mainly upon citation counts), such as the h-index and the journal impact factor, are heavily relied upon in such assessments. There is a growing consensus, and a deep concern, that these indices — more-and-more often used as a replacement for the informed judgement of peers — are misunderstood and are, therefore, often misinterpreted and misused. The articles in this ESEP Theme Section present a range of perspectives on these issues. Alternative approaches, tools and metrics that will hopefully lead to a more balanced role for these instruments are presented.

Browman HI, Stergiou KI  INTRODUCTION: Factors and indices are one thing, deciding who is scholarly, why they are scholarly, and the relative value of their scholarship is something else entirely
ESEP 8:1-3

Campbell P  Escape from the impact factor
ESEP 8:5-7

Lawrence PA    Lost in publication: how measurement harms science
ESEP 8:9-11

Todd PA, Ladle RJ    Hidden dangers of a ‘citation culture’
ESEP 8:13-16

Taylor M, Perakakis P, Trachana V  The siege of science
ESEP 8:17-40

Cheung WWL    The economics of post-doc publishing
ESEP 8:41-44

Tsikliras AC  Chasing after the high impact
ESEP 8:45-47

Zitt M, Bassecoulard E    Challenges for scientometric indicators: data demining, knowledge flows measurements and diversity issues
ESEP 8:49-60

Harzing AWK, van der Wal R  Google Scholar as a new source for citation analysis
ESEP 8:61-73

Pauly D, Stergiou KI  Re-interpretation of ‘influence weight’ as a citation-based Index of New Knowledge (INK)
ESEP 8:75-78

Giske J  Benefitting from bibliometry
ESEP 8:79-81

Butler L Using a balanced approach to bibliometrics: quantitative performance measures in the Australian Research Quality Framework
ESEP 8:83-92
Erratum

Bornmann L, Mutz R, Neuhaus C, Daniel HD  Citation counts for research evaluation: standards of good practice for analyzing bibliometric data and presenting and interpreting results
ESEP 8:93-102

Harnad S  Validating research performance metrics against peer rankings
ESEP 8:103-107

Estimating Annual Growth in OA Repository Content


SUMMARY: Re: Deblauwe, F. (2008) OA Academia in Repose: Seven Academic Open-Access Repositories Compared: A useful way to benchmark OA progress would be to focus on OA’s target content — peer-reviewed scientific and scholarly journal articles — and to indicate, year by year, the proportion of the total annual output of the content-providers, rather than just absolute annual deposit totals. The OA content-providers are universities and research institutions. The denominator for all measures should be the number of articles the institution publishes in a given year, and the numerator should be the number of articles published in that year (full-texts) that are deposited in that institution’s Institutional Repository (IR). (If an institution does not know its own annual published articles output — as is likely, since such record-keeping is one of the many functions that the OA IRs are meant to perform — an estimate can be derived from the Institute of Scientific Information’s (ISI’s) annual data for that institution.)


Deblauwe, Francis (2008) OA Academia in Repose: Seven Academic Open-Access Repositories Compared

This is a useful beginning in the analysis of the growth of Open Access (OA), but it is mostly based on central collections of a variety of different kinds of content.

A useful way to benchmark OA progress would be to focus on OA’s target content — this would be, first and foremost, peer-reviewed scientific and scholarly journal articles — and to indicate, year by year, the proportion of the total annual output of the content-providers, rather than just absolute annual deposit totals.

The OA content-providers are universities and research institutions. The denominator for all measures should be the number of articles the institution publishes in a given year, and the numerator should be the number of articles published in that year (full-texts) that are deposited in that institution’s Institutional Repository (IR).

Just counting total deposits, without specifying the year of publication, the year of deposit, and the total target output of which they are a fraction (as well as making sure they are article full-texts rather than just metadata) is only minimally informative.

Absolute totals for Central Repositories (CRs), based on open-ended input from distributed institutions, are even less informative, as there is no indication of the size of the total output, hence what fraction of that has been deposited.

If an institution does not know its own annual published articles output — as is likely, since such record-keeping is one of the many functions that the OA IRs are meant to perform — an estimate can be derived from the Institute of Scientific Information’s (ISI’s) annual data for that institution. The estimate is then simple: Determine what proportion of the full-texts of the annual ISI items for that institution are in the IR. (ISI does not index everything, but it probably indexes the most important output, and this ratio is hence an estimate of what proportion of the most important output is being made OA annually by that institution).
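
In code, the proposed estimate is a one-line ratio per publication year; here is a minimal sketch, with hypothetical counts standing in for an institution's real ISI and IR figures:

```python
# Minimal sketch of the benchmark proposed above: for each publication year,
# the fraction of an institution's ISI-indexed articles whose full text is
# deposited in its IR. `isi_counts` and `ir_fulltexts` are hypothetical.
isi_counts = {2005: 1450, 2006: 1510, 2007: 1580}   # articles indexed by ISI
ir_fulltexts = {2005: 190, 2006: 260, 2007: 410}    # full-texts in the IR

for year in sorted(isi_counts):
    share = ir_fulltexts.get(year, 0) / isi_counts[year]
    print(f"{year}: {share:.1%} of ISI-indexed output is OA in the IR")
```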

This calculation could easily be done for the only university IR among the 7 analyzed above, Cambridge University’s. It was probably chosen because it is the IR containing the largest total number of items (see ROAR) and one of the few IRs with a total item count big enough to be comparable with the total counts of the multi-institutional collections such as Arxiv. However, it is unclear what proportion of the items in Cambridge’s IR are the full-texts of journal articles — and what percentage of Cambridge’s annual journal article output this represents.

CERN is an institution, but not a multidisciplinary university: High Energy Physics only. CERN has, however, done the recommended estimate of its annual OA growth in 2006 and found its IR “Three Quarters Full and Counting”: http://library.cern.ch/HEPLW/12/papers/2/
CERN, moreover, is one of the 25 institutions, universities and departments that have mandated deposit in their IR. Those are also the IRs that are growing the fastest.

(Deblauwe notes that “Resources… remain a big issue, e.g., in 2006, after the initially-funded three years, DSpace@Cambridge’s growth rate slowed down due to underestimation of the expenses and difficulty of scaling up.” I would suggest that what Cambridge needs is not more resources for the IR but a deposit mandate, like Southampton’s, QUT’s, Minho’s, CERN’s, Harvard’s, Stanford’s, and the rest of the 25 mandates to date: See ROARMAP.)

Stevan Harnad
American Scientist Open Access Forum

Self-Promotion Bias in Arxiv Deposit Listings

Dietrich, JP (2008) Disentangling visibility and self-promotion bias in the arXiv: astro-ph positional citation effect. Publications of the Astronomical Society of the Pacific 120 (869): 801-804

Abstract: We established in an earlier study that articles listed at or near the top of the daily arXiv:astro-ph mailings receive on average significantly more citations than articles further down the list. In our earlier work we were not able to decide whether this positional citation effect was due to author self-promotion of intrinsically more citable papers or whether papers are cited more often simply because they are at the top of the astro-ph listing. Using new data we can now disentangle both effects. Based on their submission times we separate articles into a self-promoted sample and a sample of articles that achieved a high rank on astro-ph by chance and compare their citation distributions with those of articles in lower astro-ph positions. We find that the positional citation effect is a superposition of self-promotion and visibility bias.

This interesting paper reports that in the physics Arxiv (astrophysics sector), where virtually all current articles in astrophysics are OA in preprint form (with no postprint OA problem in astrophysics either), several factors significantly influence citation counts:

(1) Arxiv provides a daily list of articles deposited. The articles higher on that list are more cited than the articles lower on that list.

(2) Whether an article appears higher on that list does not depend on merit. It depends on what time the article was deposited.

(3) Timing is predictable from time zones and geography, so if these two factors are controlled for, one can also identify which articles were (probably) deliberately timed by their authors so as to appear near the top of the list (“self-promotion”). (A minimal sketch of this kind of timing-based classification follows the list below.)

(4) This study shows that even after one has removed any effect of self-promotion, appearing nearer the top of the list randomly still increases an article’s citation count.

(5) In addition, self-promotion itself increases an article’s citation count too. (The assumption is that the more self-promoted papers are better, hence more likely to have higher citation counts; this may or may not be the only or main reason why self-promotion further increases citations over and above the list position effect.)
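To illustrate point (3): here is a toy sketch, in Python, of separating submissions by their timing relative to the daily submission cutoff. This is my illustration, not Dietrich’s code; the 16:00 ET cutoff and the five-minute “self-promotion” window are assumed values for illustration only.

```python
# Illustrative sketch only: classify arXiv submissions as "self-promoted"
# (deposited within minutes after the daily cutoff, so as to top the next
# mailing) vs. "chance" positions. The 16:00 ET cutoff and the 5-minute
# window are assumed values, not taken from Dietrich (2008).
from datetime import datetime, time, timedelta

CUTOFF = time(16, 0)            # assumed daily submission deadline (ET)
WINDOW = timedelta(minutes=5)   # assumed self-promotion window

def classify(submission_et: datetime) -> str:
    """Label a submission by how soon after the daily cutoff it arrived."""
    cutoff_dt = datetime.combine(submission_et.date(), CUTOFF)
    delta = submission_et - cutoff_dt
    if timedelta(0) <= delta <= WINDOW:
        return "self-promoted"   # likely deliberately timed to top the list
    return "chance"              # any top position is accidental

print(classify(datetime(2008, 7, 29, 16, 2)))   # -> self-promoted
print(classify(datetime(2008, 7, 29, 11, 30)))  # -> chance
```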

The authors rightly point out that in a high-output field like astrophysics, visibility is an important factor in usage and citations, and authors need alerting and navigation aids based on importance, relevance and quality, rather than on random timing and author self-promotion biases.

I would add that in fields — whether high- or low-output — that, unlike astrophysics, are not yet OA, accessibility itself probably has much the same sort of effect on citations that visibility does in an OA field like astrophysics. (Even maximized visibility cannot make articles accessible to those who cannot afford access to the full-text.)

Stevan Harnad
American Scientist Open Access Forum

Are Online and Free Online Access Broadening or Narrowing Research?

Evans, James A. (2008) Electronic Publication and the Narrowing of Science and Scholarship. Science 321(5887): 395-399. DOI:10.1126/science.1150473

Excerpt: [Based on] a database of 34 million articles, their citations (1945 to 2005), and online availability (1998 to 2005),… as more journal issues came online, the articles [cited] tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles… [B]rowsing of print archives may have [led] scientists and scholars to [use more] past and present scholarship. Searching online… may accelerate consensus and narrow the range of findings and ideas built upon.

Evans found that as more and more journal issues are becoming accessible online (mostly only the older back-issues for free), journals are not being cited less overall, but citations are narrowing down to fewer articles, cited more.

In one of the few fields where this can be and has been analyzed thoroughly, astrophysics, which effectively has 100% Open Access (OA) (free online access) already, Michael Kurtz too found that with free online access to everything, reference lists became (a little) shorter, not longer, i.e., people are citing (somewhat) fewer papers, not more, when everything is accessible to them free online.

The following seems a plausible explanation: 

Before OA, researchers cited what they could afford to access, and that was not necessarily all the best work, so they could not be optimally selective for quality, importance and relevance. (Sometimes — dare one say it? — they may even have resorted to citing “blind,” going by just the title and abstract, which they could afford, but not the full text, to which they had no subscription.) 

In contrast, when everything becomes accessible, researchers can be more selective and can cite only what is most relevant, important and of high quality. (It has been true all along that about 80-90% of citations go to the top 10-20% of articles. Now that the top 10-20% (along with everything else in astrophysics) is accessible to everyone, everyone can cite it, and cull out the less relevant or important 80-90%.)

This is not to say that OA does not also generate some extra citations for lesser articles too; but the OA citation advantage is bigger, the better the article (the “quality advantage”) — and perhaps most articles are not that good! Since the majority of published articles are uncited (or only self-cited), there is probably a lot published that no amount of exposure and access can render worth citing.

(I think there may also exist some studies [independent of OA] on “like citing like” — i.e., articles tending to be cited more at their own “quality” level rather than a higher one. [Simplistically, this means within their own citation bracket, rather than a higher one.] If true, this too could probably be analyzed from an OA standpoint.)

But the trouble is that apart from astrophysics and high energy physics, no other field has anywhere near 100% OA: It’s closer to 15% in other fields. So aside from a (slightly negative) global correlation (between the growth of OA and the average length of the reference list), the effect of OA cannot be very deeply analyzed in most fields yet.

In addition, insofar as OA is concerned, much of the Evans effect seems to be based on “legacy OA,” in which it is the older literature that is gradually being made accessible online or freely accessible online, after a long non-online, non-free interval. Fields differ in their speed of uptake and their citation latencies. In physics, which has a rapid turnaround time, there is already a tendency to cite recent work more, and OA is making the turnaround time even faster. In longer-latency fields, the picture may differ. For the legacy-OA effect especially, it is important to sort fields by their citation turnaround times; otherwise there can be biases (e.g. if short- or long-latency fields differ in the degree to which they do legacy OA archiving).

If I had to choose between the explanation of the Evans effect as a recency/bandwagon effect, as Evans interprets it, or as an increased overall quality/selectivity effect, I’d choose the latter (though I don’t doubt there is a bandwagon effect too). And that is even without going on to point out that Tenopir & King, Gingras and others have shown that — with or without OA — there is still a good deal of usage and citation of the legacy literature (though it differs from field to field).

I wouldn’t set much store by “skimming serendipity” (the discovery of adjacent work while skimming through print issues), since online search and retrieval has at least as much scope for serendipity. (And one would expect more likelihood of a bandwagon effect without OA, where authors may tend to cite already cited but inaccessible references “cite unseen.”)

Are online and free online access broadening or narrowing research? They are broadening it by making all of it accessible to all researchers, focusing it on the best rather than merely the accessible, and accelerating it.

Stevan Harnad
American Scientist Open Access Forum

Open Access: “Gratis” and “Libre”

Re-posted from Peter Suber’s Open Access News. (This is to register 100% agreement on this definition of “Gratis” and “Libre” OA, and on the new choice of terms.)


Peter Suber:

    Green/Gold OA and Gratis/Libre OA

This table is to accompany an article in the August issue of SOAN, which I [Peter Suber] just mailed.  But I hope it will also be useful in its own right.  (SOAN uses plain text and doesn’t support tables.)

                            Gratis OA                    Libre OA
                            (removing price barriers)    (removing both price and
                                                          permission barriers)

  Green OA
  (through repositories)            1                            2

  Gold OA
  (through journals)                3                            4

Some observations:

  • In April 2008, Stevan Harnad and I proposed some terms to describe two kinds of free online access:  the kind which removes price barriers alone and the kind which removes price barriers and at least some permission barriers as well.  The distinction is fundamental and widely recognized, but we saw right away that our terms (weak OA and strong OA) were ill-chosen and we stopped using them.  However, all of us who work for OA and talk about OA still need vocabulary to describe this basic distinction.  The most neutral and descriptive terms I’ve been able to find so far are "gratis OA" and "libre OA", and I’ve decided to use them myself until I find better ones.  This choice of terms is personal and provisional.  But to make it more effective, I wanted to explain it in public.
  • "Gratis" and "libre" may not be familiar terms in the domain of scholarly communication and OA.  But in the neighboring domain of free and open source software, they exactly express the distinction I have in  mind.
  • The main point of this table is to show that the gratis/libre distinction is not synonymous with the green/gold distinction.  The green/gold distinction is about venues.  The gratis/libre distinction is about user rights or freedoms. 
  • All four cells of the table are non-empty.  Green OA can be gratis or libre, and gold OA can be gratis or libre. 
  • Libre OA includes or presupposes gratis OA.  But neither green nor gold OA presupposes the other, although they are entirely compatible and much literature is both.
  • All four cells can contain peer-reviewed literature.  None of these parameters is about bypassing peer review.   
  • Because there are many different permission barriers to remove, there are many different degrees or kinds of libre OA.  Gratis OA is just one thing, but libre OA is a range of things.
  • The BBB definition describes one kind or subset of libre OA.  But not all libre OA is BBB OA. 
  • I’m not proposing a change in the BBB definition, and I haven’t retreated an inch in my support for it.  I’m simply proposing vocabulary to help us talk unambiguously about two species of free online access.

This blog post is just a sketch.  For more detail, see the full SOAN article.


Peter Suber


Davis et al’s 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion

Davis, PM, Lewenstein, BV, Simon, DH, Booth, JG, & Connolly, MJL (2008) Open access publishing, article downloads, and citations: randomised controlled trial. British Medical Journal 337: a568.

Overview (by SH):

Davis et al.’s study was designed to test whether the “Open Access (OA) Advantage” (i.e., more citations to OA articles than to non-OA articles in the same journal and year) is an artifact of a “self-selection bias” (i.e., better authors are more likely to self-archive or better articles are more likely to be self-archived by their authors).

The control for self-selection bias was to select randomly which articles were made OA, rather than having the author choose. The result was that a year after publication the OA articles were not cited significantly more than the non-OA articles (although they were downloaded more).

The authors write:

“To control for self selection we carried out a randomised controlled experiment in which articles from a journal publisher’s websites were assigned to open access status or subscription access only”

The authors conclude:

“No evidence was found of a citation advantage for open access articles in the first year after publication. The citation advantage from open access reported widely in the literature may be an artefact of other causes.”

Commentary:

To show that the OA advantage is an artefact of self-selection bias (or of any other factor), you first have to produce the OA advantage and then show that it is eliminated by eliminating self-selection bias (or any other artefact).

This is not what Davis et al. did. They simply showed that they could detect no OA advantage one year after publication in their sample. This is not surprising, since most other studies, some based on hundreds of thousands of articles, don’t detect an OA advantage one year after publication either. It is too early.

To draw any conclusions at all from such a 1-year study, the authors would have had to do a control condition, in which they managed to find a sufficient number of self-selected, self-archived OA articles (from the same journals, for the same year) that do show the OA advantage, whereas their randomized OA articles do not. In the absence of that control condition, the finding that no OA advantage is detected in the first year for this particular sample of 247 out of 1619 articles in 11 physiological journals is completely uninformative.
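To make the sample-size point concrete, here is a rough power sketch (mine, not the authors’), assuming citation counts can be roughly normalized (e.g., log-transformed) and assuming a small, hypothetical standardized effect size of d = 0.1 for a first-year OA advantage:

```python
# Rough power sketch for the "too early, too small" argument.
# d = 0.1 is an assumed, illustrative standardized effect size for a
# small first-year OA advantage on (noisy) citation counts; it is not
# estimated from Davis et al.'s data.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size needed per group to detect d = 0.1 with 80% power:
n_needed = analysis.solve_power(effect_size=0.1, alpha=0.05, power=0.8)
print(f"needed per group: ~{n_needed:.0f}")   # on the order of 1,500+

# Power actually available with 247 randomized-OA articles vs. the
# remaining 1372 controls:
power = analysis.solve_power(effect_size=0.1, nobs1=247,
                             ratio=1372 / 247, alpha=0.05)
print(f"power with 247 vs. 1372: {power:.2f}")  # roughly 0.3
```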

The authors did find a download advantage within the first year, as other studies have found. This early download advantage for OA articles has also been found to be correlated with a citation advantage 18 months or more later. The authors try to argue that this correlation would not hold in their case, but they give no evidence (because they hurried to publish their study, originally intended to run four years, three years too early).

(1) The Davis study was originally proposed (in December 2006) as intended to cover 4 years:

Davis, PM (2006) Randomized controlled study of OA publishing (see comment).

It has instead been released after a year.

(2) The Open Access (OA) Advantage (i.e., significantly more citations for OA articles, always comparing OA and non-OA articles in the same journal and year) has been reported in all fields tested so far, for example:

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4): 39-47.

(3) There is always the logical possibility that the OA advantage is not a causal one, but merely an effect of self-selection: The better authors may be more likely to self-archive their articles and/or the better articles may be more likely to be self-archived; those better articles would be the ones that get more cited anyway.

(4) So it is a very good idea to try to control methodologically for this self-selection bias: The way to control it is exactly as Davis et al. have done, which is to select articles at random for being made OA, rather than having the authors self-select.

(5) Then, if it turns out that the citation advantage for randomized OA articles is significantly smaller than the citation advantage for self-selected-OA articles, the hypothesis that the OA advantage is all or mostly just a self-selection bias is supported.

(6) But that is not at all what Davis et al. did.

(7) All Davis et al. did was to find that their randomized OA articles had significantly higher downloads than non-OA articles, but no significant difference in citations.

(8) This was based on the first year after publication, when most of the prior studies on the OA advantage likewise find no significant OA advantage, because it is simply too early: the early results are too noisy! The OA advantage shows up in later years (1-4).

(9) If Davis et al. had been more self-critical, seeking to test and perhaps falsify their own hypothesis, rather than just to confirm it, they would have done the obvious control study, which is to test whether articles that were made OA through self-selected self-archiving by their authors (in the very same year, in the very same journals) show an OA advantage in that same interval. For if they do not, then of course the interval was too short, the results were released prematurely, and the study so far shows nothing at all: it is not until you have actually demonstrated an OA advantage that you can estimate how much of it might be due to a self-selection artefact!

(10) The study shows almost nothing at all, but not quite nothing, because one would expect (based on our own previous study, which showed that early downloads, at 6 months, predict enhanced citations at a year and a half or later) that Davis’s increased downloads too would translate into increased citations, once given enough time.

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 57(8): 1060-1072.

(11) The findings of Michael Kurtz and collaborators are also relevant in this regard. They looked only at astrophysics, which is special, in that (a) it is a field with only about a dozen journals, to which every research astronomer has subscription access — these days they also have free online access via ADS — and (b) it is a field in which most authors self-archive their preprints very early in arxiv — much earlier than the date of publication.

Kurtz, M. J. and Henneken, E. A. (2007) Open Access does not increase citations for research articles from The Astrophysical Journal. Preprint deposited in arXiv September 6, 2007.

(12) Kurtz & Henneken, too, found the usual self-archiving advantage in astrophysics (i.e., about twice as many citations for OA papers than non-OA), but when they analyzed its cause, they found that most of the cause was the Early Advantage of access to the preprint, as much as a year before publication of the (OA) postprint. In addition, they found a self-selection bias (for preprints — which is all that were involved here, because, as noted, in astrophysics, as of publication, everything is OA): The better articles by the better authors were more likely to have been self-archived as preprints.

(13) Kurtz’s results do not generalize to all fields, because it is not true of other fields either that (a) they already have 100% OA for their published postprints, or that (b) many authors tend to self-archive preprints before publication.

(14) However, the fact that early preprint self-archiving (in a field that is 100% OA as of postprint publication) is sufficient to double citations is very likely to translate into a similar effect, in a no-OA, no-preprint field, if one reckons on the basis of the one-year access embargo that many publishers are imposing on the postprint. (The yearlong “No-Embargo” advantage provided by postprint OA in other fields might not turn out to be so big as to double citations, as the preprint Early Advantage in astrophysics did, because at least there is some subscription access to the postprint; but the counterpart of the Early Advantage for the postprint is likely to be there too.)

(15) Moreover, the preprint OA advantage is primarily Early Advantage, and only secondarily Self-Selection.

(16) The size of the postprint self-selection bias would have been what Davis et al. tested — if they had done the proper control, and waited long enough to get an actual OA effect to compare against.

(17) We had reported in an unpublished 2007 pilot study that there was no statistically significant difference between the size of the OA advantage for mandated (i.e., obligatory) and unmandated (i.e., self-selected) self-archiving:

Hajjem, C. & Harnad, S. (2007) The Open Access Citation Advantage: Quality Advantage Or Quality Bias? Preprint deposited in arXiv January 22, 2007.

(18) We will soon be reporting the results of a 4-year study on the OA advantage in mandated and unmandated self-archiving that confirms these earlier findings: mandated self-archiving is like Davis et al.’s randomized OA, but we find that it does not reduce the OA advantage at all — once enough time has elapsed for there to be an OA advantage at all.

Stevan Harnad
American Scientist Open Access Forum

50th Green OA Self-Archiving Mandate Worldwide: France’s ANR/SHS

The Humanities and Social Sciences branch of France’s Agence Nationale de la recherche has just announced its Green OA self-archiving mandate — France’s first funder mandate (France’s second mandate overall, and the world’s 50th). See ROARMAP.

Note that the situation in France with central repositories is very different from the case of NIH’s PMC repository: France’s HAL is a national central repository where (in principle) (1) all French research output — from every field and every institution — can be deposited and (again, in principle) (2) every French institution (or department or funder) can have its own interface and “look” in HAL, a “virtual” Institutional Repository (IR), saving it the necessity of creating an IR of its own if it does not feel it needs to.

The crucial underlying question — and several OA advocates in France are raising the question, notably, Hélène Bosc, in a forthcoming article (meanwhile, see this) — is whether the probability of adopting institutional OA mandates in France is increased or decreased by the HAL option: Are universities more inclined to adopt a window on HAL, and to mandate central deposit of all their institutional research output, or would they be more inclined to mandate deposit in their own autonomous university IRs, which they manage and control?

Again, the SWORD protocol for automatic import and export between IRs and CRs is pertinent, because then it doesn’t matter which way institutions prefer to do it.
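As a concrete illustration, here is a minimal sketch of what a SWORD deposit looks like at the HTTP level, in Python. The repository deposit URL, credentials and package file are hypothetical placeholders; the headers follow SWORD v1.3 conventions.

```python
# Minimal sketch of a SWORD (v1.3) deposit: an AtomPub-style HTTP POST
# of a packaged article to a repository collection. The deposit URL,
# credentials and package file are hypothetical placeholders.
import requests

DEPOSIT_URL = "https://ir.example.edu/sword/deposit/articles"  # hypothetical

with open("article-package.zip", "rb") as pkg:
    response = requests.post(
        DEPOSIT_URL,
        data=pkg,
        headers={
            "Content-Type": "application/zip",
            "Content-Disposition": "filename=article-package.zip",
            "X-Packaging": "http://purl.org/net/sword-types/METSDSpaceSIP",
            "X-On-Behalf-Of": "author@example.edu",  # mediated deposit
        },
        auth=("depositor", "secret"),
    )

# On success the repository returns an Atom entry describing the deposit,
# including the URL where the item now lives.
print(response.status_code)   # 201 Created on success
print(response.text[:200])
```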


Agence Nationale de la recherche (ANR) (Humanities and Social Sciences Branch) (FRANCE, funder mandate)

Institution’s/Department’s OA Eprint Archives

Institution’s/Department’s OA Self-Archiving Policy

[Paraphrase by T. Chanier]

“The Humanities and Social Sciences branch of the French National Research Agency (ANR) mandates [requires] researchers involved in projects that it funds to deposit their scientifically validated (refereed postprint) publications in the HAL-SHS open archive, without any delay.

“HAL is a nation-wide open archive supported by all public French research institutions. HAL-SHS is a sub-part dedicated to the Humanities and Social Sciences.

“In November 2007, the general ANR agency had merely invited its researchers to deposit in HAL.

“This time (July 2008), the ANR’s SHS branch mandates that its researchers deposit (and requests project leaders to confirm that the deposit is done by everyone).”

** text extracted (July 2008) from the ANR SHS text; translated here from the French **

“In a communiqué dated 14 November 2007, the ANR urged researchers leading or partnering in projects that it funds to deposit their publications in the HAL open archive system, with which it collaborates.

“The SHS Department of the ANR and the support unit of the ENS LSH wish to give particular emphasis to this directive, which they consider essential for the increased visibility of French research in the humanities and social sciences.

“The community of ANR SHS project leaders and partners must accordingly mobilize around a common objective: the systematic deposit of their scientific output in HAL-SHS, the SHS interface of the HAL archive.

“ANR project leaders and coordinators are asked to ensure, within their teams (researchers, academics, postdocs and doctoral students, whether French or foreign), that all publications produced in the course of the project (articles, conference papers, contributions to collective volumes, or other eligible output) are deposited in HAL as they are produced (for example, upon submission to a journal and again upon actual publication).”

Registered by: Thierry Chanier (Professor, leader of ANR SHS-funded project) on 29 Jul 2008

Hybrid-Gold Discount From Publishers That Embargo Green OA: No Deal

I am not at all sure that Kudos are in order for Oxford University Press (OUP), just because they offer authors at subscribing institutions a discount on their hybrid Gold OA fee:

Unlike the American Psychological Association (yes, the much maligned APA!), the American Physical Society, Elsevier, Cambridge University Press and all the other 232 publishers (57%) of the 6457 journals (63%) that are on the side of the angels (fully Green on immediate post-print self-archiving), OUP is among the Pale-Green minority of 48 publishers (12%) of 3228 journals (32%) (such as Nature, which back-slid to a postprint embargo in 2005).

OUP’s post-print policy is:

12 month embargo on science, technology, medicine articles
24 month embargo on arts and humanities articles
Pre-print can only be posted prior to acceptance
Pre-print must not be replaced with post-print, instead a link to published [toll] version
Articles in some journals can be made Open Access on payment of additional charge

Should we really be singing the praises of each publisher’s discount on their hybrid Gold OA fee for the double-payment they are exacting (from the subscribers as well as the authors)?

I would stop applauding as progress for OA every self-interested step taken by those publishers who do not first take the one essential OA-friendly step: going (fully) Green.

Yes, OUP are lowering fees annually in proportion to hybrid Gold OA uptake, but they are meanwhile continuing to hold the post-print hostage for 12-24 months.

In reality, all the fee reduction means is an adjustment for double-dipping — plus a lock-in on the price of Gold OA, and a lockout of Green OA.

Stevan Harnad
American Scientist Open Access Forum

Alma Swan on “Where researchers should deposit their articles”

Alma Swan has just posted an excellent overview of “Where researchers should deposit their articles”.

This clear, solid, sensible essay converges on the essence of a rather divergent series of discussion threads currently ongoing in the American Scientist Open Access Forum.

It is followed up with the preliminary posting of some results from a survey of Institutional Repository (IR) managers which indicate that

(1) The IRs with mandated deposit have the least difficulty collecting content (compared to IRs with no institutional deposit policy at all or merely a policy encouraging deposit).

(2) The IRs with author-only deposit have the least difficulty collecting content (compared to IRs with librarian-only deposit or both author- and librarian-deposit).

(3) The IRs with author deposit have the least difficulty collecting metadata (compared to IRs with librarian-only deposit or both author- and librarian-deposit).

Excerpts from Alma Swan’s essay:

“The issue of which model for Open Access self-archiving is best — asking researchers to deposit their work in centralised, subject-based repositories or in their own institutional repository — is again being discussed at length….

“…Chris Awre and I argued three years ago, in our study on ‘Linking UK Repositories’ (and in a short paper from that study here) that distributed deposit was the best model to aim for, [but] we were arguing from a theoretical standpoint. Only a handful of universities in the UK had at the time shown any sign of understanding what opportunities lay ahead in the way universities disseminate the results of their efforts, and of the responsibilities they have towards society.

“[Since then] subject-based collections have been making the running and… until recently most institutions have seemed to be disinterested in supporting the efforts to make research more widely available and used…

“The universities continued to snore but while they did so at least the funders were out of bed, showered and breakfasted. Unfortunately, instead of nudging awake the universities – their partners in research endeavour and the employers of the people to whom they hand out funds – some big funders let them lie, circumventing them in the mechanics of the Open Access process. I would suggest that in doing this they were failing to take the whole research community’s interests into account…

“Now there are stirrings in the academy… universities finally ‘get it’, which is great for them, for research and for society. Unfortunately, they are getting it later than would have been ideal [because] …many [funder] mandates stipulate [a central repository] as the deposit locus (not so good for the employers of the fundees – the universities).

“[W]e shouldn’t get too wound up about this… but it is a shame that we have arrived at a point where universities, the mainstays of our societies’ research endeavours, have to develop more complex policies than would otherwise have been the case had funders simply directed their grantees to deposit their work in their institutional collections and harvested from there. The funders know where their grantees are, the repository software has a metadata field for funder, so the mechanics are simple. The benefit of such a move would have been to help the universities see the overall plan (earlier than they have done), ensure they put the right infrastructure in place and encourage them to apply similar policies to cover all the research their employees do. The whole research community would thus be included and benefiting by this time, not just the… communities covered by big funder mandates. I would say that the research funders have rather let down their partners, the universities, in this sense.

“Deposit rates for [funder-mandated repositories] are not yet all they should be…. [P]eople are taking steps to remedy this, but how much easier it is for universities to attain a high level of compliance: they say, quite simply, that the repository is where they will be looking for material to be included in research assessment (and for staff appraisals, promotions boards, tenure committees …)…. [T]here is one thing more important to a researcher than a hypothetical risk of not getting future funding, and that is a non-hypothetical risk of not being employed for too much longer. It sharpens the focus just a tad.

“…So subject-specific collections… should be harvesting from the university repositories all the material that is relevant to that subject. They can provide all manner of nice services on that collection, tailored to the needs of that particular subject community.

“Distributed, local deposit works with human nature, researcher preferences and the structure of the international research system, which remains institutionally-based; and the universities – those large, expensive edifices we all pay for and wish to see operate at maximum efficiency – get to collect their own research together and use the collection to manage their research effort so much better than ever before.”

The OA Deposit-Fee Kerfuffle: APA’s Not Responsible; NIH Is. PART II.

      [see also PART I and PART 0]

SUMMARY: The concept underlying the OAI metadata harvesting protocol is that local, distributed, content-provider sites each provide their own content and global service-provider sites harvest that content and provide global services over it, such as indexing, search, and other added values. (This is not a symmetric process. It does not make sense to think of the individual content-providers as “harvesting” their own content (back) from global service-providers.)
    The question is accordingly whether OA deposit mandates should be (1) convergent, with both institutional and funder mandates requiring deposit in the author’s own OA Institutional Repository (IR), for harvesting by global overlay OA services and collections (such as PubMed Central, PMC) or (2) divergent, requiring authors to deposit all over the map, locally or distally, possibly multiple times, depending on field and funding. It seems obvious that coordinated, convergent IR deposit mandates from both institutions and funders will bring universal OA far more surely and swiftly than needless and counterproductive divergence.
    In the interests of a swift, seamless, systematic, global transition to universal OA, NIH should accordingly make one tiny change (entailing no loss at all in content or functionality) in its otherwise invaluable, historic, and much-imitated mandate: NIH should mandate IR deposit and harvest to PMC from there.
    The spirit of the Congressional directive that publicly funded research should be made publicly accessible online, free for all, is fully met once everyone, webwide, can click on the link to an item whose metadata they have found in PMC, and the article instantly appears, just as if they had retrieved it via Google, regardless of whether the item’s URL happens to be in an IR or in PMC itself.
    A possible reason the NIH mandate took the divergent form it did may have been a conflation of access archiving with preservation archiving: But the version that NIH has (rightly) stipulated for OA deposit (each “investigator’s… electronic version of their final, peer-reviewed manuscripts upon acceptance for publication“) is not even the draft that is in real need of preservation; it is just a supplementary copy, provided for access purposes: The definitive version, the one that really stands in need of preservation, is not this author-copy but the publisher’s official proprietary version of record.
    For preservation, the definitive document needs to be deposited in an archival depository (preferably several, for safe-keeping, updating and migration as technology evolves), not an OA collection like PMC. But that essential archival deposit/preservation function has absolutely nothing to do with either the author or with OA.


Peter Suber: “At the moment, I see two conflicting APA statements and no evidence that either statement [2002 or 2008] took the other into account. So I’m still waiting for a definitive clarification from the APA. But as I say, if the APA reaffirms the 2002 policy to allow no-fee, no-embargo self-archiving to IRs, then I will applaud it.”

That will shortly sort itself out.


[See APA update, which appeared after this posting. Peter has since responded to that update too. The only point to add is that Stuart Shieber‘s concern about a remaining ambiguity in yet another APA document will no doubt likewise be resolved in the same way. (Stuart was the architect of Harvard FAS’s institutional OA mandate and has since been appointed director of Harvard’s newly formed Office for Scholarly Communication.)]


It seems obvious to me that the only coherent resolution is that APA’s 2002 Green OA policy takes precedence over the contradictory passages in APA’s 2008 PMC addendum. It would be arbitrary bordering on dementia to declare that:

“Our policy is that any APA author may self-archive their own refereed final draft in their own IR for free as long as they are not mandated to do so by NIH; but if they are mandated to do so by NIH, then they must pay us $2500 to do it!”

I predict that the proposed APA policy will first be:

“All we meant was that, as before, any APA author may self-archive their own refereed final draft in their own IR for free, but depositing APA’s proprietary published version in PMC will cost $2500.”

And then they will back down from the surcharge altogether. (I do have a bit of a track-record for correctly second-guessing APA policy!)

Peter Suber: “However, if the APA retains the “deposit fee” for NIH-funded authors, then I will continue to criticize it. The APA will still be charging for green OA, which is utterly unnecessary.”

Do continue to criticize it, Peter, but please make sure the criticism is on target: As long as APA authors are free to provide green OA by depositing in their own IRs, APA can definitely not be said to be “charging for green OA” if APA charges authors for depositing in PMC (any more than I can be said to be charging for water if I say “water is free but bring your own container” and you insist on water in a container).

The $2500 fee is indeed absurd, but that absurdity (and many other counterproductive consequences) would be completely remedied by NIH’s simply dropping its supererogatory requirement to deposit directly in PMC, and harvesting the metadata from the IRs instead. A central collection like PMC is just that: a collection. It is sufficient for such collections to harvest the metadata (as Google does) and to link to the full-text where it is actually deposited, i.e., in the IR of the institution it came from.

Peter Suber: “[APA] will still fail to deliver immediate OA, or OA to the published edition, which fee-based [Gold or optional-Gold] OA journals always deliver in exchange for their fees.”

You mean the publisher’s proprietary version? But even the NIH mandate is only requiring deposit of the author’s final refereed draft, not the publisher’s proprietary version:

The NIH Public Access Policy implements Division G, Title II, Section 218 of PL 110-161 (Consolidated Appropriations Act, 2008).  The law states:

The Director of the National Institutes of Health shall require that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, That the NIH shall implement the public access policy in a manner consistent with copyright law.

I also think you may be equating the $2500 fee with a (hybrid) optional-Gold OA fee (from a non-Green publisher such as ACS). But it is not that. APA’s is a PMC deposit fee, from a Green publisher. (There is no relevant category for a requirement to deposit in a 3rd-party CR, because it is arbitrary to have to do so, and has nothing to do with OA itself, which APA authors can already provide via Green OA in their own IRs.)

Moreover, to heap absurdity upon absurdity, we both know, Peter, that (1) not only does it not matter one bit, for OA accessibility to one and all, webwide, whether a document’s locus is an IR or a CR, but (2) if and when all of OA’s target content is made OA, one way or the other, then the distinction between 1st-party (author-institution), 2nd-party (publisher) and 3rd party (PMC, UKPMC, EuroPMC, Google, or any other CR) archiving becomes irrelevant, the game is over, universal OA has at last arrived, and all these trivial locus and party details as well as this absurd talk of deposit surcharges becomes moot.

The problem is with first reaching that universal OA, which is already long, long overdue (after many, many false starts, including a prior one by NIH itself, 3 years ago, which elicited a compliance rate below 4%, less than a third of the global average for spontaneous — i.e., unmandated — self-archiving.)

And coordinated, convergent IR deposit mandates — funder mandates complementing institutional mandates — will get us there far more surely and swiftly than the needless and counterproductive divergence we have imposed on ourselves by not thinking the PMC locus stipulation through in advance (or fixing it as it becomes more and more apparent that it creates unanticipated and unnecessary problems).

Peter Suber: “If the APA reaffirms its 2002 green policy, then NIH-funded authors could bypass the deposit fee when self-archiving to their IRs. But they couldn’t bypass the fee when self-archiving to PMC, and they are bound by the NIH policy to deposit in PMC (or have their journal do so for them).”

Correct, but isn’t this reasoning a bit circular, if not fatalistic? Which one is cluttering the path to universal OA (now that we have the invaluable NIH mandate)? APA, which blesses OA self-archiving in the author’s own OA IR, for free, or NIH, which (unnecessarily) insists on mandating more than “merely” OA?

Would it not be better for NIH to think it through, and then — patiently, in the interests of a swift, seamless, systematic, global progression to universal OA — make in its otherwise invaluable, historic, and much-imitated mandate the one tiny change that (with no loss at all in content or functionality) will create the optimal conditions for a full-scale transition to universal OA, rather than only (the NIH/PMC) part of it?

Let NIH mandate IR deposit and harvest from there.

Peter Suber: “Stevan hopes that policies like the APA’s will pressure the NIH to drop this requirement and allow deposits in an IR to suffice. But even if that ought to happen, it won’t happen soon and very likely won’t happen at all. One reason is simply that the requirement to deposit in PMC was mandated by Congress. The NIH undoubtedly supports the Congressional directive, but it’s not an in-house policy decision that the agency is free to reverse at will.”

Deposits in IRs can be harvested into PMC. The issue here is merely the locus of the point of direct deposit.

Does anyone imagine that the spirit of the Congressional directive — to the effect that publicly funded research should be made publicly accessible online, free for all — would not be fully met once everyone, webwide, can click on the link to an item whose metadata they have retrieved from PMC, and the article instantly appears, just as if they had retrieved it via Google, even though the item’s URL happens to be in an IR rather than in PMC?

Or are OA self-archiving issues being conflated with preservation archiving issues here (yet again, as so often happens, and inevitably at OA’s expense)? If so, the preservation of what: “final, peer-reviewed manuscripts”?


Access Archiving or Preservation Archiving? One discerns the dead hand of digital preservationists here, pushing their worthy but distinct agenda, oblivious to the fact that the content they seek to preserve is mostly not even OA yet, and that the version that NIH has (rightly) stipulated for OA deposit (each “investigator’s… electronic version of their final, peer-reviewed manuscripts upon acceptance for publication“) is not even the draft that is in real need of preservation, but just a supplementary copy, provided for access purposes: The definitive version, the one that really stands in need of preservation, is not this author-copy but the original itself: the publisher’s official proprietary version of record. But is it not crucial, here especially, to raise the fundamental question: Is the NIH mandate an access mandate or is it a preservation mandate? For preservation, one needs to deposit a (digital and analog) original in an archival depository (preferably several, for safe-keeping, updating and migration as technology evolves), not an OA collection like PMC. That essential archival deposit/preservation function has absolutely nothing to do with either the author or with OA, and APA would certainly have no problem with a digital deposit requirement like that…


Peter Suber: “But should Congress and the NIH prefer PMCs to IRs? Maybe, maybe not. I see good arguments on both sides.”

For OA functionality, the locus of deposit makes zero difference. For preservation, OA is beside the point and unnecessary. But for OA content-provision itself — and not just for NIH-funded content, but for all of OA’s target content, across all disciplines, institutions and nations — locus of deposit matters enormously. There’s no functionality without content. And I know of no good argument at all in favor of institution-external direct deposit, insofar as OA content-provision is concerned; only a lot of good arguments against it.

Peter Suber: “But they are irrelevant here because (1) the APA deposit fee would still [be] unnecessary”

Why is it just APA’s absurd $2500 fee for PMC deposit that is singled out as being unnecessary (given that the APA is Green on free OA IR deposit)? Is NIH’s gratuitous stipulation of PMC deposit not likewise unnecessary (for OA)?

(This question is all the more germane given that the global transition to universal OA stands to benefit a lot more from NIH’s dropping its gratuitous (and alas much imitated) deposit-locus stipulation than from APA’s dropping its absurd bid for a PMC deposit fee.)

Peter Suber: “(2) there’s no evidence that the APA was motivated, as Stevan is, to protest the preference for PMC –as opposed to (say) mandatory OA.”

But I never said the APA was motivated to protest the preference for PMC! That really would be absurd. I am certain that APA (and every other non-OA publisher) is none too thrilled about either author self-archiving or mandatory OA, anywhere, in any form!

But APA nevertheless did the responsible thing, and bit the bullet on formally endorsing institutional self-archiving. There’s no (OA) reason they should have to bite it on institution-external, 3rd-party archiving in PMC too (even though the distinction will eventually be mooted by universal OA) — though the response of the OA community, if directed, myopically, at APA alone, and not NIH, will no doubt see to it that they will.

Frankly, I think APA just saw an opportunity to try to make a buck, and maybe also to put the brakes on an overall process that they saw as threatening to their current revenue streams. Can’t blame them for thinking that; it may turn out to be true. But as long as they’re Green, they’re “gold,” as far as OA is concerned (though, to avoid conflicting terminology, let us just say they are “on the side of the angels“).

Peter Suber: “(For the record, my position is close to Stevan’s: institutional and disciplinary repositories should harvest from one another; that would greatly lower the stakes in the question where an OA mandate should require initial deposit; if we got that far, I’d be happy to see a policy require deposit in IRs.)”

I’m afraid I can’t quite follow Peter’s reasoning here:

The issue is whether deposit mandates should be convergent — requiring all authors to deposit in their own OA IRs, for harvesting by global overlay OA services and collections therefrom — or divergent, requiring authors to deposit all over the map, possibly multiply, depending on field and funding, possibly necessitating “reverse-harvesting,” with each institution’s software having to trawl the web, looking to retrieve its own institutional output, alas deposited institution-externally.

(That last is not really “harvesting” at all; rather, it involves a functional misunderstanding of the very concept of harvesting: The OAI concept is that there are local content-providers and global service-providers. Content-providers are local and distributed, each providing its own content — in this case, institutional IRs. Then there are service-providers, who harvest that content [or just the content’s metadata and URL] from the distributed, interoperable content-providers, and provide global services on it, such as indexing, search, and other added values. This is not a symmetric process. It does not make sense to think of the content-providers as “harvesting” their own content (back) from the service-providers! Another way to put this is that — although it was not evident at the time — OAI-interoperability really meant the end of the need for “central repositories” (CRs) for direct deposit. Now there would just be central collections (services), harvested from distributed local content-providers. No need to deposit distally. And certainly no sense in depositing distally only to “harvest” it back home again! Institutional content-provision begins and ends with the institution’s own local IR; the rest is just global, webwide harvesting and service-provision.)
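To make the content-provider/service-provider asymmetry concrete, here is a minimal sketch of how a service-provider (say, a PMC-like central collection) harvests metadata from a local content-provider via OAI-PMH, in Python. The repository base URL is a hypothetical placeholder; the verb, metadataPrefix and resumptionToken mechanics are standard OAI-PMH.

```python
# Minimal OAI-PMH harvesting sketch: a global service-provider pulls
# Dublin Core metadata (including the item URL) from a local
# content-provider (an IR). The base URL is a hypothetical placeholder.
import requests
import xml.etree.ElementTree as ET

BASE_URL = "https://ir.example.edu/oai"  # hypothetical IR endpoint
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
while True:
    root = ET.fromstring(requests.get(BASE_URL, params=params).content)
    for record in root.iter(f"{OAI}record"):
        title = record.find(f".//{DC}title")
        ident = record.find(f".//{DC}identifier")  # typically the item URL
        if title is not None and ident is not None:
            print(title.text, "->", ident.text)
    # OAI-PMH pages its results; follow the resumptionToken until exhausted.
    token = root.find(f".//{OAI}resumptionToken")
    if token is None or not (token.text or "").strip():
        break
    params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A central collection running a loop like this over every registered IR gets all the metadata (and links) it needs; nothing about OA accessibility requires that the full-texts be deposited centrally.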

Peter Suber: “Stevan does call the deposit fee absurd. So we agree on that as well. But he adds that the NIH preference for PMC over IRs “reduced us to this absurdity”. I’m afraid that’s absurd too. If the NIH preference for PMC somehow compelled publishers to respond with deposit fees, then we’d see many of them. But in fact we see almost none.”

(1) Of course APA’s $2500 deposit fee is absurd. But — given that APA is Green on OA, and given the many reasons why convergent IR deposit, mandated by institutions as well as funders, not only makes more sense but is far more likely to scale up, coherently and systematically, to universal OA across disciplines, institutions and nations than divergent willy-nilly deposit of institutional content here, there and everywhere — I welcome this absurd outcome (the $2500 PMC deposit fee) and hope the reductio ad absurdum it reveals helps pinpoint (and fix) the real source of the absurdity, which is not APA’s wistful surcharge, but NIH’s needless insistence on direct deposit institution-externally in PMC.

(2) I have no idea whether the OA community’s hue and cry about the $2500 APA surcharge for PMC deposit will be targeted exclusively at APA (and any other publishers that get the same bright idea), forcing them to withdraw it, while leaving the dysfunctional NIH constraint on locus of deposit in place.

(3) I hope, instead, that the OA community will have the insight to target NIH’s constraint on deposit locus as well, so as to persuade NIH to optimize its widely-imitated policy in the interests of its broader implications for the prospects of global OA — one small step for NIH but a giant leap for mankind — by fixing the one small bug in an otherwise brilliant policy.

Peter Suber: “Even if the NIH preference for PMC were a choice the agency could reverse at will, the APA deposit fee is another choice, not necessitated by the NIH policy and not justified by it.”

Where there’s a will, there’s a way, and here it’s an extremely simple way, a mere implementational detail: Instead of depositing directly in PMC, authors deposit in their IRs and send PMC the URL. If NIH adopted that, the APA’s PMC deposit surcharge bid would instantly become moot.

If the furor evoked by the APA $2500 surcharge proved to be the factor that managed to inspire NIH to take the rational step that rational argument alone has so far been powerless to inspire, then that will be a second (unintentional) green feather in APA’s cap, and another of the ironies and absurdities of our long, somnambulistic trek toward the optimal and inevitable outcome for scientific and scholarly research.

A Simple Way to Optimize the NIH Public Access Policy (Oct 2004)

Please Don’t Copy-Cat Clone NIH-12 Non-OA Policy! (Jan 2005)

National Institutes of Health: Report on the NIH Public Access Policy. In: Department of Health and Human Services (Jan 2006, reporting 3.8% compliance rate after 8 months for its first, non-mandatory deposit policy)

Central versus institutional self-archiving (Sep 2006)

Optimizing OA Self-Archiving Mandates: What? Where? When? Why? How? (Sep 2006)

THE FEEDER AND THE DRIVER: Deposit Institutionally, Harvest Centrally (Jan 2008)

Optimize the NIH Mandate Now: Deposit Institutionally, Harvest Centrally (Jan 2008)

Yet Another Reason for Institutional OA Mandates: To Reinforce and Monitor Compliance With Funder OA Mandates (Feb 2008)

How To Integrate University and Funder Open Access Mandates (Mar 2008)

One Small Step for NIH, One Giant Leap for Mankind (Mar 2008)

NIH Invites Recommendations on How to Implement and Monitor Compliance with Its OA Self-Archiving Mandate (Apr 2008)

Institutional Repositories vs Subject/Central Repositories (Jun 2008)

Stevan Harnad
American Scientist Open Access Forum