Public draft: OA eBook Usage Data Analytics and Reporting Use-cases by Stakeholder. Feedback invited through July 10, 2021

Publishers, libraries, and a diverse array of scholarly communications platforms and services generate information about how OA books are accessed online. Since its launch in 2015, the OA eBook Usage Data Trust (@OAEBU_project) effort has brought together these stakeholders to document the barriers facing OA eBook usage analytics. To start addressing these challenges and to understand the role of a usage data trust, the effort has spent the last year studying and documenting the usage data ecosystem. Interview-based research led to documentation of the OA book data supply chain, which maps related metadata and usage data standards and workflows. Dozens of participants worldwide have engaged in human-centered design workshops and communities of practice that went virtual during 2020. Together these communities revealed how OA book publishers, platforms, and libraries are looking beyond their need to provide usage and impact reports. Workshop findings are now documented in use cases that list the queries and activities where usage data analytics can help scholars and organizations be more effective and strategic. Public comment on the OA eBook Usage Data Analytics and Reporting Use Cases Report is invited through July 10, 2021.

Recognition and rewards – Open Science – Universiteit Utrecht

“Open science means action. And the way we offer recognition and reward to academics and university staff is key in bringing about the transition that Utrecht University aims for. Over the course of the past year, the working group on Recognition and Rewards, part of the Open Science Programme, has reflected on and thoroughly debated a novel approach to ensuring that we offer room for everyone’s talent, resulting in a new vision (pdf)….

In the current system, researchers and their research are judged by journal impact factors, publisher brands and H-indices, and not by actual quality, real use, real impact and openness characteristics….

Under those circumstances, at best open science practices are seen as posing an additional burden without rewards. At worst, they are seen as actively damaging chances of future funding and promotion & tenure. Early career researchers are perhaps the most dependent on traditional evaluation culture for career progression, a culture held in place by established researchers, as well as by institutional, national and international policies, including funder mandates….”


Utrecht University Recognition and Rewards Vision

“By embracing Open Science as one of its five core principles, Utrecht University aims to accelerate and improve science and scholarship and its societal impact. Open science calls for a full commitment to openness, based on a comprehensive vision regarding the relationship with society. This ongoing transition to Open Science requires us to reconsider the way in which we recognize and reward members of the academic community. It should value teamwork over individualism and call for an open academic culture that promotes accountability, reproducibility, integrity and transparency, and where sharing (open access, FAIR data and software) and public engagement are normal daily practice. In this transition we closely align ourselves with the national VSNU program as well as developments at the international level….”

Data tracking in research: aggregation and use or sale of usage data by academic publishers

“This briefing paper issued by the Committee on Scientific Library Services and Information Systems (AWBI) of the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) on the subject of data tracking in digital research resources describes options for the digital tracking of research activities. It outlines how academic publishers are becoming data analytics specialists, indicates the consequences for research and its institutions, and identifies the types of data mining that are being used. As such, it primarily serves to present contemporary practices with a view to stimulating discussion so that positions can be adopted regarding the consequences of these practices for the academic community. It is aimed at all stakeholders in the research landscape….

Potentially, research tracking of this kind can fundamentally contradict academic freedom and informational self-determination. It can endanger scientists and hinder the freedom of competition in the field of information provision. For this reason, scholars and academic institutions must become aware of the problem and clarify the legal, technical and ethical framework conditions of their information supply – not least to avoid involuntarily violating applicable law, but also to ensure that academics are appropriately informed and protected. AWBI’s aim in issuing this briefing paper is to encourage a broad debate within the academic community – at the level of academic decision-makers, among academics, and within information infrastructure institutions – to reflect on the practice of tracking, its legality, the measures required for compliance with data protection, and the consequences of the aggregation of usage data, so that appropriate measures can be adopted. The collection of data on research and research activity can be useful as long as it follows clear-cut, transparent guidelines, minimises risks to individual researchers, and ensures that academic organisations are able to use such data, if not control it outright.”

Reporting Global Usage and Usage of Open Content Not Attributed to Institutions

“The COUNTER Code of Practice currently states about the Institution_Name in the report header that ‘For OA publishers and repositories, where it is not possible to identify usage by individual institutions, the usage should be attributed to “The World”’ (Section 3.2.1, Table 3.f). When this rule was added, the focus was on fully Open Access publishers, and the expectation – which has obviously proved wrong and caused some confusion – was that fully OA publishers would not try to attribute usage to institutions. So a report to “The World” was intended to include all global usage, whether attributed to institutions or not.

This document shows how usage could be reported to “The World” and how the global usage could be broken down and filtered.
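
A minimal sketch of such a breakdown, assuming a COUNTER R5-style JSON structure in which each item carries a hypothetical Country_Code attribute (the actual breakdown attributes are exactly what this consultation seeks feedback on):

```python
# A COUNTER R5-style structure attributed to "The World"; the Country_Code
# attribute is hypothetical and stands in for whatever breakdown attributes
# the consultation eventually settles on.
report = {
    "Report_Header": {"Institution_Name": "The World"},
    "Report_Items": [
        {"Title": "Book A", "Country_Code": "NL", "Total_Item_Requests": 120},
        {"Title": "Book A", "Country_Code": "US", "Total_Item_Requests": 300},
        {"Title": "Book B", "Country_Code": "NL", "Total_Item_Requests": 80},
    ],
}

# Break the global ("The World") usage down by country.
by_country = {}
for item in report["Report_Items"]:
    code = item["Country_Code"]
    by_country[code] = by_country.get(code, 0) + item["Total_Item_Requests"]
print(by_country)  # {'NL': 200, 'US': 300}
```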

Please note that these reports would NOT be a mandatory requirement. Content providers that wished to use them could do so.

We are seeking your thoughts about how useful these reports might be, and more specifically on some of the technical details. Please provide your feedback at https://www.surveymonkey.co.uk/r/3CQZVH2. The survey questions are included at the end of this document so that you can discuss them with colleagues before submitting your responses online….”

Science as a Public Good: Public Use and Funding of Science

Abstract:  Knowledge of how science is consumed in public domains is essential for a deeper understanding of the role of science in human society. While science is heavily supported by public funding, common depictions suggest that scientific research remains an isolated or ‘ivory tower’ activity, with weak connectivity to public use, little relationship between the quality of research and its public use, and little correspondence between the funding of science and its public use. This paper introduces a measurement framework to examine public good features of science, allowing us to study public uses of science, the public funding of science, and how use and funding relate. Specifically, we integrate five large-scale datasets that link scientific publications from all scientific fields to their upstream funding support and downstream public uses across three public domains – government documents, the news media, and marketplace invention. We find that the public uses of science are extremely diverse, with different public domains drawing distinctively across scientific fields. Yet amidst these differences, we find key forms of alignment in the interface between science and society. First, despite concerns that the public does not engage high-quality science, we find universal alignment, in each scientific field and public domain, between what the public consumes and what is highly impactful within science. Second, despite myriad factors underpinning the public funding of science, the resulting allocation across fields presents a striking alignment with the field’s collective public use. Overall, public uses of science present a rich landscape of specialized consumption, yet collectively science and society interface with remarkable, quantifiable alignment between scientific use, public use, and funding.


Open access book usage data – how close is COUNTER to the other kind?

Abstract:  In April 2020, the OAPEN Library moved to a new platform, based on DSpace 6. During the same period, IRUS-UK started working on the deployment of Release 5 of the COUNTER Code of Practice (R5). This is, therefore, a good moment to compare two widely used usage metrics – R5 and Google Analytics (GA). This article discusses the download data of close to 11,000 books and chapters from the OAPEN Library, from the period 15 April 2020 to 31 July 2020. When a book or chapter is downloaded, it is logged by GA and at the same time a signal is sent to IRUS-UK. This results in two datasets: the monthly downloads measured in GA and the usage reported by R5, also clustered by month. The number of downloads reported by GA is considerably larger than that reported by R5: the total number of downloads in GA for the period is over 3.6 million, while R5 reports 1.5 million, around 400,000 downloads per month. Contrasting R5 and GA data on a country-by-country basis shows significant differences: GA lists more than five times the number of downloads for several countries, although the totals for other countries are about the same. When looking at individual titles, of the 500 highest-ranked titles in GA that are also part of the 1,000 highest-ranked titles in R5, only 6% are relatively close together. The choice of metric service thus has considerable consequences for what is reported, and conclusions should be drawn from the results with care. One metric is not better than the other, but we should be open about the choices made. After all, open access book metrics are complicated, and we can only benefit from clarity.
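
A minimal sketch of this kind of title-by-title comparison, using invented numbers rather than the OAPEN Library's actual data:

```python
import pandas as pd

# Invented monthly download counts per title, standing in for the GA and
# COUNTER R5 exports compared in the article.
ga = pd.DataFrame({"title": ["Book A", "Book B"],
                   "month": ["2020-05", "2020-05"],
                   "downloads": [900, 300]})
r5 = pd.DataFrame({"title": ["Book A", "Book B"],
                   "month": ["2020-05", "2020-05"],
                   "downloads": [400, 280]})

merged = ga.merge(r5, on=["title", "month"], suffixes=("_ga", "_r5"))
merged["ga_to_r5_ratio"] = merged["downloads_ga"] / merged["downloads_r5"]
print(merged)  # Book A diverges sharply; Book B is relatively close.
```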


New Open Access Business Models – What’s Needed to Make Them Work? – The Scholarly Kitchen

“The CHORUS Forum, which held its third meeting last week, is a relatively new entrant into the scholarly communication meeting calendar. It has proven to be a rare opportunity to bring together publishers, researchers, librarians, and research funders. I helped organize and moderated a session during the Forum, on the theme of “Making the Future of Open Research Work.” You can watch my session, which looked at new models for sustainable and robust open access (OA) publishing, along with the rest of the meeting in the video below.

The session focuses on the operationalization of the move to open access and the details of what it takes to experiment with a new business model. The model the community has the most experience with, the individual author paying an article processing charge (APC), works really well for some authors, in some subject areas, in some geographies. But it is not a universal solution to making open access work, and it creates new inequities as it resolves others….

Some of the key takeaways for me were found in the commonalities across all of the models. The biggest hurdle that each organization faced in executing its plans was gathering and analyzing author data. As Sara put it, “Data hygiene makes or breaks all of these models.” For PLOS and the ACM, what they’re asking libraries to support is authorship – the model essentially says “this many papers had authors from your institution and what you pay will largely be based on the volume of your output.” But disambiguating author identity, and especially identifying which institution each author represents, remains an enormous problem. While we do have persistent identifiers (PIDs) like ORCID, and the still-under-development ROR, their use is not universal, and we still lack a unifying mechanism to connect the various PIDs into a simple, functional tool to support this type of analysis.

One solution would be requiring authors to accurately identify their host institutions from a controlled vocabulary, but this runs up against most publishers’ desire to streamline the article submission process. There’s a balance to be struck, but probably one that’s going to ask authors to provide more accurate and detailed information….
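
As a sketch of what affiliation disambiguation with existing PIDs can look like, the snippet below sends a free-text affiliation string to ROR's public affiliation-matching endpoint; the affiliation string is invented, and the endpoint's response format may evolve:

```python
import requests

def match_affiliation(raw_affiliation: str):
    """Ask ROR's affiliation-matching endpoint to resolve a free-text string."""
    resp = requests.get(
        "https://api.ror.org/organizations",
        params={"affiliation": raw_affiliation},
        timeout=10,
    )
    resp.raise_for_status()
    # ROR marks at most one candidate as a confident ("chosen") match.
    for item in resp.json().get("items", []):
        if item.get("chosen"):
            org = item["organization"]
            return org["id"], org["name"]
    return None

# Hypothetical messy affiliation string, as it might appear on a submission.
print(match_affiliation("Dept. of Physics, Univ. of Oxford"))
```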

[M]oving beyond the APC is essential to the long-term viability of open access, and there remains much experimentation to be done….”

Open Access Resources and Evaluation; or: why OA journals might fare badly in terms of conventional usage | Martin Paul Eve | Professor of Literature, Technology and Publishing

“I am frequently asked, by libraries, to provide usage statistics for their institutions at the Open Library of Humanities. I usually resist this, since there are a number of ways in which the metrics are not usually a fair comparison to subscription resources. A few notes on this.

We do not have or require any login information. This means that the only way that we can provide usage information is by using the institutional IP address. This, in turn, means that we can only capture on-site access. This is not the same for journals that have paywalls. They can capture a login, from off-site, and attribute these views to the institution. Therefore, if you compare usage of OA journals vs. paywalled journals, the paywalled journals will likely have higher usage stats, because they include off-site access, which is not possible for OA journals (though Knowledge Unlatched did some interesting work on geo-tracking of off-site access). Further, our authors may deposit copies of their work in institutional repositories or anywhere else – and we encourage this. Again, though, the decentralization makes it very hard to get any meaningful statistical tracking.
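
A minimal sketch of the IP-based attribution described above, using Python's standard ipaddress module and invented institutional IP ranges (RFC 5737 documentation blocks):

```python
import ipaddress

# Invented institutional CIDR ranges, as libraries might supply them.
INSTITUTION_RANGES = {
    "University A": [ipaddress.ip_network("192.0.2.0/24")],
    "University B": [ipaddress.ip_network("198.51.100.0/24")],
}

def attribute_request(ip_string: str) -> str:
    """Attribute a request to an institution by source IP, if possible."""
    ip = ipaddress.ip_address(ip_string)
    for institution, networks in INSTITUTION_RANGES.items():
        if any(ip in net for net in networks):
            return institution
    return "Unattributed (off-site or unknown)"

print(attribute_request("192.0.2.17"))   # on-site  -> University A
print(attribute_request("203.0.113.9"))  # off-site -> Unattributed
```

Note how any off-site request falls through to "Unattributed", which is exactly the undercounting problem the post describes.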

Different institutions want us to report on different things. Some want to know “are our academics publishing in OLH journals?” while others want to know “are our academics reading OLH journals?” The reporting requirements for these are different, and it seems that OLH is judged differently according to different institutions’ desires.

We run a platform that is composed of several different pieces of journal technology: we have journals at Ubiquity Press; we have journals running on Janeway; and we have journals running on proprietary systems at places like Liverpool University Press. These all run on different reporting systems and require us to interact with different vendors for different usage requests. Reporting in this way requires me to take time out of running other parts of the platform. In short: the labour overhead of this type of reporting is fairly large and adds to the overall costs that we have in running the platform.

There is a privacy issue in tracking our readers. At a time when the US Government has banned the use of the term “climate change”, it seems reasonable to worry that tracking users by IP address, in logs that could be subpoenaed, could genuinely carry some risk. Indeed, as a library, it feels important to us to protect our readers.

View counts are a terrible proxy for actual reading.

Our mission is to change subscription journals to an OA basis. Libraries have been asked, at each stage, to vote on this. They have done so enthusiastically. We hope that, in doing so, libraries recognise what we are doing and will not just resort to crude rankings of usage in continuing to support us (and, indeed, most do). But I can also see the temptation, in the current budget difficulties, to fall back on usage stats as a ranking of where to invest….”

Web analytics for open access academic journals: justification, planning and implementation | BiD: textos universitaris de biblioteconomia i documentació

Abstract:  An overview is presented of resources and web analytics strategies useful in devising solutions for capturing usage statistics and assessing audiences for open access academic journals. A set of metrics complementary to citations is contemplated to help journal editors and managers provide evidence of the performance of the journal as a whole, and of each article in particular, in the web environment. The measurements and indicators selected seek to generate added value for editorial management in order to ensure its sustainability. The proposal is based on three areas: counts of visits and downloads, optimization of the website alongside campaigns to attract visitors, and preparation of a dashboard for strategic evaluation. It is concluded that, by creating web performance measurement plans based on the resources and proposals analysed, journals may be in a better position to plan data-driven web optimization in order to attract authors and readers and to offer the accountability that the actors involved in the editorial process need to assess their open access business model.
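
A minimal sketch of the first of the three areas, combining hypothetical per-article visit and download logs into a dashboard-ready table (the proposal itself is platform-agnostic):

```python
import pandas as pd

# Invented per-article logs of visits and downloads, combined into the
# kind of summary table the authors propose for editorial evaluation.
visits = pd.DataFrame({"article": ["a1", "a1", "a2"],
                       "month": ["2021-05", "2021-06", "2021-05"],
                       "visits": [150, 90, 40]})
downloads = pd.DataFrame({"article": ["a1", "a2"],
                          "month": ["2021-05", "2021-05"],
                          "downloads": [30, 12]})

dashboard = visits.merge(downloads, on=["article", "month"], how="left")
dashboard["downloads"] = dashboard["downloads"].fillna(0).astype(int)
print(dashboard)
```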


Library Vendor Platforms Need a Strategic Reboot to Meet Librarian Curriculum Development Needs – The Scholarly Kitchen

“Our industry must create an equal handshake between paid and open content if our platform is to solve the problem that brings a user to the platform. If I am seeking the best aligned and most comprehensive set of resources to design a course, I must have equal access to open and paid content. To achieve this handshake, I propose three key principles:

Platforms need full-text, complete video files, audiobooks, etc. of the relevant content, paid and open, to improve the metadata searched for discovery and the user experience once an item is selected as appropriate.
The search results pages and content entity pages must clearly display the open access/OER symbol, and the Creative Commons license applied to the content for future uses. In addition, an explanation of the license will often be required to reduce faculty uncertainty about reuse. For example, CC BY-NC 2.0 allows for remixing and re-use but not for commercial gain. A patron may struggle to understand this rights limitation without clear guidance from the platform.
Content providers, publishers, distributors, etc. are the lifeblood of the platform. Platforms invest heavily in services and functionality, but without content there is no user experience. To this end, and especially for providers of open content, we need to deliver robust data and insight into usage, engagement, and impact. Publishers need to see open and paid content usage by account, to include time viewed/pages turned, etc. Publishers need to see how the content is engaged with and when (time of day, device used) and publishers need to see how the content has impacted the recipient, e.g., student performance metrics….”
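
The license guidance called for in the second principle above could start from something as simple as a lookup table; a minimal sketch with illustrative, non-authoritative explanations:

```python
# Illustrative plain-language notes keyed by CC license codes; a production
# platform would source vetted explanations rather than these examples.
LICENSE_NOTES = {
    "CC BY 4.0": "Reuse and remix allowed, including commercially, with attribution.",
    "CC BY-NC 2.0": "Remix and reuse allowed with attribution, but not for commercial gain.",
    "CC BY-ND 4.0": "Redistribution allowed with attribution; no derivatives.",
}

def display_license(code: str) -> str:
    """Return a patron-facing explanation for a license code."""
    note = LICENSE_NOTES.get(code, "License terms unavailable; check with the publisher.")
    return f"{code}: {note}"

print(display_license("CC BY-NC 2.0"))
```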

Coleridge Initiative – Show US the Data | Kaggle

“This competition challenges data scientists to show how publicly funded data are used to serve science and society. Evidence through data is critical if government is to address the many threats facing society, including pandemics, climate change, Alzheimer’s disease, child hunger, increasing food production, maintaining biodiversity, and many other challenges. Yet much of the information about data necessary to inform evidence and science is locked inside publications.

Can natural language processing find the hidden-in-plain-sight data citations? Can machine learning find the link between the words used in research articles and the data referenced in the article?
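
As a sketch of the baseline such a competition might start from, the snippet below does naive cue-word matching over an invented text snippet; competitive entries would rely on trained NLP models rather than regular expressions:

```python
import re

# Invented snippet of article text.
text = ("We analyze outcomes using the National Education Longitudinal "
        "Study (NELS) together with administrative records.")

# Match runs of capitalized words ending in a common dataset cue word.
CUE = r"(?:Study|Survey|Database|Census)"
pattern = re.compile(r"((?:[A-Z][\w-]+\s+){1,6}" + CUE + r")")

for match in pattern.finditer(text):
    print(match.group(1))  # -> "National Education Longitudinal Study"
```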

Now is the time for data scientists to help restore trust in data and evidence. In the United States, federal agencies are now mandated to show how their data are being used. The new Foundations for Evidence-Based Policymaking Act requires agencies to modernize their data management. New Presidential Executive Orders are pushing government agencies to make evidence-based decisions based on the best available data and science. And the government is working to respond in an open and transparent way.

This competition will build just such an open and transparent approach. …”

An analysis of use statistics of electronic papers in a Korean scholarly information repository

Abstract:  Introduction. This study aimed to analyse the current use status of Korean scholarly papers accessible in the repository of the Korea Institute of Science and Technology Information in order to assess the economic validity of the maintenance and operation of the repository.

Method. This study used the modified historical cost method and performed regression analysis on the use of Korean scholarly papers by year and subject area.

Analysis. The development cost of the repository and the use volumes were analysed based on 1,154,549 Korean scholarly papers deposited in the Institute repository.

Results. Approximately 86% of the deposited papers were downloaded at least once and on average, a paper was downloaded over twenty-six times. Regression analysis showed that the ratio of use of currently deposited papers is likely to decrease by 7.6% annually, as new ones are added.

Conclusions. The currently deposited papers need to be managed for at least thirteen years into the future, and the analysis provides empirical proof that the repository has contributed to Korean researchers conducting research and development in the fields of science and technology. The benefit-cost ratio was above nineteen, confirming the economic validity of the repository.
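
A minimal sketch of the kind of regression reported in the Results, fitting an exponential decline to invented annual use ratios (illustrative numbers, not the study's data):

```python
import numpy as np

# Invented annual use ratios for a cohort of deposited papers.
years = np.arange(2014, 2021)
use_ratio = np.array([0.42, 0.39, 0.36, 0.33, 0.31, 0.28, 0.26])

# Fit log(ratio) ~ year; the slope gives the estimated annual rate of change.
slope, intercept = np.polyfit(years, np.log(use_ratio), 1)
print(f"Estimated annual change: {np.expm1(slope):+.1%}")  # about -7.7%
```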

What We Talk About When We Talk About… Book Usage Data

“Over the last two-and-a-half years, we have been working as part of the EU-funded HIRMEOS (High Integration of Research Monographs in the European Open Science Infrastructure) project to create open source software and databases to collectively gather and host usage data from various platforms for multiple publishers. As part of this work, we have been thinking deeply about what the data we collect actually means. Open Access books are read on, and downloaded from, many different platforms – this availability is one of the benefits of making work available Open Access, after all – but each platform has a different way of counting up the number of times a book has been viewed or downloaded.

Some platforms count a group of visits made to a book by the same user within a continuous time frame (known as a session) as one ‘view’ – we measure usage in this way ourselves on our own website – but the length of a session might vary from platform to platform. For example, on our website we use Google Analytics, according to which one session (or ‘view’) lasts until there is thirty minutes of inactivity. But platforms that use COUNTER-compliant figures (the standard that libraries prefer) have a much shorter time-frame for a single session – and such a platform would record more ‘views’ than a platform that uses Google Analytics, even if it was measuring the exact same pattern of use.

Other platforms simply count each time a book is accessed (known as a visit) as one ‘view’. There might be multiple visits by the same user within a short time frame – which our site would count as one session, or one ‘view’ – but which a platform counting visits rather than sessions would record as multiple ‘views’.
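
A minimal sketch of why session length matters, counting ‘views’ over the same invented hit stream with two different inactivity windows (thirty minutes mirrors the Google Analytics default; the shorter window stands in for the briefer sessions the post attributes to COUNTER-compliant platforms):

```python
from datetime import datetime, timedelta

# Invented timestamps of one user's requests for the same book.
hits = [datetime(2021, 6, 1, 9, 0), datetime(2021, 6, 1, 9, 20),
        datetime(2021, 6, 1, 9, 50), datetime(2021, 6, 1, 10, 40)]

def count_views(timestamps, window_minutes):
    """Count sessions: a new 'view' starts after window_minutes of inactivity."""
    views, last = 0, None
    for t in sorted(timestamps):
        if last is None or t - last > timedelta(minutes=window_minutes):
            views += 1
        last = t
    return views

print(count_views(hits, 30))  # GA-style 30-minute window -> 2 'views'
print(count_views(hits, 10))  # shorter window            -> 4 'views'
```

A platform counting each visit as a ‘view’ would report four here as well, so the same reader can produce two, three, or four ‘views’ depending only on the counting rule.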

Downloads (which we also used to include in the number of ‘views’) also present problems. For example, many sites only allow chapter downloads (e.g. JSTOR), others only whole book downloads (e.g. OAPEN), and some allow both (e.g. our own website). How do you combine these different types of data? Somebody who wants to read the whole book would need only one download from OAPEN, but as many downloads as there are chapters from JSTOR – thus inflating the number of downloads for a book that has many chapters.
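
One crude way to make such figures commensurable is to convert everything into whole-book equivalents; the sketch below uses invented counts, and the normalization rule is an assumption for illustration, not a HIRMEOS recommendation:

```python
# Invented counts for one 12-chapter book on two platforms.
CHAPTERS = 12
downloads = {
    "OAPEN (whole-book files)": {"count": 80, "chapters_per_file": CHAPTERS},
    "JSTOR (chapter files)": {"count": 240, "chapters_per_file": 1},
}

# Express every platform's count as whole-book-equivalent downloads.
for platform, d in downloads.items():
    book_equivalents = d["count"] * d["chapters_per_file"] / CHAPTERS
    print(f"{platform}: {book_equivalents:.0f} book-equivalent downloads")
```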

So aggregating this data into a single figure for ‘views’ isn’t only comparing apples with oranges – it’s mixing apples, oranges, grapes, kiwi fruit and pears. It’s a fruit salad….”

Visualizing Book Usage Statistics with Metabase · punctum books

“There is an inherent contradiction between publishing open access books and gathering usage statistics. Open access books are meant to be copied, shared, and spread without any limit, and the absence of any Digital Rights Management (DRM) technology in our PDFs indeed makes it impossible to impose such limits. Nevertheless, we can gather an approximate impression of book usage among certain communities, such as hardcopy readers and those connected to academic infrastructures, by gathering data from various platforms and correlating them. These data are useful for both our authors and supporting libraries to gain insight into the usage of punctum publications.

As there exists no ready-made open-source solution that we know of to accomplish this, for many years we struggled to import these data from various sources into ever-growing spreadsheets, with ever more complicated formulas to extract meaningful data and visualize them. This year, we decided to split up the database and correlation/visualization aspects, by moving the data into a MySQL database managed via phpMyAdmin, while using Metabase for the correlation and visualization part. This allows us to expose our usage data publicly, while also keeping them secure….”
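
A minimal sketch of the database side of such a setup, using Python's built-in SQLite as a stand-in for the MySQL database described above, with a hypothetical usage table and the kind of per-title aggregation a Metabase question might chart:

```python
import sqlite3

# SQLite (standard library) stands in here for the MySQL database managed
# via phpMyAdmin; the table layout and numbers are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE usage_events (
                   title TEXT, platform TEXT, month TEXT, downloads INTEGER)""")
con.executemany(
    "INSERT INTO usage_events VALUES (?, ?, ?, ?)",
    [("Book A", "OAPEN", "2021-05", 120),
     ("Book A", "JSTOR", "2021-05", 45),
     ("Book B", "OAPEN", "2021-05", 60)])

# Correlate usage across platforms: total downloads per title.
for row in con.execute("""SELECT title, SUM(downloads) AS total
                          FROM usage_events
                          GROUP BY title ORDER BY total DESC"""):
    print(row)
```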