Producing Open Data

Abstract:  Open data offer the opportunity to economically combine data into large-scale datasets, fostering collaboration and re-use in the interest of treating researchers’ resources as well as study participants with care. Whereas advantages of utilising open data might be self-evident, the production of open datasets also challenges individual researchers. This is especially true for open data that include personal data, for which higher requirements have been legislated. Mainly building on our own experience as scholars from different research traditions (life sciences, social sciences and humanities), we describe best-practice approaches for opening up research data. We reflect on common barriers and strategies to overcome them, condensed into a step-by-step guide focused on actionable advice in order to mitigate the costs and promote the benefit of open data on three levels at once: society, the disciplines and individual researchers. Our contribution may prevent researchers and research units from re-inventing the wheel when opening data and enable them to learn from our experience.


‘All your data are belong to us’: the weaponisation of library usage data and what we can do about it | UKSG

By Caroline Ball – Academic Librarian, University of Derby, #ebookSOS campaigner
Twitter: @heroicendeavour, Mastodon:

and Anthony Sinnott, Access and Procurement Development Manager, University of York; Twitter: @librarianth

What do 850 football players and their performance data have in common with academic libraries and online resources? More than you’d think! The connecting factor is data, how it is collected, used and for what purposes.

‘Project Red Card’ is demanding compensation for the use of footballers’ performance data by betting companies, video game manufacturers, scouts and others, arguing that players should have more control over how their personal data is collected and particularly how it is monetized and commercialised.

Similarly, libraries’ online resources, whether a single ebook or vast databases, are producing enormous amounts of data, utilised by librarians to assist us in our vital functions: assessing usage and value, determining demand and relevance.

But are we the only ones using this data generated by our users? What other uses is this data being put to? We know for certain that vendors have access to more data than they provide to us via COUNTER statistics etc, but we have no way of knowing how much, what types, or what is done with it.

Witness the recent controversy generated by Wiley’s removal of 1,379 e-books from Academic Complete. Publishers like Wiley determine high use by accessing statistics generated by our end-users via the various e-book platforms through which they access the content. This in itself is indicative of our end-user/library data being provided to third parties without our knowledge or consent, particularly concerning given our licences are with vendors and not publishers. We are also not privy to what data-sharing agreements exist between vendors and publishers. Should we allow library usage data to be weaponized against us in this fashion? What recourse do we have to push back against this practice of ‘data extractivism’, to either withhold this data from publishers and vendors or prohibit them from using it for their own commercial gain?



The Quiet Invasion of ‘Big Information’ | WIRED

This story is adapted from Data Cartels: The Companies That Control and Monopolize Our Information, by Sarah Lamdan.


When people worry about their data privacy, they usually focus on the Big Five tech companies: Google, Apple, Facebook, Amazon, and Microsoft. Legislators have brought Facebook’s CEO to the Capitol to testify about the ways the company uses personal data. The FTC has sued Google for violating laws meant to protect children’s privacy. Each of the tech companies is followed by a bevy of reporters eager to investigate how it uses technology to surveil us. But when Congress got close to passing data privacy legislation, it wasn’t the Big Five that led the most urgent effort to prevent the law from passing; it was a company called RELX.

You might not be familiar with RELX, but it knows all about you. Reed Elsevier LexisNexis (RELX) is a Frankensteinian amalgam of publishers and data brokers, stitched together into a single information giant. There is one other company that compares to RELX—Thomson Reuters, which is also an amalgamation of hundreds of smaller publishers and data services. Together, the two companies have amassed thousands of academic publications and business profiles, millions of data dossiers containing our personal information, and the entire corpus of US law. These companies are a culmination of the kind of information market consolidation that’s happening across media industries, from music and newspapers to book publishing. However, RELX and Thomson Reuters are uniquely creepy as media companies that don’t just publish content but also sell our personal data.


How to protect privacy in open data | Nature Human Behaviour

“When sharing research data for verification and reuse, behavioural researchers should protect participants’ privacy, particularly when studying sensitive topics. Because personally identifying data remain present in many open psychology datasets, we urge researchers to mend privacy via checks of re-identification risk before sharing data. We offer guidance for sharing responsibly….”
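One common way to run such a re-identification check is to count how many records share the same combination of quasi-identifiers (a k-anonymity style check). The sketch below is only an illustration and is not taken from the article: the column names, the toy records and the threshold `k` are all hypothetical assumptions.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group(records, quasi_identifiers):
    """Return the size of the smallest group (the dataset's k), or 0 if empty."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def records_below_k(records, quasi_identifiers, k):
    """Return the records whose quasi-identifier combination occurs fewer than k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] < k]


# Hypothetical toy dataset; real studies would load their own participant table.
participants = [
    {"age_band": "20-29", "gender": "f", "postcode_area": "AB"},
    {"age_band": "20-29", "gender": "f", "postcode_area": "AB"},
    {"age_band": "30-39", "gender": "m", "postcode_area": "CD"},
    {"age_band": "30-39", "gender": "m", "postcode_area": "CD"},
    {"age_band": "60-69", "gender": "f", "postcode_area": "EF"},
]
quasi_ids = ["age_band", "gender", "postcode_area"]

k_value = smallest_group(participants, quasi_ids)
flagged = records_below_k(participants, quasi_ids, k=2)
print("smallest group size:", k_value)
print("records that are unique or near-unique:", flagged)
```

Records flagged this way are candidates for coarsening (for example, wider age bands) or suppression before release. A low group count is only one indicator of risk, so a check like this complements rather than replaces a fuller privacy review.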

ORCID’s 2022 Public Data File Now Available – ORCID

“ORCID was founded on a set of 10 principles, some of which directly mirror the goals of the Open Access initiative. In particular, our 7th founding principle states: All data contributed to ORCID by researchers or claimed by them will be available in standard formats for free download (subject to the researchers’ own privacy settings) that are updated once a year and released under a CC0 waiver. This is why we publish our annual public data file, as we do each year, usually during Open Access Week. 

Our 2021 data file was downloaded 14,299 times and received one citation. In 2020, the data file contributed to the data visualization that showed the digital footprint of Covid-19 research in an astounding map produced by the Research Graph Foundation. 

The file is available in XML format; however, if you prefer JSON, you can use our ORCID Conversion Library, available in our GitHub repository. This Java application enables the generation of JSON from XML in the default version of the ORCID schema format.

The data is divided into 12 subsets for easier download and use. The first set contains the full record summary for each record. The other 11 contain the activities for each record, including full work data. We also have an article for those who need help working with bulk data.


We look forward to seeing how the research community will take advantage of this free, open source of data that is an asset to the research ecosystem. Do you have plans to use the public data file? Let us know by contacting us or tweeting us @ORCID_org.”
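For readers who only need a quick look at an XML record as JSON, a generic conversion can be sketched with the Python standard library. This is not the ORCID Conversion Library (which is a Java application) and it does not follow the ORCID schema: the sample XML, its namespace and its element names are hypothetical placeholders, and XML attributes are ignored for brevity.

```python
import json
import xml.etree.ElementTree as ET


def strip_namespace(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


def element_to_dict(element):
    """Recursively convert an element to nested dicts, lists and strings."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        key = strip_namespace(child.tag)
        value = element_to_dict(child)
        if key in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


# Hypothetical, simplified record; NOT the real ORCID schema.
SAMPLE_XML = """<record xmlns="http://example.org/hypothetical">
  <identifier>0000-0000-0000-0000</identifier>
  <name><given>Ada</given><family>Example</family></name>
  <work>First title</work>
  <work>Second title</work>
</record>"""

root = ET.fromstring(SAMPLE_XML)
record = {strip_namespace(root.tag): element_to_dict(root)}
record_json = json.dumps(record, indent=2)
print(record_json)
```

For the full public data file, the official conversion library is the safer route, since it is built around the actual schema and keeps details that this generic sketch discards.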

Webinar: Dr. Chris Gilliard aka HyperVisible on Educational Surveillance. 20 Oct 2022, 6pm (EDT)| The Feminist and Accessible Publishing, Communications, + Tech Series @Eventbrite

Dr. Chris Gilliard (@hypervisible) is a leading critic of surveillance technology, digital privacy, and the problematic ways that tech intersects with race and social class. He will talk about the digital forms of surveillance that are coming into schools, colleges, and universities.

Dr. Chris Gilliard is a writer, professor and speaker. His scholarship concentrates on digital privacy, and the intersections of race, class, and technology. He is an advocate for critical and equity-focused approaches to tech in education. His work has been featured in The Chronicle of Higher Ed, EDUCAUSE Review, Fast Company, Vice, and Real Life Magazine. He was recently a research fellow with the Technology and Social Change Research Project at Harvard Kennedy School’s Shorenstein Center.

This event is part of the 4th Season of the Feminist and Accessible Publishing and Communications Technologies Speaker and Workshop Series, organized by Dr. Alex Ketchum.

Our series was made possible thanks to our sponsors: SSHRC, the Institute for Gender, Sexuality, and Feminist Studies (IGSF), the DIGS Lab, Milieux, Initiative for Indigenous Futures, MILA, Dean of Arts Grant, ReQEF, and more (see our website!)

There is no fee required to attend this event. We will provide professional captions in English. This event will NOT be recorded and will NOT be made available on our website after the event. However, you can watch other past events at:


Navigating Risk in Vendor Data Privacy Practices: An Initial Analysis of Elsevier’s ScienceDirect

“As libraries spend more and more resources licensing platforms, the terms around how vendors treat user data have become complex and difficult to understand. This poses serious concerns given the ever-increasing incentive for vendors to monetize this data—many in ways fundamentally at odds with libraries’ commitment to privacy. Protecting user privacy is a challenge for libraries trying to navigate through the vague and abstruse vendor contracts and policies. To assist libraries in this challenge, SPARC has partnered with Becky Yoose of LDH Consulting Services to analyze vendor contracts and privacy policies to provide libraries a better understanding of the potential risks they pose to user privacy….”

Statement by Library Futures and SPARC on Wiley E-Textbook Withdrawal | SPARC

In late August, at the start of the Fall 2022 school semester, Wiley Publishing Company abruptly withdrew 1,379 multidisciplinary titles from ProQuest, a vendor for university ebook collections around the world. As a result, librarians and faculty members in the United States and internationally have scrambled to identify alternative textbook options for their students as the pandemic amplified the trouble with restrictive licensing and e-textbooks.

Library Futures and SPARC strongly condemn this action by Wiley, which seriously hinders students’ access to equitable, affordable course materials. The full list of titles and public contact information for their authors was compiled by Johanna Anderson of #ebookSOS.



Open science and data sharing in trauma research: Developing a trauma-informed protocol for archiving sensitive qualitative data. – PsycNET

Abstract:  Objective: The open science movement seeks to make research more transparent, and to that end, researchers are increasingly expected or required to archive their data in national repositories. In qualitative trauma research, data sharing could compromise participants’ safety, privacy, and confidentiality because narrative data can be more difficult to de-identify fully. There is little guidance in the traumatology literature regarding how to discuss data-sharing requirements with participants during the informed consent process. Within a larger research project in which we interviewed assault survivors, we developed and evaluated a protocol for informed consent for qualitative data sharing and engaging participants in data de-identification. Method: We conducted qualitative interviews with N = 32 adult sexual assault survivors regarding (a) how to conduct informed consent for data sharing, (b) whether participants should have input on sharing their data, and (c) whether they wanted to redact information from their transcripts prior to archiving. Results: No potential participants declined participation after learning about the archiving mandate. Survivors indicated that they wanted input on archiving because the interview is their story of trauma and abuse and it would be disempowering not to have control over how this information was shared and disseminated. Survivors also wanted input on this process to help guard their privacy, confidentiality, and safety. None of the participants elected to redact substantive data prior to archiving. Conclusions: Engaging participants in the archiving process is a feasible practice that is important and empowering for trauma survivors. (PsycInfo Database Record (c) 2022 APA, all rights reserved)

Reasons for qualitative psychologists to share human data – Karhulahti – British Journal of Social Psychology – Wiley Online Library

Abstract:  Qualitative data sharing practices in psychology have not developed as rapidly as those in parallel quantitative domains. This is often explained by numerous epistemological, ethical and pragmatic issues concerning qualitative data types. In this article, I provide an alternative to the frequently expressed, often reasonable, concerns regarding the sharing of qualitative human data by highlighting three advantages of qualitative data sharing. I argue that sharing qualitative human data is not by default ‘less ethical’, ‘riskier’ and ‘impractical’ compared with quantitative data sharing, but in some cases more ethical, less risky and easier to manage for sharing because (1) informed consent can be discussed, negotiated and validated; (2) the shared data can be curated by special means; and (3) the privacy risks are mainly local instead of global. I hope this alternative perspective further encourages qualitative psychologists to share their data when it is epistemologically, ethically and pragmatically possible.


Journal of Medical Internet Research – A Study of Publicly Available Resources Addressing Legal Data-Sharing Barriers: Systematic Assessment

Abstract:  Background: United States data protection laws vary depending on the data type and its context. Data projects involving social determinants of health often concern different data protection laws, making them difficult to navigate.

Objective: We systematically aggregated and assessed useful online resources to help navigate the data-sharing landscape.

Methods: We included publicly available resources that discussed legal data-sharing issues with some health relevance and published between 2010 and 2019. We conducted an iterative search with a common string pattern using a general-purpose search engine that targeted 24 different sectors identified by Data Across Sectors for Health. We scored each online resource for its depth of legal and data-sharing discussions and value for addressing legal barriers.

Results: Out of 3710 total search hits, 2721 unique URLs were reviewed for scope, 322 received full-text review, and 154 were selected for final coding. Legal agreements, consent, and agency guidance were the most widely covered legal topics, with the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA), and Title 42 of the Code of Federal Regulations Part 2 being the top 3 federal laws discussed. Clinical health care was the most prominent sector with a mention in 73 resources.

Conclusions: This is the first systematic study of publicly available resources on legal data-sharing issues. We found existing gaps where resources covering certain laws or applications may be needed. The volume of resources we found is an indicator that real and perceived legal issues are a substantial barrier to efforts in leveraging data from different sectors to promote health.

Mündiges Datensubjekt statt Laborratte: Rechtsschutz gegen Wissenschaftstracking [Empowered data subject instead of lab rat: legal protection against science tracking] | Jahrbuch Technikphilosophie

by Felix Reda

The debate about science tracking has so far focused mainly on raising awareness of data protection. That is an important first step, because only if researchers are aware that their research behaviour is being monitored click by click and commercially exploited can they work to put a stop to this practice. But as so often with data protection issues, fatalism threatens to spread if the debate gets stuck at describing the problem.

Far too few universities proactively offer their researchers their own privacy-sensitive software infrastructure that would enable collaborative scholarly work, including across institutions. Large parts of the scholarly literature are available only through the portals of commercial academic publishers, which greet users with confusing cookie banners. Merely getting an overview of what data a corporation such as Elsevier has stored about you is a laborious undertaking [1]. In the already stressful day-to-day of research, it is unrealistic to expect individual researchers to protect themselves from tracking by these companies by avoiding their products.


OSF Preprints | Open science practices in psychiatric genetics: a primer

Abstract:  Open science is a set of practices to ensure that all research elements are transparently reported and freely accessible for all to learn, assess, and build on. Psychiatric genetics has led among the health sciences in implementing some open science practices in common study designs, such as replication as part of genome-wide association studies. However, while additional open science practices could be embedded in genetics research to further improve its quality and accessibility, guidelines for doing so are limited. They are largely not specific to data, privacy, and research conduct challenges in psychiatric genetics. Here, we present a primer of open science practices in psychiatric genetics for multiple steps of the research process, including deciding on a research topic with patients/non-academic collaborators, equitable authorship and citation practices, considerations in designing a replicable, reproducible study, pre-registrations, open data, and privacy issues. We provide tips for creating informative figures, using inclusive, precise language, and following reporting standards. We also discuss considerations in working with non-academic research collaborators (citizen scientists) and outline ways of disseminating research through preprints, blogs, social media, and accessible lecture materials. Finally, we provide a list of extra resources to support every step of the research process.


UPC member states to vote on full access to judgments – JUVE Patent

“On 8 July, the UPC [Unified Patent Court] Administrative Committee will vote on the final draft of the Rules of Procedure, proposing public access to all judgments and orders. …

Participating EU member states first signed off on the Unified Patent Court in 2013. However, ongoing GDPR developments have threatened the transparency of its administration and judicial output. Now a revised final version of the UPC Rules of Procedure stipulates that the public will have access to the content of all decisions and orders.

Judges will remain responsible for redacting any confidential or personal data before formally issuing the decision or order.

The proposals follow multiple professional bodies demanding that, in the interest of transparency, the Rules of Procedure should stipulate full public access to all documents. As such, the committee will consider the Rules of Procedure revisions, along with issues such as which judges will preside over UPC cases, on and from 8 July 2022. All in all, the day will be crucial for the court’s development….”

Many researchers say they’ll share data — but don’t

“Most biomedical and health researchers who declare their willingness to share the data behind journal articles do not respond to access requests or hand over the data when asked, a study reports [1]. …

But of the 1,792 manuscripts for which the authors stated they were willing to share their data, more than 90% of corresponding authors either declined or did not respond to requests for raw data (see ‘Data-sharing behaviour’). Only 14%, or 254, of the contacted authors responded to e-mail requests for data, and a mere 6.7%, or 120 authors, actually handed over the data in a usable format. The study was published in the Journal of Clinical Epidemiology on 29 May….

Puljak’s results square with those of a study that Danchev led, which found low rates of data sharing by authors of papers in leading medical journals that stipulate all clinical trials must share data [2]. …

Past research suggests that some fields, such as ecology, embrace data sharing more than others. But multiple analyses of COVID-19 clinical trials, including some from Li [4,5] and Tan [6], have reported that anywhere from around half to 80% of investigators are unwilling or not planning to share data freely….

To encourage researchers to prepare their data, Li says, journals could make data-sharing statements more prescriptive. They could require authors to detail where they will share raw data, who will be able to access it, when and how.


Funders could also raise the bar for data sharing. The US National Institutes of Health, in an effort to curb wasteful, irreproducible research, will soon mandate that grant applicants include a data-management and sharing plan in their applications. Eventually, they will be required to share data publicly….”