Now available: Open educational resource of Building Legal Literacies for Text Data Mining – UC Berkeley Library Update

“Last summer we hosted the Building Legal Literacies for Text Data Mining institute. We welcomed 32 digital humanities researchers and professionals to the weeklong virtual training, with the goal of empowering them to confidently navigate law, policy, ethics, and risk within digital humanities text data mining (TDM) projects. Building Legal Literacies for Text Data Mining (Building LLTDM) was made possible through a grant from the National Endowment for the Humanities.

Following the remote institute in June 2020, the participants and project team reconvened in February 2021 to discuss how participants had been thinking about, performing, or supporting TDM in their home institutions and projects with the law and policy literacies in mind.

To maximize the reach and impact of Building LLTDM, we have now published a comprehensive open educational resource (OER) of the contents of the institute. The OER covers copyright (both U.S. and international law), technological protection measures, privacy, and ethical considerations. It also helps other digital humanities professionals and researchers run their own similar institutes by describing in detail how we developed and delivered programming (including our pedagogical reflections and take-aways), and includes ideas for hosting shorter literacy teaching sessions. The resource (available as a web-book or in downloadable formats such as PDF, EPUB, and MOBI) is in the public domain under the CC0 Public Domain Dedication, meaning it can be accessed, reused, and repurposed without restriction. …”

Texas Adopts Transparency Measure for Automatic Textbook Billing – SPARC

“The U.S. state of Texas has enacted the nation’s first law to increase transparency for automatic textbook billing programs. Sponsored by Representative Tan Parker and Senator Brandon Creighton, House Bill 1027 received bipartisan approval from the state legislature last month and was signed into law by Governor Greg Abbott last week.

Often marketed using the term “inclusive access,” automatic textbook billing is the practice of charging the cost of digital course materials to a student’s tuition and fee bill. While some of these programs are implemented on a voluntary “opt-in” basis, others are implemented without confirming a student’s consent, which can lead to unexpected charges and limited ability to seek cost-saving alternatives such as used books. Moreover, these programs effectively force students to accept the publisher’s terms of service, which can open the door to the extensive collection and processing of their personal data….”

Gelenkte Wissenschaft: Die DFG warnt vor Einfluss des Plattformkapitalismus (“Guiding” science: DFG warns against influence of platform capitalism) | Frankfurter Allgemeine

The German Research Foundation (DFG) warns against the growing influence of major publishers on research. Scientific freedom is under threat from two sides.

Qualitative data are shareable – Open Science Future

“Three key learnings:

Sharing qualitative data does not mean depositing them somewhere on the internet.
Sharing qualitative data through data repositories enables controlling secondary use and is safe.
Research data archives offer help in processing data for reuse and some even offer financial support….”

Joint Statement on transparency and data integrity – International Coalition of Medicines Regulatory Authorities (ICMRA) and WHO

“ICMRA and WHO call on the pharmaceutical industry to provide wide access to clinical data for all new medicines and vaccines (whether full or conditional approval, under emergency use, or rejected). Clinical trial reports should be published without redaction of confidential information for reasons of overriding public health interest….

Regulators continue to spend considerable resources negotiating transparency with sponsors. Both positive and negative clinically relevant data should be made available, while only personal data and individual patient data should be redacted. In any case, aggregated data are unlikely to lead to re-identification of personal data and techniques of anonymisation can be used….

Providing systematic public access to data supporting approvals and rejections of medicines reviewed by regulators is long overdue, despite existing initiatives such as those from the European Medicines Agency and Health Canada. The COVID-19 pandemic has revealed how essential access to data is to public trust. ICMRA and WHO call on the pharmaceutical industry to commit, within short timelines and without waiting for legal changes, to provide voluntary unrestricted access to trial results data for the benefit of public health.”
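
The statement argues that aggregated data are unlikely to enable re-identification and that anonymisation techniques can be applied. One widely used safeguard when publishing aggregate tables is small-cell suppression, which masks any count below a minimum size. The sketch below illustrates the idea; the function name, the field names, and the threshold of 5 are hypothetical and are not taken from the ICMRA/WHO statement.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping

# Hypothetical threshold: categories with fewer patients than this are masked.
# Actual data holders set their own disclosure rules; 5 is only an example.
MIN_CELL_SIZE = 5


def aggregate_with_suppression(
    records: Iterable[Mapping[str, str]],
    group_by: str,
    min_cell_size: int = MIN_CELL_SIZE,
) -> dict[str, int | None]:
    """Count records per category and replace small counts with None (suppressed)."""
    counts = Counter(record[group_by] for record in records)
    return {
        category: (count if count >= min_cell_size else None)
        for category, count in sorted(counts.items())
    }


# Example: six reports of one adverse event, two of another.
reports = [{"adverse_event": "headache"}] * 6 + [{"adverse_event": "rash"}] * 2
print(aggregate_with_suppression(reports, "adverse_event"))
# → {'headache': 6, 'rash': None}
```

Suppression alone does not guarantee anonymity, because a masked cell can sometimes be back-calculated from published totals, so it is usually combined with other techniques.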

genomeRxiv: a microbial whole-genome database for classification, identification, and data sharing

“genomeRxiv is a newly-funded US-UK collaboration to provide a public, web-accessible database of public genome sequences, accurately catalogued and classified by whole-genome similarity independent of their taxonomic affiliation. Our goal is to supply the basic and applied research community with rapid, precise and accurate identification of unknown isolates based on genome sequence alone, and with molecular tools for environmental analysis….”
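
The excerpt says genomeRxiv classifies genomes by whole-genome similarity but does not say how that similarity is computed. As a simplified illustration only, the sketch below compares genomes by the Jaccard similarity of their canonical k-mer sets, the quantity that MinHash-based tools such as Mash estimate; production pipelines typically use measures such as average nucleotide identity (ANI) on far larger inputs. The function names, the toy sequences, and the default k are assumptions, not genomeRxiv's API.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (non-ACGT symbols are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(sequence: str, k: int) -> set[str]:
    """Collect every length-k substring, stored as the lexicographically smaller of
    itself and its reverse complement, so both DNA strands map to the same k-mer."""
    seq = sequence.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def jaccard_similarity(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of the two canonical k-mer sets."""
    a, b = canonical_kmers(seq_a, k), canonical_kmers(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(query: str, references: dict[str, str], k: int = 21) -> tuple[str, float]:
    """Return the reference name most similar to the query, with its score."""
    if not references:
        raise ValueError("no reference genomes supplied")
    scores = {name: jaccard_similarity(query, seq, k) for name, seq in references.items()}
    name = max(scores, key=lambda n: scores[n])
    return name, scores[name]


refs = {"isolate_x": "ACGTACGTTAGC", "isolate_y": "TTTTTTTTTTTT"}
print(best_match("ACGTACGTTAGC", refs, k=3))  # → ('isolate_x', 1.0)
```

Using canonical k-mers makes the score independent of which strand was sequenced, which matters because assemblies of the same organism can be reported in either orientation.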

Addressing the Alarming Systems of Surveillance Built By Library Vendors – SPARC

“On April 2nd, news broke that RELX subsidiary LexisNexis signed a multi-million dollar contract with U.S. Immigration and Customs Enforcement (ICE). According to reporting on the ICE contract by the Intercept, LexisNexis’ databases “offer an oceanic computerized view of a person’s existence” and will provide the agency with “the data it needs to locate people with little if any oversight.” 

While this contract may be new, it is just the latest development in an alarming trend that SPARC is following. Two major library vendors—RELX and Thomson Reuters—have been building sophisticated, global systems of surveillance that include online tracking technologies, massive aggregation of user data, and the sale of services based on this tracking, including to governments and law enforcement. 

Dollars from library subscriptions, directly or indirectly, now support these systems of surveillance. This should be deeply concerning to the library community and to the millions of faculty and students who use their products each day and further underscores the urgency of privacy protections as library services—and research and education more generally—are now delivered primarily online. …

As alarming as these surveillance technologies are in their own right, they may already be crossing into academic products. Surveillance researcher Wolfie Christl has reported ThreatMetrix tracking code is now embedded in the ScienceDirect website, raising serious questions about what patron information is being collected and toward what purposes….

The Library Freedom Project’s Vendor Privacy Scorecard highlights the many privacy concerns across a wide selection of library vendors….”

Openness in Big Data and Data Repositories | SpringerLink

Abstract:  There is a growing expectation, or even requirement, for researchers to deposit a variety of research data in data repositories as a condition of funding or publication. This expectation recognizes the enormous benefits of data collected and created for research purposes being made available for secondary uses, as open science gains increasing support. This is particularly so in the context of big data, especially where health data is involved. There are, however, also challenges relating to the collection, storage, and re-use of research data. This paper gives a brief overview of the landscape of data sharing via data repositories and discusses some of the key ethical issues raised by the sharing of health-related research data, including expectations of privacy and confidentiality, the transparency of repository governance structures, access restrictions, as well as data ownership and the fair attribution of credit. To consider these issues and the values that are pertinent, the paper applies the deliberative balancing approach articulated in the Ethics Framework for Big Data in Health and Research (Xafis et al. 2019) to the domain of Openness in Big Data and Data Repositories. Please refer to that article for more information on how this framework is to be used, including a full explanation of the key values involved and the balancing approach used in the case study at the end.

Ants-Review: A Privacy-Oriented Protocol for Incentivized Open Peer Reviews on Ethereum

Abstract. Peer review is a necessary and essential quality control step for scientific publications but lacks proper incentives. Indeed, the process, which is very costly in terms of time and intellectual investment, is not only unpaid by the journals but also not openly recognized by the academic community as a relevant scientific output for a researcher. Therefore, scientific dissemination is affected in timeliness, quality and fairness. Here, to solve this issue, we propose a blockchain-based incentive system that rewards scientists for peer reviewing other scientists’ work and that builds up trust and reputation. We designed a privacy-oriented protocol of smart contracts called Ants-Review that allows authors to issue a bounty for open anonymous peer reviews on Ethereum. If requirements are met, peer reviews will be accepted and paid by the approver proportionally to their assessed quality. To promote ethical behaviour and inclusiveness, the system implements a gamified mechanism that allows the whole community to evaluate the peer reviews and vote for the best ones.
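
The abstract says accepted reviews are paid proportionally to their assessed quality but gives no formula. Ants-Review itself is a set of Ethereum smart contracts; the Python sketch below only illustrates one plausible proportional split, assuming integer quality scores and a bounty expressed in wei (Ethereum's smallest currency unit). The function name, the scoring scale, and the choice to report the rounding remainder separately are hypothetical.

```python
from __future__ import annotations


def split_bounty(bounty_wei: int, quality_scores: dict[str, int]) -> tuple[dict[str, int], int]:
    """Split an integer bounty among reviewers in proportion to their quality scores.

    Returns (payouts, remainder). Integer floor division mirrors the integer-only
    arithmetic of smart contracts; the remainder is what is left unpaid after rounding.
    """
    if bounty_wei < 0:
        raise ValueError("bounty must be non-negative")
    if any(score < 0 for score in quality_scores.values()):
        raise ValueError("quality scores must be non-negative")
    # Hypothetical rule: a review with a score of zero is treated as not accepted.
    accepted = {reviewer: s for reviewer, s in quality_scores.items() if s > 0}
    total = sum(accepted.values())
    if total == 0:
        return {}, bounty_wei
    # Multiply before dividing so each payout loses less than one wei to rounding.
    payouts = {reviewer: bounty_wei * s // total for reviewer, s in accepted.items()}
    return payouts, bounty_wei - sum(payouts.values())


payouts, remainder = split_bounty(100, {"reviewer_a": 3, "reviewer_b": 1, "reviewer_c": 0})
print(payouts, remainder)  # → {'reviewer_a': 75, 'reviewer_b': 25} 0
```

Because each payout is floored, the unpaid remainder is always smaller than the number of accepted reviews, so the author can reclaim it without any reviewer being shortchanged by more than one wei.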

A View Of The Future Of Our Data

“Similarly, many well-intentioned advocates of open data failed to see how free information has always concentrated power in the owners of the fastest information-processing machines. Like the publishers of centuries past, the richest technology companies will always lead in extracting value from open data, giving them unearned leverage over the rest of society. So putting data into the public domain actually does precisely the opposite of leveling the playing field.

If individual data ownership is Scylla, the mythical sea monster who devoured unwary sailors, then open data is Charybdis, the whirlpool near Scylla’s cave. Finding the narrow path between the two means treating data like a police force or a water system — that is, as the subject of widely shared yet deeply responsible governance….”

Large-scale ICU data sharing for global collaboration: the first 1633 critically ill COVID-19 patients in the Dutch Data Warehouse | SpringerLink

“Given these considerations, a large-scale ICU data sharing collaboration in The Netherlands was initiated for the COVID-19 pandemic, resulting in the Dutch Data Warehouse (DDW, Fig. 1). While the database is growing, at this point, the DDW combines pseudonymized EHR data from 23 intensive care units covering the entire ICU admission of all adult COVID-19 patients treated in these ICUs. Collected data include data from monitoring and life support devices, demographics, medication, fluid balance, comorbidities, laboratory results, and outcomes. All parameters were manually reviewed by intensive care professionals and mapped to a common ontology. A software data pipeline converted units, filtered data entry errors, and calculated derived clinical parameters. Data validation was a continuous process including hospital data verification and visual inspection of distribution plots….”
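
The excerpt names the pipeline's steps (pseudonymization, unit conversion, filtering of data-entry errors, and derived clinical parameters) without describing how they are implemented. The sketch below strings those steps together for a single row of vital signs. Every concrete choice in it is an assumption for illustration: HMAC-SHA256 as the pseudonymization method, the field names, the plausibility limits, and mean arterial pressure as the example derived parameter. It is not the Dutch Data Warehouse's actual code.

```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical secret held only by the data steward; never hard-code a real key.
PSEUDONYM_KEY = b"example-key-do-not-use"

# Hypothetical plausibility limits for rejecting obvious data-entry errors.
TEMP_RANGE_C = (25.0, 45.0)
MAX_SYSTOLIC_MMHG = 300.0


@dataclass(frozen=True)
class CleanVitals:
    pseudo_id: str
    temperature_c: float
    systolic_mmhg: float
    diastolic_mmhg: float
    mean_arterial_pressure_mmhg: float  # derived parameter


def pseudonymize(patient_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a patient identifier with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature in 'C' or 'F' to degrees Celsius."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unknown temperature unit: {unit!r}")


def process_row(row: Mapping[str, Any]) -> CleanVitals | None:
    """Convert units, drop implausible rows, and add mean arterial pressure."""
    temp_c = to_celsius(float(row["temperature"]), row["temperature_unit"])
    sbp, dbp = float(row["systolic"]), float(row["diastolic"])
    if not TEMP_RANGE_C[0] <= temp_c <= TEMP_RANGE_C[1]:
        return None  # temperature outside plausible range: likely an entry error
    if not 0.0 < dbp < sbp <= MAX_SYSTOLIC_MMHG:
        return None  # inconsistent blood pressure: likely an entry error
    # Common bedside approximation: MAP ≈ (SBP + 2 × DBP) / 3.
    map_mmhg = (sbp + 2.0 * dbp) / 3.0
    return CleanVitals(pseudonymize(str(row["patient_id"])), temp_c, sbp, dbp, map_mmhg)


row = {"patient_id": "12345", "temperature": 98.6, "temperature_unit": "F",
       "systolic": 120, "diastolic": 80}
print(process_row(row))
```

A keyed hash keeps the mapping reproducible for whoever holds the key, which is what distinguishes pseudonymization from anonymization; under the GDPR, pseudonymized data are still personal data.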

Balancing Privacy With Data Sharing for the Public Good – The New York Times

“This data protection agency could be combined with Data.gov, a government website created in 2009 that assembles and hosts hundreds of thousands of data sets for public use. Together they could form a kind of federal data library, democratizing knowledge for the digital age.

Just as traditional libraries curate and organize their collections, so could a digital library, adding new data sources and cleaning and assembling them for public use. A federal data library could also take the lead in developing and using new tools such as differential privacy, a technique designed to preserve important features of data while protecting individual identities.

Data’s increasing value as an economic resource requires a new way of thinking. Strict privacy protections are needed to make socially valuable data available for the public good.”
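
The column mentions differential privacy only in passing. One standard way to release a single count with ε-differential privacy is the Laplace mechanism: because adding or removing one person changes a count by at most 1, adding noise drawn from a Laplace distribution with scale 1/ε is sufficient. The sketch below assumes a simple counting query; the function names and the ε value in the example are illustrative and are not anything a federal data library has specified.

```python
from __future__ import annotations

import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count: int, epsilon: float, rng: random.Random | None = None) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1 (one person changes it by at most 1),
    so the noise scale is sensitivity / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
print(round(private_count(1_000, epsilon=0.5, rng=rng), 1))
```

Smaller ε means more noise and stronger privacy. Naive floating-point implementations like this one are known to leak information in subtle ways, so real deployments should rely on an audited differential-privacy library rather than hand-rolled sampling.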

Open Science to Address COVID-19: Sharing Data to Make Our Research Investment Go Further | SpringerLink

“Over 1000 randomized clinical trials (RCTs) for the treatment and prevention of COVID-19 have been initiated. With access to the data from RCTs, researchers can integrate and summarize findings, evaluate new hypotheses, design future trials, and prioritize the next research questions to be addressed. This ensures that the value from the investment in the RCTs goes beyond the original intent of the trial protocols. None of this is possible without first having easy and responsible systems to allow access to data: the primary tenets of the open science FAIR principles dictate a proactive intent to share results and patient data from clinical trials [Wilkinson]. While much has been written and progress has been made, there is more to be done in this journey to true openness [Rockhold]. Reasons for this include (1) the well-known complexities of data access (patient privacy, content of the trial’s informed consent and the primary data holder’s decision rights as to sharing), (2) concerns about mis-interpretation of data in the context of secondary research (beyond the original intent of the trial), and (3) the use of platform trials where multiple intervention arms are studied relative to a single control arm.

The International COVID-19 Data Alliance (ICODA) is one of the groups initiating concerted data sharing as a powerful mechanism to address COVID-19. We focus our attention on RCTs, recognizing that the Alliance will encompass many other data types….”