What academic research is ChatGPT accessing?

“QUESTION: I don’t know if this is a stupid one. Does LLM in the form of ChatGPT use information in research papers that are behind a paywall? If yes, what is the technology going on there to provide access? If no, is this a huge argument for OA?

It has triggered a pretty robust discussion in Twitter (surprisingly, I thought everyone had left). Because of the tendril-like nature of Twitter discussions I am pulling the salient points together here because some people (including myself) might like to look into some of these references. The discussion broke into a series of questions (elaborated below):

Do these AI systems have access to paywalled content?
How well can ChatGPT actually ‘read’ the internet?
Is this an argument for open access to research output?
Does research output even matter given its small proportion of the internet?
Should we be worried we don’t know?
What are the bigger implications here?
What’s the take home?…”

A framework for improving the accessibility of research papers on arXiv.org

Abstract:  The research content hosted by arXiv is not fully accessible to everyone due to disabilities and other barriers. This matters because a significant proportion of people have reading and visual disabilities, it is important to our community that arXiv is as open as possible, and if science is to advance, we need wide and diverse participation. In addition, we have mandates to become accessible, and accessible content benefits everyone. In this paper, we will describe the accessibility problems with research, review current mitigations (and explain why they aren’t sufficient), and share the results of our user research with scientists and accessibility experts. Finally, we will present arXiv’s proposed next step towards more open science: offering HTML alongside existing PDF and TeX formats. An accessible HTML version of this paper is also available at https://info.arxiv.org/about/accessibility_research_report.html 

Access is not the same as accessibility: A framework for making research papers truly open – arXiv.org blog

“arXiv has pioneered open access for more than 30 years by removing financial, institutional, and geographic barriers to research. No paywalls or fees, no login required for reading. This approach – which gives researchers maximum control over the release of their results and broad visibility – transformed the research process and launched the open access movement.

However, access is not the same as accessibility, which is the practice of ensuring access regardless of disability. The vast majority of research papers posted to any journal or platform do not meet basic accessibility standards.

In 2022, arXiv completed intensive user research with over 40 people to determine the extent of the problem, evaluate current mitigation efforts, and consider solutions. This work, informed by arXiv staff, accessibility experts, and arXiv readers and authors who use assistive technology, is posted on arXiv in PDF and HTML formats (arXivID: 2212.07286).

In extensive interviews, our research participants shared that finding research, reading it, preparing documents, and submitting work are all steps in the research process where people encounter barriers. In particular, interpreting math equations, figures, and charts is problematic.

Flexible content can help address these issues. Offering well-formatted HTML, alongside PDF and TeX source, will lead to critical accessibility gains. arXiv’s collaboration with ar5iv, which currently renders HTML for approximately 70% of arXiv papers, is a first step in this process. Next, we expect to reduce the error rate and add a link to HTML on arXiv abstract pages….”

Bypass Paywalls Clean – Get this Extension for Firefox (en-GB)

“Add-on allows you to read articles from websites that implement a paywall.

Not everyone is able to afford multiple subscriptions on many different news sites, especially when they just want to read a single article (from Twitter) without being enrolled in a monthly/yearly membership.

Notice: if you use this add-on regularly on the same website, please consider paying a subscription for it. Don’t forget that free press can’t be sustainable without funding….”

Global impact or national accessibility? A paradox in China’s science | SpringerLink

Abstract:  During the past decades, Chinese science policy has emphasized the international dissemination of research. Such policies were associated with exponential growth of English-language publications and have led China to become the largest contributor to international scientific literature. However, due to the paywalls and language barriers, China’s international publications are less accessible to local Chinese scholars, which suggests that the dissemination to the international scientific community may come at the expense of dissemination to the local Chinese community. This paper investigates the local accessibility of China’s international publications and finds that publishing internationally limits the visibility of Chinese research for the national Chinese scientific community, and the restriction is even worse for immediate access.

 

Open access to research can close gaps for people with disabilities

In a long-overdue move, the federal Office of Science and Technology Policy has issued guidance on making federally supported research and publications available to all without delay or embargo. This remarkable announcement about open access has the potential to remove information barriers that have long held back social and scientific progress.

Even with immediate open access to research results, however, people with disabilities face unique barriers to information access. These issues must be considered as this policy takes shape.

As disabled researchers with vision impairments, we do not have equitable access to scientific information. This includes barriers to accessing data and peer-reviewed publications, which too often are not available in accessible formats. This gap in access is in opposition to federal laws, including the Americans with Disabilities Act and the Rehabilitation Act, which support equal access to information.

But access to scientific information is not limited to downloading journals and databases. Accessing research data can mean using online software, interactive websites or maps, and attending webinars or conferences. When scientific results are not accessible, people with disabilities (researchers, policymakers, advocates, and others) are blocked from full access to information, limiting their research knowledge, participation, and inclusion.

 

Archives, Access and Artificial Intelligence at Transcript Publishing

“Digital archives are transforming the Humanities and the Sciences. Digitized collections of newspapers and books have pushed scholars to develop new, data-rich methods. Born-digital archives are now better preserved and managed thanks to the development of open-access and commercial software. Digital Humanities have moved from the fringe to the center of academia. Yet, the path from the appraisal of records to their analysis is far from smooth. This book explores crossovers between various disciplines to improve the discoverability, accessibility, and use of born-digital archives and other cultural assets….

 

Introduction
Pages 7 – 28

Chapter 1: Artificial Intelligence and Discovering the Digitized Photoarchive
Pages 29 – 60

Chapter 2: Web Archives and the Problem of Access: Prototyping a Researcher Dashboard for the UK Government Web Archive
Pages 61 – 82

Chapter 3: Design Thinking, UX and Born-digital Archives: Solving the Problem of Dark Archives Closed to Users
Pages 83 – 108

Chapter 4: Towards Critically Addressable Data for Digital Library User Studies
Pages 109 – 130

Chapter 5: Reviewing the Reviewers: Training Neural Networks to Read Peer Review Reports
Pages 131 – 156

Chapter 6: Supervised and Unsupervised: Approaches to Machine Learning for Textual Entities
Pages 157 – 178

Chapter 7: Inviting AI into the Archives: The Reception of Handwritten Recognition Technology into Historical Manuscript Transcription
Pages 179 – 204

AFTERWORD: Towards a new Discipline of Computational Archival Science (CAS)
Pages 205 – 218 …

[From the Introduction:]

The closure of libraries, archives and museums due to the COVID-19 pandemic has highlighted the urgent need to make archives and cultural heritage materials accessible in digital form. Yet too many born-digital and digitized collections remain closed to researchers and other users due to privacy concerns, copyright and other issues. Born-digital archives are rarely accessible to users. For example, the archival emails of the writer Will Self at the British Library are not listed on the Finding Aid describing the collection, and they are not available to users either onsite or offsite. At a time when emails have largely replaced letters, this severely limits the amount of content openly accessible in archival collections. Even when digital data is publicly available (as in the case of web archives), users often need to physically travel to repositories to consult web pages. In the case of digitized collections, copyright can also be a major obstacle to access. For instance, copyright-protected texts are not available for download from HathiTrust, a not-for-profit collaborative of academic and research libraries preserving 17+ million digitized items (including around 61% not in the public domain)….

It is important to recognize that “dark” archives contain vast amounts of data essential to scholars – including email corres

A Fork in the Road: OA Books and Visibility-Value in the Humanities · COPIM

“What we see emerging at this time, as a result, is a dual system in which all scientific research will be available to anyone to read, free of charge, while the most significant work in the humanities and social sciences will remain extremely expensive and less visible in the digital world.

This should be grave cause for concern. The humanities, in particular, face a perpetual crisis of value, in which these subjects are called to account for their existence and are asked to re-articulate their societal virtues. But the arguments grow thinner. How can the humanities parrot the oft-repeated liberal humanist line that they exist to produce an educated citizenry capable of participating critically in democracy, when most humanities work remains unreadable by most people?…

Learned societies in the humanities should be concerned (and they are). However, this concern should not be for the revenue streams that they feel are threatened by open access to journal subscriptions, but instead for the future of their disciplines in a world where they cannot justify themselves….”

Q&A with Peter Kaufman: Open Access Publishing and Access to Knowledge

In today’s post, as a part of our series of open access success stories that spotlight noteworthy openly accessible books and their authors, we’re featuring Peter Kaufman of MIT Open Learning. Kaufman made his new book, The New Enlightenment and the Fight to Free Knowledge, available for free under a CC-BY license upon its publication by Seven Stories Press. In the book, Kaufman discusses “the powerful forces that have purposely crippled our efforts to share knowledge widely and freely.” By releasing his work under an open access license, Kaufman has pushed back on these forces while also ensuring that his work reaches a wide audience. You can find the open access edition of the book here.

Mellon Foundation awards ITHAKA $1.5 million to make JSTOR accessible to incarcerated college students – ITHAKA

“The Andrew W. Mellon Foundation has awarded ITHAKA a new $1.5 million grant to provide incarcerated college students with access to JSTOR, a digital library of journals, books, and other materials. Our aim is for every incarcerated college student in the United States to have access to JSTOR, along with the research skills to use this and other digital resources.

One of the most significant educational challenges that incarcerated college students face is easy, reliable access to high-quality library resources to support their learning. Prisons often do not provide internet access to individuals or offer only limited access to digital resources, sometimes at high cost. This challenge has only grown in the last 12 to 18 months as the COVID-19 pandemic ramped up the need for digital learning solutions and higher education became more accessible to incarcerated individuals through financial aid expansions, including Second Chance Pell….”

Briefing for library directors: Publishers and the textbook market in the higher education sector – publishers-and-the-textbook-market-in-he-library-directors-briefing.pdf

This briefing paper, created by the Jisc Learning Content Group, provides an overview of the current e-textbook licensing landscape within higher education institutions. It outlines current practices and their impact on the library and suggests ways in which the sector can exert influence on publishers to change their pricing and access models.

Digital sequence information: free access is crucial | Leibniz Institut DSMZ

“Global problems such as the extinction of species and the decline of biological diversity, climate change, pandemics and hunger can only be solved with free access to digital sequence information”, states Prof. Jörg Overmann PhD, Scientific Director of the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures. “Without free access to digital sequence information [DSI], research on a national, European or international level will simply fail to work. Digital sequence information must be preserved as common good”, stresses Prof. Overmann.

Frontiers | The Ethic of Access: An AIDS Activist Won Public Access to Experimental Therapies, and This Must Now Extend to Psychedelics for Mental Illness | Psychiatry

“If patients with mental illnesses are to be treated fairly in comparison with other categories of patients, they must be given access to promising experimental therapies, including psychedelics. The right of early access to promising therapies was advanced as an ethical principle by activist Larry Kramer during the AIDS pandemic, and has now largely been adopted by the medical establishment. Patients are regularly granted access to experimental drugs for many illness categories, such as cancer and infectious diseases. The need for expanded access is especially relevant during evolving crises like the AIDS and the coronavirus pandemics. In contrast to non-psychiatric branches of medicine, psychiatry has failed to expedite access to promising drugs in the face of public health emergencies, psychological crises, the wishes of many patients, and the needs of the community. Psychiatry must catch up to the rest of medicine and allow the preferences of patients for access to guide policy and law regarding unapproved medications like psychedelics….

Open questions include how to amplify the voices of patients regarding experimental therapies like psychedelics, how to implement early access, how to educate the public about this option once it exists, and how to ensure equitable access for multiple marginalized groups. A model of political engagement like ACT UP may not work for patients whose symptoms include lack of motivation and will, and who are at risk for re-traumatization. The authors are exploring an entirely patient-led counterpart to traditional academic peer review, which allows diverse patient communities to provide meaningful input into therapies that result from trials….”

 

COAR releases resource types vocabulary version 3.0 for repositories with new look and feel – COAR

“We are pleased to announce the release of version 3.0 of the resource types vocabulary. Since 2015, three COAR Controlled Vocabularies have been developed and are maintained by the Controlled Vocabulary Editorial Board: Resource types, access rights and version types. These vocabularies have a new look and are now being managed using the iQvoc platform, hosted by the University of Vienna Library.

Using controlled vocabularies enables repositories to be consistent in describing their resources, helps with search and discovery of content, and allows machine readability for interoperability. The COAR vocabularies are available in several languages, supporting multilingualism across repositories. They also play a key role in making semantic artifacts and repositories compliant with the FAIR Principles, in particular when it comes to findability and interoperability….”