A framework for improving the accessibility of research papers on arXiv.org

Abstract:  The research content hosted by arXiv is not fully accessible to everyone due to disabilities and other barriers. This matters because a significant proportion of people have reading and visual disabilities, it is important to our community that arXiv is as open as possible, and if science is to advance, we need wide and diverse participation. In addition, we have mandates to become accessible, and accessible content benefits everyone. In this paper, we will describe the accessibility problems with research, review current mitigations (and explain why they aren’t sufficient), and share the results of our user research with scientists and accessibility experts. Finally, we will present arXiv’s proposed next step towards more open science: offering HTML alongside existing PDF and TeX formats. An accessible HTML version of this paper is also available at https://info.arxiv.org/about/accessibility_research_report.html 

Access is not the same as accessibility: A framework for making research papers truly open – arXiv.org blog

“arXiv has pioneered open access for more than 30 years by removing financial, institutional, and geographic barriers to research. No paywalls or fees, no login required for reading. This approach – which gives researchers maximum control over the release of their results and broad visibility – transformed the research process and launched the open access movement.

However, access is not the same as accessibility, which is the practice of ensuring access regardless of disability. The vast majority of research papers posted to any journal or platform do not meet basic accessibility standards.

In 2022, arXiv completed intensive user research with over 40 people to determine the extent of the problem, evaluate current mitigation efforts, and consider solutions. This work, informed by arXiv staff, accessibility experts, and arXiv readers and authors who use assistive technology, is posted on arXiv in PDF and HTML formats (arXivID: 2212.07286).

In extensive interviews, our research participants shared that finding research, reading it, preparing documents, and submitting work are all steps in the research process where people encounter barriers. In particular, interpreting math equations, figures, and charts is problematic.

Flexible content can help address these issues. Offering well-formatted HTML, alongside PDF and TeX source, will lead to critical accessibility gains. arXiv’s collaboration with ar5iv, which currently renders HTML for approximately 70% of arXiv papers, is a first step in this process. Next, we expect to reduce the error rate and add a link to HTML on arXiv abstract pages….”

When XML Marks the Spot: Machine-readable journal articles for discovery and preservation

“If you work with a campus-based journal program and you’re looking to expand the readership and reputation of the articles you publish, adding them to relevant archives and indexes (A&Is) presents a treasure trove of opportunities. A&Is serve as valuable content distribution networks, and inclusion in selective ones is a signal of research quality. You may have heard about XML, one of the primary machine-readable formats academic databases use to ingest content, and wonder if that’s something you need to reach your archiving and indexing goals.

This free webinar, co-hosted by Scholastica, UOregon Libraries, and the GWU Masters in Publishing program, will offer a crash course in the benefits of XML production and use cases, including:

What XML is and the different types required or preferred by academic indexes and archives (with an overview of JATS)
How producing metadata and/or full-text articles in XML can unlock discovery and archiving opportunities with examples
Additional benefits of XML for journal accessibility as well as publishing program and professional development
When XML is needed and when it may not be the best use of journal resources
Ways you can produce XML, including an overview of Scholastica’s production service…”

Open Inaccessibility

“When a PDF is downloaded, who can read it?

At the start of the year I discussed the social model of disability and inaccessibility in relation to open scholarship, but since then I have not done much more in a practical sense. Here’s the best explanation of the social model of disability I have seen…

Content inaccessibility came back on my radar again when I read a recent study about content accessibility improvements for arXiv. This paper calls content accessibility “the next frontier of open science.” As we see a simultaneous increase in user-generated content platforms for publishing, where there is less control over what and how things get published, I would agree and argue that accessibility will become a bigger topic quickly.

Some of my main takeaways and juxtapositions from this paper include:

There is clear content inaccessibility: only 30% of people using assistive technologies rate all research as accessible (vs. 59% of people not using assistive technologies).
HTML is preferred for accessibility, but non-disabled people prefer PDFs.
Biggest improvement areas for accessibility are (1) PDF formatting, (2) images (alt texts), (3) math accessibility (e.g., MathML for screenreaders), (4) making data in figures parseable by screen readers.
People who don’t use assistive technologies don’t know what is required of them to make accessible documents
PDF is often preferred because it is easy/easier to save to reference managers….”


“Quarto® is an open-source scientific and technical publishing system built on Pandoc

Create dynamic content with Python, R, Julia, and Observable.
Author documents as plain text markdown or Jupyter notebooks.
Publish high-quality articles, reports, presentations, websites, blogs, and books in HTML, PDF, MS Word, ePub, and more.
Author with scientific markdown, including equations, citations, crossrefs, figure panels, callouts, advanced layout, and more….”

Welcome to the Single Source Publishing Community | The Single Source Publishing Community (SSPC) is a network of stakeholders from the Open Science community that are interested in Single Source Publishing (SSP) for scholarly purposes – developing open-source software and advocacy.

“The Single Source Publishing Community (SSPC) is a network of stakeholders from the Open Science community that are interested in Single Source Publishing (SSP) for scholarly purposes – developing open-source software and advocacy.”

The PDF is not enough: why science needs open formats – University Library

“During the 2019 to 2021 project period, the Modern Publishing project, part of the Hamburg Open Science (HOS) initiative, pooled many years of experience at the Hamburg University of Technology (TUHH) and the Hamburg State and University Library (SUB). The goal: the development of a socio-technical system for single source publishing, i.e. for generating different output formats from one source format. It was based on open source solutions such as GitLab and Open Journal Systems (OJS) to enable an open alternative approach to the publication of scientific results compared to commercial and proprietary publishing offers….

Former team members of the project have founded the Single Source Publishing Community (SSPC). It focuses on scientific writing and publishing with open tools and formats and is a meeting point for researchers, lecturers, publishers and developers. Under the motto “Collaborate more, compete less”, the active members of the community exchange ideas in their monthly meetings on current developments in their projects and discuss strategies for cultural change in the field of scientific publication….

Numerous open-source tools favor the desired sovereignty: software projects such as Open Journal Systems, Vivliostyle, Paged.js, Swapfire, FidusWriter, HedgeDoc, Quarto and, last but not least, Pandoc are combined in different ways in the community projects to create alternative open systems.

Many projects use the Markdown format as a source to generate HTML, JATS/XML and EPUB versions that complement the PDF. These formats offer the advantage that they retain the semantic labeling of the information they contain and thus open up a wide range of possible applications in automated text mining processes. At the same time, the usability and reach of published scientific findings increases….”
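
The single source idea described in this excerpt (one Markdown source, several output formats) can be sketched with Pandoc's command line. The following is a minimal illustration and not the workflow of the Hamburg project or the SSPC: the file name `paper.md`, the `build` output directory, and the choice of the `html`, `jats`, and `epub3` writers are assumptions made for the example. The script only assembles the command lines and does not run Pandoc.

```python
from pathlib import Path

# Assumed mapping of output file suffix to Pandoc writer name.
TARGETS = {".html": "html", ".xml": "jats", ".epub": "epub3"}


def pandoc_commands(source: str, out_dir: str = "build") -> list[list[str]]:
    """Build one pandoc command per target format from a single Markdown source."""
    src = Path(source)
    commands = []
    for suffix, writer in TARGETS.items():
        out = (Path(out_dir) / (src.stem + suffix)).as_posix()
        commands.append([
            "pandoc", src.as_posix(),
            "--from", "markdown",
            "--to", writer,
            "--standalone",      # emit a complete document, not a fragment
            "--output", out,
        ])
    return commands


# Hypothetical source file; nothing is executed here, the commands are only printed.
for cmd in pandoc_commands("paper.md"):
    print(" ".join(cmd))
```

To actually produce the files, each command list could be passed to `subprocess.run` on a machine where Pandoc is installed and the output directory exists.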

Discover, Create, and Publish your research paper | SciSpace by Typeset

“Our struggle with Word and LaTeX in formatting journal submissions and academic assignments led us to build SciSpace. We realised that no one had designed a platform that was dedicated to meet the needs of people like you, who generate billions of pieces of academic work each year. We found that Word and Google Docs are unstructured and need constant re-editing and re-formatting, while LaTeX is too hard for most researchers. SciSpace intends to be the perfect bridge – ease of intuitive writing and collaboration, with the rigor and power of LaTeX.

We have been working at it since 2014 and have been in beta for over a year. During this period we’ve collected feedback from thousands of you, and we are grateful to our early users. It helped us identify pain points and build industry-leading features on SciSpace. What you see today, is the work of thousands of man-hours that have created self-learning journal and thesis builders, that make sure you have a 100% compliant submission with zero errors.

We are committed to Open Standards as well as keeping our platform open, and you can export every letter you write on SciSpace without any ado, if we fail to live up to your expectations. Till date, we’ve created journal builders for over 14,000 journals and scores of assignment, and thesis templates.

We are adding to our library by the hundreds every week, and every dollar that you spend on SciSpace is invested in building out more features that will help you save time, get accuracy and enjoy the process of writing research.

Go ahead, give our baby a test drive and let us know what you feel on feedback@typeset.io

And yes, if you like our work, please do consider joining the growing SciSpace Community and spread the word.”

What Does EPUB 3.3 Mean For Accessibility? – Inclusive Publishing

“The publishing community eagerly awaits the new version of the EPUB standard, EPUB 3.3, the related EPUB Accessibility 1.1 specification and the updated version of EPUBCheck. We asked EPUB 3.3 editor and DAISY developer Matt Garrish: ‘What does this mean for accessible publishing?’

Can We Expect Major Changes For Accessibility?

Neither the EPUB 3.3 nor the Accessibility 1.1 revisions represent major changes. Most of our efforts are focused on taking the work we’ve already done and moving the documents through the W3C process to make formal recommended specifications of them (i.e., to be fully recognized by W3C membership). EPUB 3.2 was published by the W3C publishing community group, so those documents did not have any formal standing (they didn’t have to go through W3C membership votes, they didn’t have to show independent implementations, etc.). So, EPUB 3.3 will formalize the standard….”

Recommendations on the Transformation of Academic Publishing: Towards Open Access

“Three central arguments support this transformation: 1. Openly accessible publications can be read, reviewed and used more quickly and more widely by other researchers. This increases the quality of research and accelerates scientific progress. 2. OA makes scientific knowledge more widely available outside of the scientific community and lowers the threshold for various transfer activities. This increases the social effectiveness of (publicly funded) research. 3. Up to now, the business model of publishers has been based on rights of use. As they will no longer be granted exclusive rights under OA, publishers will become publication service providers and will compete with other providers. This may strengthen the negotiating position of scientific institutions vis-à-vis such service providers and improve the innovative capacity, cost transparency and cost efficiency of the publication system.

As far as the Council is concerned, the goal of the transformation is for academic publications to be made freely available immediately, permanently, at the original publication venue and in the citable, peer-reviewed and typeset version of record under an open licence (CC BY). This so-called gold route to OA (gold OA) is compatible with various business models…. 

For orientation in this market, the Council recommends that the Alliance of Science Organisations in Germany agree on common requirements for quality assurance of content (especially in terms of peer review processes) as well as for high-quality publication services. In the medium term, academic publications should not only be openly accessible, but also machine-readable through open, structured formats and semantic annotations….

“Gold OA” should not be equated with funding via article processing charges (APC)….

As the WR sees it, all third-party funders are obliged to fully finance the publication costs arising from publishing the results of the research they are funding….”


New Leaves: Riffling the History of Digital Pagination

Abstract:  This article presents a new history of digital pagination. Virtual pagination works very differently from its print correlate. Despite this, encapsulated and paginated formats have gained a solid digital foothold. Nonetheless, many commentators have argued that we must overcome such a reliance on and continuity with print in the digital space. This article charts a fresh history of the development of digital pagination through a revisionist interrogation of three interrelated phenomena: 1. That digital pages do not behave as do their physical correlates but instead mimic earlier historical forms of print that fused pagination, scrolling, and the tablet form. 2. That the development of PDF was almost abandoned by Adobe’s board of directors, who could see no audience for it. 3. That there are other more robust lineages of constraint for digital pages from cinema and television. Drawing on new correspondence with the creators of the PDF format I argue from these historical tracings that nothing was sure about the development of textual pagination in the digital space. Further, the digital page almost never came to the prominence and dominance now presumed in discussions of digital reading.

ar5iv – Articles from arXiv.org as responsive HTML5 web documents

Converted from TeX with LaTeXML.
Sources up to the end of 2021. Not a live preview service.
For articles with multiple revisions, only the initial v1 is made available.
Goal: incremental improvement until worthy of native arXiv adoption.

Sample: A Simple Proof of the Quadratic Formula (1910.06709)

View any arXiv article URL by changing the X to a 5
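
The URL rule above can be sketched as a small helper. This is an illustration only: the function name `to_ar5iv` is hypothetical, and the sketch assumes that "changing the X to a 5" means swapping the host `arxiv.org` for `ar5iv.org` while leaving the rest of the URL untouched. Because ar5iv is not a live preview service, a rewritten URL for a recent paper may not resolve.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url: str) -> str:
    """Rewrite an arXiv article URL to its assumed ar5iv counterpart."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host != "arxiv.org":
        raise ValueError(f"not an arXiv URL: {url!r}")
    # Assumption: only the host changes; path, query and fragment are kept.
    return urlunsplit(parts._replace(netloc="ar5iv.org"))


# The sample article listed above.
print(to_ar5iv("https://arxiv.org/abs/1910.06709"))
```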


Harmon | ETDplus Toolkit [Tool Review] | Journal of Librarianship and Scholarly Communication

Abstract:  Electronic theses and dissertations (ETDs) have traditionally taken the form of PDFs and ETD programs and their submission and curation procedures have been built around this format. However, graduate students are increasingly creating non-PDF files during their research, and in some cases these files are just as or more important than the PDFs that must be submitted to satisfy degree requirements. As a result, both graduate students and ETD administrators need training and resources to support the handling of a wide variety of complex digital objects. The Educopia Institute’s ETDplus Toolkit provides a highly usable set of modules to address this need, openly licensed to allow for reuse and adaption to a variety of potential use cases.


The Other Diversity in Scholarly Publishing – The Scholarly Kitchen

“If we explore other aspects of scholarly publishing — publication format, workflow, data sharing mechanisms, copyright, or licensing — we will find diverse options in practice. We may explain such diversity as a manifestation of the vibrant innovation culture of this industry driven by the needs from its stakeholders. To understand what value such diversity brings about, let’s compare it with the biodiversity we see around us….”