Textbooks play an important role in defining fields of research and summarising key academic ideas for a wider audience. But how do you do this for an open access audience that is potentially unlimited? We talked to Filipe Campante, Federico Sturzenegger and Andrés Velasco, authors of the recently published LSE Press book Advanced Macroeconomics: An Easy Guide, about how the field has changed in recent times, what makes their approach to macroeconomics distinctive, and what rationales and ambitions lie behind producing an open access textbook.
“In his Medium article “Scholarly publishing is stuck in 1999,” Springer Nature product manager Stephen Cornelius criticizes the outdated publishing practices many academic journals still use to produce online content. He notes that, despite decades of technological advancement, “research publishing seems stuck with those that were employed when it first went online.” Cornelius points to many areas of digital journal publishing that have been designed to mirror print publishing, such as journals formatting online articles as print-based PDFs, despite there being better ways to produce and present content online….
PDFs are rife with limitations compared to HTML because, unlike HTML, PDFs:
Cannot support embedded multimedia research files such as videos
Have a poor layout for online reading, generally using columns that force readers to scroll up and down to read content on the same page
Are nearly impossible to read on mobile devices, because a PDF is a static page (whereas HTML can be given a responsive design)
Do not easily allow for clickable references within the text
Are overwhelmingly not optimized for search engines…
A recent article in The Atlantic titled “The Scientific Paper Is Obsolete” explores the limitations of PDFs and the need for journals, particularly in STEM fields, to adopt internet-based publishing formats in order to support more dynamic presentations of research as well as to make it easier for readers to find articles online….”
Abstract: The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis combined with the need for social distancing measures present unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript’s figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.
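To make the automated citation handling the abstract describes more concrete, here is a minimal sketch that shells out to Manubot's `manubot cite` command, which resolves citation keys (DOIs, PubMed IDs, arXiv IDs) to CSL JSON metadata. The citekeys below are placeholders for illustration, and the sketch assumes the `manubot` package is installed and on the PATH; it is not the project's actual build pipeline.

```python
import json
import subprocess

# Placeholder citekeys for illustration; Manubot resolves DOIs, PubMed IDs,
# arXiv IDs, and URLs to CSL JSON citation metadata.
citekeys = ["doi:10.7717/peerj.338", "arxiv:1806.05726"]

# Invoke the `manubot cite` CLI, which prints CSL JSON for each citekey.
result = subprocess.run(
    ["manubot", "cite", *citekeys],
    capture_output=True,
    text=True,
    check=True,
)

# Print the resolved title of each cited work.
for item in json.loads(result.stdout):
    print(item.get("title"))
```

In the project described above, steps like this ran inside continuous integration, so citation metadata and manuscript outputs were regenerated automatically whenever collaborators merged new content.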
Abstract: Many in the scientific community, particularly in publicly funded research, are pushing to adhere to more accessible data standards to maximize the findability, accessibility, interoperability, and reusability (FAIR) of scientific data, especially with the growing prevalence of machine learning augmented research. Online FAIR data repositories, such as the Open Science Framework (OSF), help facilitate the adoption of these standards by providing frameworks for storage, access, search, APIs, and other features that create organized hubs of scientific data. However, the wider acceptance of such repositories is hindered by the lack of support of hierarchical data formats, such as Technical Data Management Streaming (TDMS) and Hierarchical Data Format 5 (HDF5), that many researchers rely on to organize their datasets. Various tools and strategies should be used to allow hierarchical data formats, FAIR data repositories, and scientific organizations to work more seamlessly together. A pilot project at Los Alamos National Laboratory (LANL) addresses the disconnect between them by integrating the OSF FAIR data repository with hierarchical data renderers, extending support for additional file types in their framework. The multifaceted interactive renderer displays a tree of metadata alongside a table and plot of the data channels in the file. This allows users to quickly and efficiently load large and complex data files directly in the OSF webapp. Users who are browsing files can quickly and intuitively see the files in the way they or their colleagues structured the hierarchical form and immediately grasp their contents. This solution helps bridge the gap between hierarchical data storage techniques and FAIR data repositories, making both of them more viable options for scientific institutions like LANL which have been put off by the lack of integration between them.
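To illustrate the "tree of metadata alongside the data channels" that such a renderer displays, here is a rough Python sketch (not LANL's actual renderer) that walks a hierarchical HDF5 file with the h5py library and prints each group, dataset, and attached attribute; the file name is a placeholder.

```python
import h5py

def describe(name, obj):
    """Print one line per group/dataset, mirroring a tree-style renderer."""
    indent = "  " * name.count("/")
    if isinstance(obj, h5py.Dataset):
        print(f"{indent}{name}: shape={obj.shape}, dtype={obj.dtype}")
    else:
        print(f"{indent}{name}/ (group)")
    # Attributes hold the metadata attached to each node in the hierarchy.
    for key, value in obj.attrs.items():
        print(f"{indent}  @{key} = {value}")

# 'experiment.h5' is a placeholder path for illustration.
with h5py.File("experiment.h5", "r") as f:
    f.visititems(describe)
```

A web renderer like the one described does essentially this traversal server- or client-side, presenting the hierarchy as an expandable tree rather than flattening it into an opaque binary download.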
“HighWire’s journals hosting solution JCORE can now transform XML to industry standard Journal Article Tag Suite (JATS) XML version 1.3, the latest version. This means that publishers using the platform can now seamlessly comply with the specifications of this interoperable standard.
The British Medical Journal (BMJ) is the first JCORE publisher to offer the JATS 1.3 download for all of its OA content, which includes over 57,000 articles. This option may be attractive to BMJ and other publishers, as the ability to download full text in a machine-readable format is one of the strong recommendations made by cOAlition S within the Plan S Principles and Implementation guidance for publishers. …”
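For readers unfamiliar with JATS, the sketch below suggests why machine-readable full text matters: standard XML tooling can pull structured metadata straight out of a JATS document. The fragment is a minimal hand-written example (real JATS articles carry much richer front, body, and back matter), and the DOI is a placeholder.

```python
import xml.etree.ElementTree as ET

# A minimal hand-written JATS-like fragment for illustration only.
jats = """
<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.doi</article-id>
      <title-group>
        <article-title>An Example Open Access Article</article-title>
      </title-group>
    </article-meta>
  </front>
</article>
"""

root = ET.fromstring(jats)
# Limited XPath in ElementTree supports attribute predicates like this.
doi = root.findtext(".//article-id[@pub-id-type='doi']")
title = root.findtext(".//article-title")
print(doi, "-", title)
```

Because every JATS article tags identifiers, titles, and references the same way, indexers and text-mining pipelines can process full text at scale without per-journal scraping.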
“This response to the White House Office of Science and Technology Policy’s “Request for Information To Improve Federal Scientific Integrity Policies” is submitted on behalf of the Open Research Funders Group….
The Open Research Funders Group is supportive of the White House Office of Science and Technology Policy’s commitment to explore good practices Federal agencies can adopt to improve scientific integrity, promote transparency, prioritize evidence-based decision making, and promote equity. We believe that the promotion of and adherence to open science principles is a catalytic enabling strategy in support of these goals. Specifically, we recommend that the OSTP prioritize making as much of the research lifecycle as possible openly available to access and reuse. This includes, but is not limited to, preregistrations, protocols, preprints, articles, data, code, and software. The rationale is simple. Research cannot be considered reliable unless it can be tested, replicated, and built upon. Making critical components of the research lifecycle unavailable hampers OSTP’s pursuit of scientific integrity at best, and renders it impossible at worst. Limiting access to research outputs has the further effect of rendering science opaque, which negatively impacts public trust in the research endeavor writ large….”
“The Open Research Funders Group (ORFG) is pleased to submit a formal response to the White House Office of Science and Technology Policy’s “Request for Information To Improve Federal Scientific Integrity Policies”. The comments, which may be found in their entirety here, encourage the federal government to prioritize making as much of the research lifecycle as possible openly available to access and reuse….”
Abstract: The PDF Data Extractor (PDE) R package is designed to perform comprehensive literature reviews for scientists at any career stage in a user-friendly way. The PDE_analyzer_i() function permits the user to filter and search thousands of scientific articles using a simple user interface, requiring no bioinformatics skills. In the additional PDE_reader_i() interface, the user can then quickly browse the sentences with detected keywords, open the full-text article when required, and convert tables conveniently from PDF files to Excel sheets (pdf2table). Specific features of the literature analysis include the adaptability of analysis parameters and the detection of abbreviations of search words in articles. In this article, we demonstrate and exemplify how the PDE package allows the user-friendly, efficient, and automated extraction of metadata from full-text articles, which can aid in summarizing the existing literature on any topic of interest. As such, we recommend the use of the PDE package as the first step in conducting an extensive review of the scientific literature. The PDE package is available from the Comprehensive R Archive Network at https://CRAN.R-project.org/package=PDE.
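The PDE package itself is written in R; purely as a language-neutral illustration of the underlying technique (extract text from a PDF, then scan its sentences for keywords), here is a rough Python sketch using the third-party pypdf library. It is not the PDE package, and the file name and keywords are placeholders.

```python
import re
from pypdf import PdfReader  # third-party: pip install pypdf

KEYWORDS = {"pneumonia", "coronavirus"}  # placeholder search terms

def sentences_with_keywords(pdf_path, keywords):
    """Yield sentences from a PDF that contain any of the keywords."""
    reader = PdfReader(pdf_path)
    # Concatenate the extracted text of every page into one string.
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    # Naive sentence split; tools like PDE also handle abbreviations.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if any(kw in sentence.lower() for kw in keywords):
            yield sentence.strip()

# 'article.pdf' is a placeholder path for illustration.
for hit in sentences_with_keywords("article.pdf", KEYWORDS):
    print(hit)
```

Scaled over thousands of files, this kind of keyword pass is what lets a reviewer triage a large corpus down to the sentences and tables worth reading in full.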
Table of contents:
1. Project Gutenberg, a visionary project
2. The milestones of Project Gutenberg
3. PDF, a pioneer format created by Adobe
4. Gabriel, a portal for European national libraries
5. The British Library and its treasures
6. From PDAs to smartphones
7. The first e-readers
8. E Ink, an electronic ink technology
9. Online dictionaries and encyclopedias
10. Experiments by best-selling authors
11. From OeB to EPUB as a standard format
12. Wikipedia, an encyclopedia for the world
13. The Creative Commons licence
14. From Google Print to Google Books
15. The Internet Archive, a library for the world
16. eBooks seen by some pioneers
17. A tribute to librarians around the world
18. A timeline from 1971 until now
“IFLA has endorsed the WikiLibrary Manifesto, aimed at connecting libraries and Wikimedia projects such as Wikibase in order to promote the dissemination of knowledge in open formats, especially in linked open data networks….”
Abstract: Opening Up Scholarship in the Humanities: Digital Publishing, Knowledge Translation, and Public Engagement considers the concept of humanistic, open, social scholarship and argues for its value in the contemporary academy as both a set of socially oriented activities and an organizing framework for such activities. This endeavour spans the interrelated areas of knowledge creation, public engagement, and open access, and demonstrates the importance of considering this triad as critical for the pursuit of academic work moving forward—especially in the humanities. Under the umbrella of open social scholarship, I consider open access as a baseline for public engagement and argue for the vital importance of this sort of work. Moreover, I suggest that there is a strong connection between digital scholarship and social knowledge creation. I explore the knowledge translation lessons that other fields might have for the humanities and include a journalist–humanist case study to this end. I also argue for the value of producing research output in many different forms and formats. Finally, I propose that there are benefits to explicitly popularizing the humanities. In sum, this dissertation speculates on past, current, and future scholarly communication activities, and proposes that such activities might be opened up for wider engagement and, thus, social benefit.
“When SMRJ was started, the editors used email and Word docs to track peer review, and they published all articles in PDF format. However, with the journal continuing to expand, the editors realized they were in need of an easier way to track submissions and a new publishing system to improve the journal’s online reading experience and chances of being added to relevant indexes. As a result, Chief Editor William Corser and Assistant Editor Sam Wisniewski began searching for publishing tools and services, focused on three key areas: streamlining peer review, modernizing the journal’s website, and producing XML for all articles.
After considering different options, Corser and Wisniewski chose to use Scholastica’s peer review and open access publishing software, as well as Scholastica’s typesetting service, to produce PDF, HTML, and XML article files. Since making the switch, they’ve found that peer review is smoother for editors and authors, and they’re making strides towards reaching their article discovery and indexing goals….”
Abstract: During the previous Ebola and Zika outbreaks, researchers shared their data, allowing many published epidemiological studies to be produced solely from open research data and speeding up the investigation and control of these infections. This study aims to evaluate the dissemination of the COVID-19 research data underlying scientific publications. COVID-19 publications from December 1, 2019, to April 30, 2020, were analyzed through the PubMed Central repository to evaluate the research data made available with them, whether published as supplementary material or deposited in repositories. The PubMed Central search generated 5,905 records, of which 804 papers (13.6%) included complementary research data, mostly as supplementary material (77.4%). The most productive journals were The New England Journal of Medicine, The Lancet and The Lancet Infectious Diseases; the most frequent keyword was pneumonia; and the most used repositories were GitHub and GenBank. The expected growth in the number of published articles over the course of the pandemic is confirmed in this work, while only 13.6% of them share their underlying research data. It can be deduced that data sharing is not a common practice, even in health emergencies such as the present one. High-impact generalist journals have accounted for a large share of global publishing. The topics most often covered relate to epidemiological and public health concepts, genetics, virology and respiratory diseases such as pneumonia. However, it is essential to interpret these data with caution, following the evolution of publications and their funding in the coming months.
From the body of the paper: “In global public health emergencies, it should be mandatory to disseminate any information that may be of value in fighting the crisis. For this to be done efficiently, there is a need to develop agreed global standards for sharing data and results for scientists, institutions and governments.”
“As more data is made openly accessible as a part of journal articles or federal funder requirements, the importance of data curation cannot be overemphasized. Data is not intrinsically useful. Furthermore, datasets do not simply become useful because they are publicly available. Data is useful only insofar as it meets the needs of the user. Likewise, more data does not mean more value (Binggeser, 2017). Data is of the highest value for those who collected it. Others who were not involved in the data collection and analysis efforts can find data less useful for their needs, especially if the data is not properly curated. Including as supplemental information a dataset that has not been properly prepared for public use reduces the usefulness of the data. Data must be cleaned and prepared properly for it to be useful. And this process does not happen by accident; it must be purposely conducted by someone trained in properly curating a dataset for public use (Johnston et al., 2018)….
What value does the curation process provide for data? The data curation steps formalized by the DCN in the C.U.R.A.T.E.D. acronym include the following: Check (the files for completeness and viability), Understand (the contents), Request (additional information), Augment (metadata), Transform (to open formats), Evaluate (for FAIRness), and Document (the curation process) (Johnston et al., 2018). …”
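As a trivial illustration only (not a DCN tool), the seven C.U.R.A.T.E.D. steps quoted above can be encoded as a simple per-dataset checklist; the tracking code around the steps is a hypothetical sketch.

```python
# The seven C.U.R.A.T.E.D. steps from Johnston et al. (2018); the
# checklist-reporting code around them is a hypothetical illustration.
CURATED_STEPS = [
    ("Check", "the files for completeness and viability"),
    ("Understand", "the contents"),
    ("Request", "additional information"),
    ("Augment", "metadata"),
    ("Transform", "to open formats"),
    ("Evaluate", "for FAIRness"),
    ("Document", "the curation process"),
]

def report(completed):
    """Print each step with a done/todo marker for one dataset."""
    for step, description in CURATED_STEPS:
        marker = "[x]" if step in completed else "[ ]"
        print(f"{marker} {step}: {description}")

report(completed={"Check", "Understand"})
```

The point of the acronym, and of a checklist like this, is that curation is a deliberate sequence of trained judgments per dataset, not a by-product of uploading files.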
Abstract: Electronic theses and dissertations (ETDs) have traditionally taken the form of PDFs, and ETD programs and their submission and curation procedures have been built around this format. However, graduate students are increasingly creating non-PDF files during their research, and in some cases these files are just as or more important than the PDFs that must be submitted to satisfy degree requirements. As a result, both graduate students and ETD administrators need training and resources to support the handling of a wide variety of complex digital objects. The Educopia Institute’s ETDplus Toolkit provides a highly usable set of modules to address this need, openly licensed to allow for reuse and adaptation to a variety of potential use cases.