PDF Data Extractor (PDE) – A Free Web Application and R Package Allowing the Extraction of Tables from Portable Document Format (PDF) Files and High-Throughput Keyword Searches of Full-Text Articles | bioRxiv

Abstract:  The PDF Data Extractor (PDE) R package is designed to perform comprehensive literature reviews for scientists at any stage in a user-friendly way. The PDE_analyzer_i() function permits the user to filter and search thousands of scientific articles using a simple user interface, requiring no bioinformatics skills. In the additional PDE_reader_i() interface, the user can then quickly browse the sentences with detected keywords, open the full-text article when required, and conveniently convert tables from PDF files to Excel sheets (pdf2table). Specific features of the literature analysis include the adaptability of analysis parameters and the detection of abbreviations of search words in articles. In this article, we demonstrate and exemplify how the PDE package allows the user-friendly, efficient, and automated extraction of metadata from full-text articles, which can aid in summarizing the existing literature on any topic of interest. As such, we recommend the PDE package as the first step in conducting an extensive review of the scientific literature. The PDE package is available from the Comprehensive R Archive Network at https://CRAN.R-project.org/package=PDE.
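The two-step workflow the abstract describes can be sketched in a few lines of R. The two interactive function names are taken from the abstract itself; the installation line is a standard CRAN install and is included only for completeness. This is a minimal sketch, not a full usage guide, since both functions launch interactive interfaces.

```r
# Minimal sketch of the PDE workflow described above.
# Assumes the package has been installed from CRAN:
# install.packages("PDE")
library(PDE)

# Step 1: launch the interactive analyzer to filter and search a
# collection of PDF articles for keywords (no bioinformatics skills
# required; analysis parameters are set in the user interface).
PDE_analyzer_i()

# Step 2: browse the sentences with detected keywords, open the
# full-text article when required, and convert tables from PDF
# files to Excel sheets (pdf2table).
PDE_reader_i()
```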


Lebert, Marie, A short history of ebooks

Table of contents:

1. Project Gutenberg, a visionary project

2. The milestones of Project Gutenberg

3. PDF, a pioneer format created by Adobe

4. Gabriel, a portal for European national libraries

5. The British Library and its treasures

6. From PDAs to smartphones

7. The first e-readers

8. E Ink, an electronic ink technology

9. Online dictionaries and encyclopedias

10. Experiments by best-selling authors

11. From OEB to EPUB as a standard format

12. Wikipedia, an encyclopedia for the world

13. The Creative Commons licence

14. From Google Print to Google Books

15. The Internet Archive, a library for the world

16. eBooks seen by some pioneers

17. A tribute to librarians around the world

18. A timeline from 1971 until now

IFLA signs the WikiLibrary Manifesto

“IFLA has endorsed the WikiLibrary Manifesto, aimed at connecting libraries and Wikimedia projects such as Wikibase in order to promote the dissemination of knowledge in open formats, especially in linked open data networks….”

Opening Up Scholarship in the Humanities: Digital Publishing, Knowledge Translation, and Public Engagement

Abstract:  Opening Up Scholarship in the Humanities: Digital Publishing, Knowledge Translation, and Public Engagement considers the concept of humanistic, open, social scholarship and argues for its value in the contemporary academy as both a set of socially oriented activities and an organizing framework for such activities. This endeavour spans the interrelated areas of knowledge creation, public engagement, and open access, and demonstrates the importance of considering this triad as critical for the pursuit of academic work moving forward—especially in the humanities. Under the umbrella of open social scholarship, I consider open access as a baseline for public engagement and argue for the vital importance of this sort of work. Moreover, I suggest that there is a strong connection between digital scholarship and social knowledge creation. I explore the knowledge translation lessons that other fields might have for the humanities and include a journalist–humanist case study to this end. I also argue for the value of producing research output in many different forms and formats. Finally, I propose that there are benefits to explicitly popularizing the humanities. In sum, this dissertation speculates on past, current, and future scholarly communication activities, and proposes that such activities might be opened up for wider engagement and, thus, social benefit.

Streamlined peer review and PubMed-ready XML: How Spartan Medical Research Journal is using Scholastica to grow

“When SMRJ was started, the editors used email and Word docs to track peer review, and they published all articles in PDF format. However, with the journal continuing to expand, the editors realized they were in need of an easier way to track submissions and a new publishing system to improve the journal’s online reading experience and chances of being added to relevant indexes. As a result, Chief Editor William Corser and Assistant Editor Sam Wisniewski began searching for publishing tools and services, focused on three key areas: streamlining peer review, modernizing the journal’s website, and producing XML for all articles.

After considering different options, Corser and Wisniewski chose to use Scholastica’s peer review and open access publishing software, as well as Scholastica’s typesetting service to produce PDF, HTML, and XML article files. Since making the switch, they’ve found that peer review is smoother for editors and authors and they’re making strides towards reaching their article discovery and indexing goals….”

The sharing of research data facing the COVID-19 pandemic | SpringerLink

Abstract:  During the previous Ebola and Zika outbreaks, researchers shared their data, allowing many published epidemiological studies to be produced solely from open research data, speeding up the investigation and control of these infections. This study aims to evaluate the dissemination of the COVID-19 research data underlying scientific publications. An analysis of COVID-19 publications from December 1, 2019, to April 30, 2020, was conducted through the PubMed Central repository to evaluate the research data made available as supplementary material or deposited in repositories. The PubMed Central search generated 5,905 records, of which 804 papers included complementary research data, especially as supplementary material (77.4%). The most productive journals were The New England Journal of Medicine, The Lancet and The Lancet Infectious Diseases; the most frequent keyword was pneumonia; and the most used repositories were GitHub and GenBank. An expected growth in the number of published articles following the course of the pandemic is confirmed in this work, while only 13.6% of them include the underlying research data. It can be deduced that data sharing is not a common practice, even in health emergencies such as the present one. High-impact generalist journals have accounted for a large share of global publishing. The topics most often covered are related to epidemiological and public health concepts, genetics, virology and respiratory diseases, such as pneumonia. However, it is essential to interpret these data with caution, following the evolution of publications and their funding in the coming months.

From the body of the paper: “In global public health emergencies, it should be mandatory to disseminate any information that may be of value in fighting the crisis. For this to be done efficiently, there is a need to develop agreed global standards for sharing data and results for scientists, institutions and governments.”

ETDplus Toolkit [Tool Review]

Abstract:  Electronic theses and dissertations (ETDs) have traditionally taken the form of PDFs, and ETD programs and their submission and curation procedures have been built around this format. However, graduate students are increasingly creating non-PDF files during their research, and in some cases these files are just as or more important than the PDFs that must be submitted to satisfy degree requirements. As a result, both graduate students and ETD administrators need training and resources to support the handling of a wide variety of complex digital objects. The Educopia Institute’s ETDplus Toolkit provides a highly usable set of modules to address this need, openly licensed to allow for reuse and adaptation to a variety of potential use cases.


HighWire at 25: Richard Sever (bioRxiv) looks back – Highwire Press

“10 years later I ended up working at Cold Spring Harbor myself, and continuing my relationship with HighWire from a new perspective. The arXiv preprint server for physics had launched in 1991, and my colleague John Inglis and I had often talked about whether we could do something similar for biology. I remember saying we could put together some of HighWire’s existing components, adapt them in certain ways and build something that would function as a really effective preprint server—and that’s what we did, launching bioRxiv in 2013. It was great then to be able to take that experiment to HighWire meetings to report back on. Initially there was quite a bit of skepticism from the community, who thought there were cultural barriers that meant preprints wouldn’t work well for biology, but 7 years and almost 100,000 papers later it’s still there, and still being served very well by HighWire.

When we launched bioRxiv we made it very explicit that we would not take clinical work, or anything involving patients. But the exponential growth of submissions to bioRxiv demonstrated that there was a demand and a desire for this amongst the biomedical community, and people were beginning to suggest that a similar model be trialed for medicine. A tipping point for me was an OpEd in the New York Times (Don’t Delay News of Medical Breakthroughs, 2015) by Eric Topol (Scripps Research) and Harlan Krumholz (Yale University), who would go on to become a co-founder of medRxiv….”

The broken promise that undermines human genome research

“Data sharing was a core principle that led to the success of the Human Genome Project 20 years ago. Now scientists are struggling to keep information free….

So in 1996, the HGP [Human Genome Project] researchers got together to lay out what became known as the Bermuda Principles, with all parties agreeing to make the human genome sequences available in public databases, ideally within 24 hours — no delays, no exceptions.


Fast-forward two decades, and the field is bursting with genomic data, thanks to improved technology both for sequencing whole genomes and for genotyping them by sequencing a few million select spots to quickly capture the variation within. These efforts have produced genetic readouts for tens of millions of individuals, and they sit in data repositories around the globe. The principles laid out during the HGP, and later adopted by journals and funding agencies, meant that anyone should be able to access the data created for published genome studies and use them to power new discoveries….

The explosion of data led governments, funding agencies, research institutes and private research consortia to develop their own custom-built databases for handling the complex and sometimes sensitive data sets. And the patchwork of repositories, with various rules for access and no standard data formatting, has led to a “Tower of Babel” situation, says Haussler….”

UNESCO launches new publication on accessible documentary heritage

“Marking the International Day of Persons with Disabilities on 3 December 2020, UNESCO has released a new publication aiming at assisting stakeholders in the preparation of documentary heritage in accessible formats for persons with disabilities.


The publication, Accessible Documentary Heritage, offers a set of guidelines for parties involved in the digitization of heritage documents, including librarians, archivists, museums workers, curators, and other stakeholders in carefully planning digital platforms and contents with a view to incorporating disability and accessibility aspects….”

Containers, genres, and formats, oh my: Creating sustainable concepts by connecting theory, research, practice, and education – Brannon – 2020 – Proceedings of the Association for Information Science and Technology – Wiley Online Library

Abstract:  This interactive panel brings together researchers, practitioners, and educators to explore ways of connecting theory, research, practice, and LIS education around the issue of information format. Despite a growing awareness of the importance of information format to information seeking, discovery, use, and creation, LIS has no sound, theoretically-informed basis for describing or discussing elements of format, with researchers and practitioners alike relying on know-it-when-they-see-it understandings of format types. The Researching Students’ Information Choices project has attempted to address this issue by developing the concept of containers, one element of format, and locating it within a descriptive taxonomy of other format elements based on well-established theories from the field of Rhetorical Genre Studies. This panel will discuss how this concept was developed and implemented in a multi-institutional, IMLS-grant-funded research project and how panelists are currently deploying and planning to deploy this concept in their own practice. Closing the loop in this way creates sustainable concepts that build a stronger field overall.


DAISY Publishes White Paper on the Benefits of EPUB 3 – The DAISY Consortium

“The DAISY Consortium has published a white paper encouraging the use of Born Accessible EPUB 3 files for corporate, government and university publications and documents. This important piece of work recognizes the work of the publishing industry who have embraced EPUB 3 as their format of choice for ebooks and digital publishing and focuses on how this same approach should be used for all types of digital content, both online and offline….”

New business models for the open research agenda | Research Information

“The rise of preprints and the move towards universal open access are potential threats to traditional business models in scholarly publishing, writes Phil Gooch

Publishers have started responding to the latter with transformative agreements[1], but if authors can simply upload their research to a preprint server for immediate dissemination, comment and review, why submit to a traditional journal at all? Some journals are addressing this by offering authors frictionless submission direct from the preprint server. This tackles two problems at once: easing authors’ frustrations with existing journal submission systems[2], and providing a more direct route from the raw preprint to the richly linked, multiformat version of record that readers demand and accessibility standards require….

Dissemination of early-stage research as mobile-unfriendly PDF is arguably a technological step backwards. If preprints are here to stay, the reading experience needs to be improved. A number of vendors have developed native XML or LaTeX authoring environments which enable dissemination in richer formats….”