AI writes summaries of preprints in bioRxiv trial

“The bioRxiv pilot is part of a broader trend of using LLMs to help researchers — and the public — keep afloat in the tsunami of scientific literature. The physics-focused preprint server arXiv uses AI to generate audio summaries of some papers, and publishers and funders are starting to roll out features that allow users to ‘talk to a paper’ through a chatbot….

Before rolling out the service, Sever and his colleagues evaluated several dozen summaries produced by the tool. Most were pretty good, he says, and some were even better than the abstracts scientists had written. Others included clear falsehoods. “We know there are going to be errors in these things,” Sever says….

If the pilot becomes a fully fledged service, bioRxiv might look at routinely involving authors in proofreading and approving the content, Sever says. For now, to minimize the consequences of errors, the pilot is not being rolled out to medRxiv, a sister medical-research preprint server run by Cold Spring Harbor Laboratory Press, the London-based publisher BMJ and Yale University in New Haven, Connecticut. MedRxiv studies typically have clinical relevance, and errors could guide patient behaviour. By limiting the pilot to bioRxiv, says Sever, “the consequences of being wrong are more that somebody might feel misled or misunderstand a fairly arcane study in cell biology”. …”
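
The article does not describe bioRxiv's actual pipeline, so purely as an illustration: a minimal sketch of how a preprint abstract might be run through an LLM for a plain-language summary. The `openai` client call is real, but the model name and prompt are placeholders, not bioRxiv's setup.

```python
# Illustrative only: NOT bioRxiv's pipeline. Assumes the `openai` package
# and an OPENAI_API_KEY environment variable; model and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

def summarize_preprint(title: str, abstract: str) -> str:
    """Ask an LLM for a short plain-language summary of a preprint abstract."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": ("Summarize scientific abstracts for a general audience "
                         "in 100 words or fewer. Do not add claims that are not "
                         "in the abstract.")},
            {"role": "user", "content": f"Title: {title}\n\nAbstract: {abstract}"},
        ],
    )
    return response.choices[0].message.content
```

As Sever's comments above make clear, any such output still needs human checking for the occasional "clear falsehood".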

Following Preprints on medRxiv

“One of medRxiv’s goals is to alert readers when new preprints that might interest them are posted. Signing up for personalized alerts on the medRxiv Alerts/RSS page (see figure below) allows you to get notifications when new preprints that interest you are posted.

[Figure: the medRxiv Alerts/RSS sign-up page]

But an individual preprint can be revised, commented on, and peer-reviewed, and a version can eventually be published in a journal. To help you keep track, medRxiv is adding a new feature called Follow a Preprint, which notifies you when any of these events occur….”
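
For readers who prefer scripted alerts over the web form, a similar effect can be approximated with the public bioRxiv/medRxiv API (api.biorxiv.org). A minimal sketch, assuming the documented `/details` route and filtering titles by keyword ourselves; the API returns results in pages of up to 100 records, so a production script would also follow the pagination cursor:

```python
# Minimal sketch of keyword-based preprint alerts via the public
# bioRxiv/medRxiv API. Only the first page (up to 100 records) is fetched.
import requests

def new_medrxiv_preprints(start: str, end: str, keywords: list[str]) -> list[dict]:
    """Return preprints posted between start and end (YYYY-MM-DD)
    whose titles contain any of the given keywords."""
    url = f"https://api.biorxiv.org/details/medrxiv/{start}/{end}/0"
    records = requests.get(url, timeout=30).json().get("collection", [])
    wanted = [kw.lower() for kw in keywords]
    return [r for r in records
            if any(kw in r.get("title", "").lower() for kw in wanted)]

for paper in new_medrxiv_preprints("2024-01-01", "2024-01-07", ["influenza"]):
    print(paper["date"], paper["doi"], paper["title"])
```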

bioRxiv and medRxiv response to the OSTP memo – an open letter to US funding agencies

“Agencies can enable free public access to research results simply by mandating that reports of federally funded research are made available as “preprints” on servers such as arXiv, bioRxiv, medRxiv, and chemRxiv, before being submitted for journal publication. This will ensure that the findings are freely accessible to anyone anywhere in the world. An important additional benefit is the immediate availability of the information, avoiding the long delays associated with evaluation by traditional scientific journals (typically around one year). Scientific inquiry then progresses faster, as has been particularly evident for COVID research during the pandemic.

Prior access mandates in the US and elsewhere have focused on articles published by academic journals. This complicated the issue by making it a question of how to adapt journal revenue streams and led to the emergence of new models based on article-processing charges (APCs). But APCs simply move the access barrier to authors: they are a significant financial obstacle for researchers in fields and communities that lack the funding to pay them. A preprint mandate would achieve universal access for both authors and readers upstream, ensuring the focus remains on providing access to research findings, rather than on how they are selected and filtered.

Mandating public access to preprints rather than articles in academic journals would also future-proof agencies’ access policies. The distinction between peer-reviewed and non-peer-reviewed material is blurring as new approaches make peer review an ongoing process rather than a judgment made at a single point in time. Peer review can be conducted independently of journals through initiatives like Review Commons. And traditional journal-based peer review is changing: for example, eLife, supported by several large funders, peer reviews submitted papers but no longer distinguishes accepted from rejected articles. The author’s “accepted” manuscript that is the focus of so-called Green Open Access policies may therefore no longer exist. Because of such ongoing change, mandating the free availability of preprints would be a straightforward and strategically astute policy for US funding agencies.

A preprint mandate would underscore the fundamental, often overlooked, point that it is the results of research to which the public should have access. The evaluation of that research by journals is part of an ongoing process of assessment that can take place after the results have been made openly available. Preprint mandates from the funders of research would also widen the possibilities for evolution within the system and avoid channeling it towards expensive APC-based publishing models. Furthermore, since articles on preprint servers can be accompanied by supplementary data deposits on the servers themselves or linked to data deposited elsewhere, preprint mandates would also provide mechanisms to accomplish the other important OSTP goal: availability of research data.”

preLights talks to Richard Sever – preLights

“Richard Sever is Assistant Director of Cold Spring Harbor Laboratory Press (CSHL Press) and co-founder of bioRxiv and medRxiv. Prior to moving to CSHL Press in 2008, he worked as an editor for several journals including Current Opinion in Cell Biology, Trends in Biochemical Sciences, and Journal of Cell Science. Here, we discuss Richard’s transition into the academic publishing industry, the journey that led him to co-found the preprint servers bioRxiv and medRxiv with John Inglis, and his take on preprint peer review and the value it can hold for early-career researchers….”

Comparison of Clinical Study Results Reported in medRxiv Preprints vs Peer-reviewed Journal Articles | JAMA Network Open

Abstract: Importance: Preprints have been widely adopted to enhance the timely dissemination of research across many scientific fields. Concerns remain that early, public access to preliminary medical research has the potential to propagate misleading or faulty research that has been conducted or interpreted in error.

Objective: To evaluate the concordance among study characteristics, results, and interpretations described in preprints of clinical studies posted to medRxiv that are subsequently published in peer-reviewed journals (preprint-journal article pairs).

Design, Setting, and Participants: This cross-sectional study assessed all preprints describing clinical studies that were initially posted to medRxiv in September 2020 and subsequently published in a peer-reviewed journal as of September 15, 2022.

Main Outcomes and Measures: For preprint-journal article pairs describing clinical trials, observational studies, and meta-analyses that measured health-related outcomes, the sample size, primary end points, corresponding results, and overarching conclusions were abstracted and compared. Sample size and results from primary end points were considered concordant if they had exact numerical equivalence.

Results: Among 1399 preprints first posted on medRxiv in September 2020, a total of 1077 (77.0%) had been published as of September 15, 2022, a median of 6 months (IQR, 3-8 months) after preprint posting. Of the 547 preprint-journal article pairs describing clinical trials, observational studies, or meta-analyses, 293 (53.6%) were related to COVID-19. Of the 535 pairs reporting sample sizes in both sources, 462 (86.4%) were concordant; 43 (58.9%) of the 73 pairs with discordant sample sizes had larger samples in the journal publication. There were 534 pairs (97.6%) with concordant and 13 pairs (2.4%) with discordant primary end points. Of the 535 pairs with numerical results for the primary end points, 434 (81.1%) had concordant primary end point results; 66 of the 101 discordant pairs (65.3%) had effect estimates that were in the same direction and were statistically consistent. Overall, 526 pairs (96.2%) had concordant study interpretations, including 82 of the 101 pairs (81.2%) with discordant primary end point results.

Conclusions and Relevance: Most clinical studies posted as preprints on medRxiv and subsequently published in peer-reviewed journals had concordant study characteristics, results, and final interpretations. With more than three-fourths of preprints published in journals within 24 months, these results may suggest that many preprints report findings that are consistent with the final peer-reviewed publications.
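
The proportions reported in the abstract can be re-derived from its raw counts; a quick sanity check (counts taken verbatim from the Results above):

```python
# Recompute the key percentages in the JAMA Network Open abstract from its
# reported counts; each rounds to the value given in the Results.
checks = [
    (1077, 1399, "preprints published by Sept 15, 2022"),  # 77.0%
    (293, 547, "pairs related to COVID-19"),               # 53.6%
    (462, 535, "concordant sample sizes"),                 # 86.4%
    (434, 535, "concordant primary end point results"),    # 81.1%
    (526, 547, "concordant study interpretations"),        # 96.2%
]
for numerator, denominator, label in checks:
    print(f"{label}: {100 * numerator / denominator:.1f}%")
```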

Evaluation of Publication of COVID-19–Related Articles Initially Presented as Preprints

“Since the launch of the medRxiv preprint server in 2019, the dissemination of research as preprints has grown rapidly, largely facilitated by the COVID-19 pandemic.1 Notwithstanding, this unprecedented increase in preprints has been subject to criticism, mainly because of reliability concerns owing to their lack of peer review. In 2020, Abdill et al2 reported that 62.6% of bioRxiv preprints were later published in scientific journals, considering a time frame of at least 1 year. However, other studies3,4 have highlighted the low percentage of medRxiv preprints subsequently published in journals, with publication rates of 14.0% after 0 to 12 months3 and 10.6% after 6 to 19 months.4 In an analysis of COVID-19–related preprints posted on 3 servers, Añazco et al5 observed that 5.7% were published in a journal 3 to 8 months after their preprint posting. To our knowledge, no recent studies have analyzed whether journal publication rates of medRxiv preprints have changed. Therefore, we conducted this study to evaluate the subsequent journal publication of COVID-19–related preprint articles posted on medRxiv in 2020.”

Reliability of citations of medRxiv preprints in articles published on COVID-19 in the world leading medical journals | PLOS ONE

Abstract: Introduction

Preprints have been widely cited during the COVID-19 pandemic, even in the major medical journals. However, since the subsequent publication of a preprint is not always mentioned in preprint repositories, some may be inappropriately cited or quoted. Our objectives were to assess the reliability of preprint citations in articles on COVID-19, to determine the rate of publication of preprints cited in these articles and, if relevant, to compare the content of the preprints with their published versions.

Methods

Articles published on COVID in 2020 in the BMJ, The Lancet, the JAMA and the NEJM were manually screened to identify all articles citing at least one preprint from medRxiv. We searched PubMed, Google and Google Scholar to assess if the preprint had been published in a peer-reviewed journal, and when. Published articles were screened to assess if the title, data or conclusions were identical to the preprint version.
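
The authors matched preprints to journal versions by hand; part of that lookup can now be automated, because the bioRxiv/medRxiv API exposes a published-version DOI when the server knows of one. A hedged sketch, assuming the public `/details` route and its `published` field (reported as "NA" when no journal version is known; incomplete coverage of this field is exactly the gap the study documents):

```python
# Ask medRxiv whether it knows of a journal version for a preprint DOI.
# Assumes the api.biorxiv.org /details route; "published" holds the journal
# DOI when a match is recorded and "NA" otherwise.
import requests

def journal_doi_for_preprint(preprint_doi: str) -> str | None:
    url = f"https://api.biorxiv.org/details/medrxiv/{preprint_doi}"
    collection = requests.get(url, timeout=30).json().get("collection", [])
    if not collection:
        return None  # unknown DOI
    published = collection[-1].get("published", "NA")  # latest version record
    return None if published == "NA" else published
```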

Results

Among the 205 research articles on COVID published by the four major medical journals in 2020, 60 (29.3%) cited at least one medRxiv preprint. Among the 182 preprints cited, 124 were published in a peer-reviewed journal, 51 (41.1%) before the citing article was published online and 73 (58.9%) after. There were differences in the title, the data, or the conclusions between the cited preprint and the published version for nearly half of them. MedRxiv did not mention the publication for 53 (42.7%) of the preprints.

Conclusions

More than a quarter of preprint citations were inappropriate, since the preprints had in fact already been published at the time the citing article appeared, often with different content. Authors and editors should check the accuracy of citations and quotations of preprints before publishing manuscripts that cite them.

JMIRx Med: Preprint Overlay Journal Accepted in the Directory of Open Access Journals (DOAJ)

“JMIR Publications is pleased to announce that JMIRx Med has been accepted and indexed in the Directory of Open Access Journals (DOAJ). DOAJ applies strict criteria to review and index open access journals, which include licensing and copyright criteria, quality control processes, journal website technical and usability setups, and editorial evaluation. 

JMIRx Med (ISSN 2563-6316) is an innovative preprint overlay journal for medRxiv and JMIR Preprints. Conceived to address the urgent need to make highly relevant scientific information available as early as possible without losing the quality of the peer review process, JMIRx Med is the first in a new series of “superjournals” from JMIR Publications. Superjournals (a type of “overlay” journal) sit on top of preprint servers, offering peer review and everything else a traditional scholarly journal does. …”

New policy: Review Commons makes preprint review fully transparent – ASAPbio

“In a major step toward promoting preprint peer review as a means of increasing transparency and efficiency in scientific publishing, Review Commons is updating its policy: as of 1 June 2022, peer reviews and the authors’ response will be posted by Review Commons to bioRxiv or medRxiv when authors transfer their refereed preprint to the first affiliate journal….”

Making Science More Open Is Good for Research—but Bad for Security

But a new paper in the journal PLoS Biology argues that, while the swell of the open science movement is on the whole a good thing, it isn’t without risks. 

Though the speed of open-access publishing means important research gets out more quickly, it also means the checks required to ensure that risky science isn’t being tossed online are less meticulous. In particular, the field of synthetic biology—which involves the engineering of new organisms or the reengineering of existing organisms to have new abilities—faces what is called a dual-use dilemma: that while quickly released research may be used for the good of society, it could also be co-opted by bad actors to conduct biowarfare or bioterrorism. It also could increase the potential for an accidental release of a dangerous pathogen if, for example, someone inexperienced were able to easily get their hands on a how-to guide for designing a virus. “There is a risk that bad things are going to be shared,” says James Smith, a coauthor on the paper and a researcher at the University of Oxford. “And there’s not really processes in place at the moment to address it.”

Open data and data sharing in articles about COVID-19 published in preprint servers medRxiv and bioRxiv

This study aimed to analyze the content of data availability statements (DAS) and the actual sharing of raw data in preprint articles about COVID-19. The study combined a bibliometric analysis and a cross-sectional survey. We analyzed preprint articles on COVID-19 published on medRxiv and bioRxiv from January 1, 2020 to March 30, 2020. We extracted data sharing statements, tried to locate raw data when authors indicated they were available, and surveyed authors in 2020–2021. We surveyed authors whose articles did not include a DAS, who indicated that data were available on request, or whose manuscript reported that raw data were available within it but whose raw data could not be found. Raw data collected in this study are published on Open Science Framework (https://osf.io/6ztec/). We analyzed 897 preprint articles. There were 699 (78%) articles with a Data/Code field present on the website of a preprint server. In 234 (26%) preprints, a data/code sharing statement was reported within the manuscript. Of the 283 preprints that reported that data were accessible, we found raw data/code for 133 (47%; 15% of all analyzed preprint articles). Most commonly, authors indicated that data were available on GitHub or another clearly specified web location, on (reasonable) request, or in the manuscript or its supplementary files. In conclusion, preprint servers should require authors to provide data sharing statements that will be included both on the website and in the manuscript. Education of researchers about the meaning of data sharing is also needed.
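
The study's actual coding instrument is archived on OSF (link above); purely to illustrate the kind of categorization involved, a crude keyword-based pass over data availability statements might look like this:

```python
# Crude, illustrative DAS classifier. This is NOT the study's instrument;
# it only sketches sorting statements into the categories the abstract reports.
def classify_das(statement: str) -> str:
    s = statement.lower()
    if "github.com" in s or "osf.io" in s or "zenodo" in s:
        return "clearly specified web location"
    if "request" in s:
        return "available on (reasonable) request"
    if "supplementary" in s or "in the manuscript" in s:
        return "in manuscript or supplementary files"
    if not s.strip():
        return "no statement"
    return "other / unclear"

print(classify_das("All data are available at https://github.com/example/repo"))
print(classify_das("Data are available from the authors upon reasonable request."))
```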

New options for posting a medRxiv preprint at PLOS

Starting today, PLOS Global Public Health and PLOS Digital Health will join the other PLOS journals publishing medical research in giving submitting authors the option to have their manuscript forwarded to medRxiv to be considered for posting as a preprint. In offering this free service, we aim to make preprint posting simple and easy, giving researchers more flexibility in how they choose to communicate their science. Researchers, of course, also remain free to post to medRxiv or another preprint server prior to submitting.

Examining Linguistic Shifts Between Preprints and Publications

Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features within bioRxiv preprints to published biomedical text as a whole, as this is an excellent opportunity to examine how peer review changes these documents. The most prevalent features that changed appear to be associated with typesetting and mentions of supporting information sections or additional files. In addition to text comparison, we created document embeddings derived from a preprint-trained word2vec model. We found that these embeddings are able to parse out different scientific approaches and concepts, link unannotated preprint–peer-reviewed article pairs, and identify journals that publish linguistically similar papers to a given preprint. We also used these embeddings to examine factors associated with the time elapsed between the posting of a first preprint and the appearance of a peer-reviewed publication. We found that preprints with more versions posted and more textual changes took longer to publish. Lastly, we constructed a web application (https://greenelab.github.io/preprint-similarity-search/) that allows users to identify which journals and articles are most linguistically similar to a bioRxiv or medRxiv preprint, as well as observe where the preprint would be positioned within a published article landscape.
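
The paper's pipeline is more involved than this, but the core idea of word2vec-derived document embeddings compared by cosine similarity can be sketched in a few lines with gensim; the two-document corpus and minimal preprocessing here are placeholders:

```python
# Sketch of word2vec document embeddings and similarity, in the spirit of
# the paper's approach (its real pipeline differs). `corpus` is a placeholder.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    "we sequenced single cells from mouse cortex".split(),
    "peer review changed the supplementary material section".split(),
]

model = Word2Vec(sentences=corpus, vector_size=50, min_count=1, epochs=50)

def doc_embedding(tokens: list[str]) -> np.ndarray:
    """Average the word vectors of in-vocabulary tokens."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

a, b = (doc_embedding(doc) for doc in corpus)
print(f"document similarity: {cosine(a, b):.3f}")
```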