OpCitance: Citation contexts identified from the PubMed Central open access articles | Scientific Data

Abstract:  OpCitance contains all the sentences from 2 million PubMed Central open-access (PMCOA) articles, with 137 million inline citations annotated (i.e., the “citation contexts”). Parsing out the references and citation contexts from the PMCOA XML files was non-trivial due to the diversity of referencing style. Only 0.5% citation contexts remain unidentified due to technical or human issues, e.g., references unmentioned by the authors in the text or improper XML nesting, which is more common among older articles (pre-2000). PubMed IDs (PMIDs) linked to inline citations in the XML files compared to citations harvested using the NCBI E-Utilities differed for 70.96% of the articles. Using an in-house citation matcher, called Patci, 6.84% of the referenced PMIDs were supplemented and corrected. OpCitance includes fewer total number of articles than the Semantic Scholar Open Research Corpus, but OpCitance has 160 thousand unique articles, a higher inline citation identification rate, and a more accurate reference mapping to PMIDs. We hope that OpCitance will facilitate citation context studies in particular and benefit text-mining research more broadly.
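As a rough illustration of the parsing problem the abstract describes, the sketch below pulls reference PMIDs and inline citation markers out of a small JATS-style XML fragment. It is not the OpCitance pipeline: the sample document, the function names, and the paragraph-level (rather than sentence-level) context are assumptions made for illustration. It relies only on the common JATS convention of `xref` elements with `ref-type="bibr"` pointing at `ref` entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified JATS-like article (not real PMC data).
SAMPLE = """<article>
  <body>
    <p>Prior work reported this effect <xref ref-type="bibr" rid="B1">1</xref>. It was later replicated <xref ref-type="bibr" rid="B2">2</xref>.</p>
  </body>
  <back>
    <ref-list>
      <ref id="B1"><element-citation><pub-id pub-id-type="pmid">11111111</pub-id></element-citation></ref>
      <ref id="B2"><element-citation><article-title>No PMID here</article-title></element-citation></ref>
    </ref-list>
  </back>
</article>"""


def reference_pmids(root):
    """Map each reference id to its PMID, or None when the XML carries none."""
    pmids = {}
    for ref in root.iter("ref"):
        pmid = None
        for pub_id in ref.iter("pub-id"):
            if pub_id.get("pub-id-type") == "pmid":
                pmid = (pub_id.text or "").strip() or None
        pmids[ref.get("id")] = pmid
    return pmids


def inline_citations(root):
    """Return (paragraph text, cited reference ids) for paragraphs that cite something."""
    found = []
    for p in root.iter("p"):
        rids = [x.get("rid") for x in p.iter("xref") if x.get("ref-type") == "bibr"]
        if rids:
            found.append(("".join(p.itertext()).strip(), rids))
    return found


root = ET.fromstring(SAMPLE)
pmids = reference_pmids(root)
contexts = inline_citations(root)
# References without a PMID in the XML are the ones a citation matcher would need to fill in.
missing = [rid for rid, pmid in pmids.items() if pmid is None]
```

Real PMCOA files vary far more than this sample (ranges such as "1-3", citations nested in tables or footnotes, malformed nesting), which is the diversity the abstract calls non-trivial.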

PubMed GPT: a Domain-Specific Large Language Model for Biomedical Text

“Large language models (LLMs) offer amazing capabilities for general-purpose natural language generation, image generation, speech synthesis, and multi-modal combinations of these applications. But is there more we can do when we know they will be used in industry-specific situations?

Today we announce the results of a partnership between MosaicML and the Stanford Center for Research on Foundation Models (CRFM) that demonstrates the capabilities of industry-specific large language models—specifically for the field of biomedicine. Using the MosaicML Cloud platform, CRFM trained a 2.7B parameter GPT on biomedical data from PubMed that achieves state-of-the art results on medical question and answer text from the US Medical Licensing Exam (USMLE) — highlighting the promise of domain-specific language generation models in real-world applications. …”

Phase 1 of the NIH Preprint Pilot: Testing the viability of making preprints discoverable in PubMed Central and PubMed | bioRxiv

Abstract:  Introduction The National Library of Medicine (NLM) launched a pilot in June 2020 to 1) explore the feasibility and utility of adding preprints to PubMed Central (PMC) and making them discoverable in PubMed and 2) to support accelerated discoverability of NIH-supported research without compromising user trust in NLM’s widely used literature services.

Methods The first phase of the Pilot focused on archiving preprints reporting NIH-supported SARS-CoV-2 virus and COVID-19 research. To launch Phase 1, NLM identified eligible preprint servers and developed processes for identifying NIH-supported preprints within scope in these servers. Processes were also developed for the ingest and conversion of preprints in PMC and to send corresponding records to PubMed. User interfaces were modified for display of preprint records. NLM collected data on the preprints ingested and discovery of preprint records in PMC and PubMed and engaged users through focus groups and a survey to obtain direct feedback on the Pilot and perceptions of preprints.

Results Between June 2020 and June 2022, NLM added more than 3,300 preprint records to PMC and PubMed, which were viewed 4 million times and 3 million times, respectively. Nearly a quarter of preprints in the Pilot were not associated with a peer-reviewed published journal article. User feedback revealed that the inclusion of preprints did not have a notable impact on trust in PMC or PubMed.

Discussion NIH-supported preprints can be identified and added to PMC and PubMed without disrupting existing operations processes. Additionally, inclusion of preprints in PMC and PubMed accelerates discovery of NIH research without reducing trust in NLM literature services. Phase 1 of the Pilot provided a useful testbed for studying NIH investigator preprint posting practices, as well as knowledge gaps among user groups, during the COVID-19 public health emergency, an unusual time with heightened interest in immediate access to research results.

JMIRx Med first overlay journal accepted for PubMed and PubMed Central

“JMIR Publications is proud to announce that our first-of-its-kind overlay journal, JMIRx Med, has been accepted for indexing in PubMed Central (PMC) and PubMed.

As the first overlay journal in PMC and PubMed, JMIRx Med becomes the standard-bearer of this important innovation in scholarly publishing. Editors of overlay journals select content already posted on preprint servers such as medRxiv and bioRxiv. They then select manuscripts that match the scope and quality parameters of their publications and offer authors a rapid peer review and possible publication of their preprints, coupled with all the traditional elements of a journal publication. JMIRx Med enters the ranks of PubMed-ranked scientific publications following the US National Library of Medicine’s (NLM’s) rigorous evaluation criteria. Papers published in JMIRx Med will be in PubMed by mid-summer 2022, after legacy files are prepared and deposited….”

Research data communication strategy at the time of pandemics: a retrospective analysis of the Italian experience | Monaldi Archives for Chest Disease

Abstract:  Coronavirus pandemic has radically changed the scientific world. During these difficult times, standard peer-review processes could be too long for the continuously evolving knowledge about this disease. We wanted to assess whether the use of other types of network could be a faster way to disseminate the knowledge about Coronavirus disease. We retrospectively analyzed the data flow among three distinct groups of networks during the first three months of the pandemic: PubMed, preprint repositories (biorXiv and arXiv) and social media in Italy (Facebook and Twitter). The results show a significant difference in the number of original research articles published by PubMed and preprint repositories. On social media, we observed an incredible number of physicians participating to the discussion, both on three distinct Italian-speaking Facebook groups and on Twitter. The standard scientific process of publishing articles (i.e., the peer-review process) remains the best way to get access to high-quality research. Nonetheless, this process may be too long during an emergency like a pandemic. The thoughtful use of other types of network, such as preprint repositories and social media, could be taken into consideration in order to improve the clinical management of COVID-19 patients.

Streamlined peer review and PubMed-ready XML: How Spartan Medical Research Journal is using Scholastica to grow

“When SMRJ was started, the editors used email and Word docs to track peer review, and they published all articles in PDF format. However, with the journal continuing to expand, the editors realized they were in need of an easier way to track submissions and a new publishing system to improve the journal’s online reading experience and chances of being added to relevant indexes. As a result, Chief Editor William Corser and Assistant Editor Sam Wisniewski began searching for publishing tools and services, focused on three key areas: streamlining peer review, modernizing the journal’s website, and producing XML for all articles.

After considering different options, Corser and Wisniewski chose to use Scholastica’s peer review and open access publishing software, as well as Scholastica’s typesetting service to produce PDF, HTML, and XML article files. Since making the switch, they’ve found that peer review is smoother for editors and authors and they’re making strides towards reaching their article discovery and indexing goals….”

NIH Preprint Pilot Update. NLM Technical Bulletin. 2021 Mar–Apr

“Ten months into the NIH Preprint Pilot, more than 2,100 preprints reporting NIH-supported research on COVID-19 are now discoverable in PubMed Central (PMC) and PubMed. Through early April 2021, these records have been viewed more than 1 million times in each of these databases (1.4 million in PMC; 1 million in PubMed). Of the preprints included in the pilot, ~60% are currently discoverable only as a preprint version, having not yet been linked to a published article. All articles are clearly identified as preprints. Preprints may be selected or excluded in searches by using the preprint filter.

The pilot launched in June 2020 with preprint records from medRxiv, bioRxiv, arXiv, ChemRxiv, Research Square, and SSRN. Phase 1 has focused on improving the discoverability of preprints relating to the ongoing public health emergency and accelerating dissemination of NIH-supported research on the SARS-CoV-2 virus and COVID-19. This narrowly scoped first phase has allowed the National Library of Medicine (NLM) to streamline curation and ingest workflows for NIH-supported preprints and refine the details of implementation with a set of articles for which there has been high demand for accelerated access and discovery. Since launching the pilot, NLM has made display of preprint records in PubMed search results more transparent. We have also automated checks for new preprint versions and preprint withdrawals, and reduced the steps required to report preprints as products of awards in My Bibliography….”
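The preprint filter mentioned in the bulletin can also be applied programmatically. The sketch below only builds an NCBI E-utilities ESearch URL and makes no network call. It assumes that PubMed exposes preprints through the publication-type tag `preprint[pt]`; the function name and its `preprints` argument are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(topic, preprints="only", retmax=20):
    """Build an ESearch URL that selects or excludes preprint records.

    preprints: "only" keeps preprints, "exclude" drops them,
    and any other value leaves the query unfiltered.
    """
    if preprints == "only":
        term = f"({topic}) AND preprint[pt]"  # assumed publication-type filter
    elif preprints == "exclude":
        term = f"({topic}) NOT preprint[pt]"
    else:
        term = topic
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


only_url = pubmed_search_url("COVID-19", preprints="only")
without_url = pubmed_search_url("COVID-19", preprints="exclude")
```

Fetching either URL should return a JSON list of matching PMIDs. NCBI asks heavier users to register an API key and respect its rate limits, which this sketch leaves out.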

Broader reach in searching for adverse events articles – a case study with DOAJ and Crossref

“An efficient strategy for searching for adverse events in scientific literature should find as many relevant events as possible and maintain screening effort within reasonable levels.

Naturally, finding more adverse events is directly related to the question of where to search. Past studies suggest results do improve when searching multiple established proprietary global literature databases. We decided to investigate databases that favor open models of scholarly publications, now gaining traction in the academic world. Can they be a cost-effective way to more adverse events results from the literature?

In this post, we investigate the use of alternative scientific literature sources to complement searching for adverse events on a mainstream index (PubMed). In particular we explored:

The Directory of Open Access Journals (DOAJ) indexes academic literature with an open access license from publishers worldwide. It currently hosts over 5 million records.

Crossref: a community organization dedicated to supporting scholarly communication by generating metadata and providing services for content discoverability. The Crossref metadata spans over 120 million records, with a growing proportion being published as open abstracts….”

CORE and PubMed collaborate for further full text dissemination – Research

“CORE provides access to freely available full text papers which were previously unavailable in PubMed to enhance the experience of PubMed users. This is delivered via the LinkOut service.

PubMed is maintained by the US National Library of Medicine at the National Institutes of Health. It constitutes the largest citations database in health sciences and is one of the most widely used scholarly infrastructure services with millions of monthly active users.

We are happy to announce that hundreds of thousands of relevant articles hosted in CORE are now linked from PubMed, taking more available content directly to the researchers.

Currently, many PubMed records offer metadata information and the full text may not be available. This development now enables PubMed users to access full text links to articles hosted in CORE via its LinkOut service, providing researchers with a direct route to the research. The linking of CORE papers directly from PubMed resources and other related databases further increases the discoverability of content aggregated by CORE, providing a valuable service to our repositories. …”

A detailed open access model of the PubMed literature | Scientific Data

Abstract:  Portfolio analysis is a fundamental practice of organizational leadership and is a necessary precursor of strategic planning. Successful application requires a highly detailed model of research options. We have constructed a model, the first of its kind, that accurately characterizes these options for the biomedical literature. The model comprises over 18 million PubMed documents from 1996–2019. Document relatedness was measured using a hybrid citation analysis + text similarity approach. The resulting 606.6 million document-to-document links were used to create 28,743 document clusters and an associated visual map. Clusters are characterized using metadata (e.g., phrases, MeSH) and over 20 indicators (e.g., funding, patent activity). The map and cluster-level data are embedded in Tableau to provide an interactive model enabling in-depth exploration of a research portfolio. Two example usage cases are provided, one to identify specific research opportunities related to coronavirus, and the second to identify research strengths of a large cohort of African American and Native American researchers at the University of Michigan Medical School.

LitCovid – NCBI – NLM – NIH

“LitCovid is a curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus. It is the most comprehensive resource on the subject, providing a central access to 1121 (and growing) research articles in PubMed. The articles are updated daily and are further categorized by different research topics and geographic locations for improved access….”

Keep up with the latest coronavirus research

“An open-resource literature hub known as LitCovid curates the most comprehensive collection of international research papers so far on the new coronavirus disease COVID-19 (see go.nature.com/3almd5p). Developed with the support of the US National Institutes of Health’s intramural research programme, LitCovid is updated daily with newly published articles. The aim is to provide timely insight from the scientific literature into the biology of the virus and the diagnosis and management of those who have been infected.

LitCovid has a more sophisticated search function than existing resources. It identifies roughly 35% more relevant articles than do conventional keyword-based searches for entries such as ‘COVID-19’ or ‘nCOV’. Furthermore, the articles are categorized by topic — overview, disease mechanism, transmission dynamics, treatment, case report and epidemic forecasting — as well as by geographic location for visualization on a world map…..”
