OpCitance: Citation contexts identified from the PubMed Central open access articles | Scientific Data

Abstract:  OpCitance contains all the sentences from 2 million PubMed Central open-access (PMCOA) articles, with 137 million inline citations annotated (i.e., the “citation contexts”). Parsing out the references and citation contexts from the PMCOA XML files was non-trivial due to the diversity of referencing style. Only 0.5% citation contexts remain unidentified due to technical or human issues, e.g., references unmentioned by the authors in the text or improper XML nesting, which is more common among older articles (pre-2000). PubMed IDs (PMIDs) linked to inline citations in the XML files compared to citations harvested using the NCBI E-Utilities differed for 70.96% of the articles. Using an in-house citation matcher, called Patci, 6.84% of the referenced PMIDs were supplemented and corrected. OpCitance includes fewer total number of articles than the Semantic Scholar Open Research Corpus, but OpCitance has 160 thousand unique articles, a higher inline citation identification rate, and a more accurate reference mapping to PMIDs. We hope that OpCitance will facilitate citation context studies in particular and benefit text-mining research more broadly.
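
To make the idea of a "citation context" concrete, here is a minimal sketch of pulling inline citations out of a JATS-style article. It is not the OpCitance pipeline. It assumes inline citations are encoded as `<xref ref-type="bibr" rid="...">` and that a reference carries its PMID in `<pub-id pub-id-type="pmid">`. The sample document and the function name `citation_contexts` are invented for illustration, and the context is the whole paragraph rather than a single sentence.

```python
import xml.etree.ElementTree as ET

# Invented sample in JATS-like markup; real PMCOA files are far more varied.
SAMPLE = """<article>
  <body>
    <p>Prior work found an effect <xref ref-type="bibr" rid="B1">[1]</xref>. Later studies disagreed <xref ref-type="bibr" rid="B2">[2]</xref>.</p>
  </body>
  <back>
    <ref-list>
      <ref id="B1"><pub-id pub-id-type="pmid">111</pub-id></ref>
      <ref id="B2"><mixed-citation>No PMID here</mixed-citation></ref>
    </ref-list>
  </back>
</article>"""


def citation_contexts(xml_text):
    """Return one record per inline citation: reference id, PMID, paragraph text."""
    root = ET.fromstring(xml_text)

    # Map each reference id to its PMID, or None when the XML gives no PMID.
    pmids = {}
    for ref in root.iter("ref"):
        pmid = None
        for pub_id in ref.iter("pub-id"):
            if pub_id.get("pub-id-type") == "pmid":
                pmid = (pub_id.text or "").strip()
        pmids[ref.get("id")] = pmid

    contexts = []
    for paragraph in root.iter("p"):
        # Collapse whitespace so the context reads as plain text.
        text = " ".join("".join(paragraph.itertext()).split())
        for xref in paragraph.iter("xref"):
            if xref.get("ref-type") == "bibr":
                rid = xref.get("rid")
                contexts.append(
                    {"rid": rid, "pmid": pmids.get(rid), "context": text}
                )
    return contexts


for record in citation_contexts(SAMPLE):
    print(record["rid"], record["pmid"], record["context"])
```

The second reference has no PMID in the XML, the kind of gap that the abstract says was supplemented or corrected using Patci. This sketch does not handle nested paragraphs, citation ranges, or references that are never cited in the text.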

Making it easy to generate CrossRef XML with confidence

“In this module we present the proposal and budget for an open source library to generate CrossRef DOI XML. We imagine a world where people don’t think twice about creating DOIs, and integrate changes with confidence. The proposed libraries could be extended to improve confidence in generating XML for DataCite in web applications as well. This project is unfunded at the time of publication and we are looking for support to realise this mission….”

When XML Marks the Spot: Machine-readable journal articles for discovery and preservation

“If you work with a campus-based journal program and you’re looking to expand the readership and reputation of the articles you publish, adding them to relevant archives and indexes (A&Is) presents a treasure trove of opportunities. A&Is serve as valuable content distribution networks, and inclusion in selective ones is a signal of research quality. You may have heard about XML, one of the primary machine-readable formats academic databases use to ingest content, and wonder if that’s something you need to reach your archiving and indexing goals.

This free webinar, co-hosted by Scholastica, UOregon Libraries, and the GWU Masters in Publishing program, will offer a crash course in the benefits of XML production and use cases, including:

What XML is and the different types required or preferred by academic indexes and archives (with an overview of JATS)
How producing metadata and/or full-text articles in XML can unlock discovery and archiving opportunities with examples
Additional benefits of XML for journal accessibility as well as publishing program and professional development
When XML is needed and when it may not be the best use of journal resources
Ways you can produce XML, including an overview of Scholastica’s production service…”

Welcome to the Single Source Publishing Community | The Single Source Publishing Community (SSPC) is a network of stakeholders from the Open Science community that are interested in Single Source Publishing (SSP) for scholarly purposes – developing open-source software and advocacy.

“The Single Source Publishing Community (SSPC) is a network of stakeholders from the Open Science community that are interested in Single Source Publishing (SSP) for scholarly purposes – developing open-source software and advocacy.”

OS-APS: Open Source Academic Publishing Suite

“OS-APS enables XML-based workflows for media-neutral publishing (e.g. Open Access) without technical expertise and cost-intensive XML editing and content management systems. Corporate design can be controlled via existing typesetting templates or in detail with a template development kit. We plan to use and extend existing open source components, such as OJS, OMP and Pandoc, and make all results open source.”

An XML-Based Migration from Digital Commons to Open Journal Systems

Abstract:  The Oregon Library Association has produced its peer-reviewed journal, the OLA Quarterly (OLAQ), since 1995, and OLAQ was published in Digital Commons beginning in 2014. When the host institution undertook to move away from Bepress, their new repository solution was no longer a good match for OLAQ. Oregon State University and University of Oregon agreed to move the journal into their joint instance of Open Journal Systems (OJS), and a small team from OSU Libraries carried out the migration project. The OSU project team declined to use PKP’s existing migration plugin for a number of reasons, instead pursuing a metadata-centered migration pipeline from Digital Commons to OJS. We used custom XSLT to convert tabular data exported from Bepress into PKP’s Native XML schema, which we imported using the OJS Native XML Plugin. This approach provided a high degree of control over the journal’s metadata and a robust ability to test and make adjustments along the way. The article discusses the development of the transformation stylesheet, the metadata mapping and cleanup work involved, as well as advantages and limitations of using this migration strategy.
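
The general shape of a tabular-to-XML conversion can be sketched as follows. This is an illustration only: the authors used custom XSLT, whereas this sketch uses Python, and the column names and element names (`articles`, `article`, `title`, `author`, `year`) are hypothetical placeholders, not the Digital Commons export format or PKP's Native XML schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export: one row per article, one column per metadata field.
SAMPLE_CSV = (
    "title,author,year\n"
    "Libraries Today,A. Writer,2015\n"
    "Open Journals,B. Editor,2016\n"
)


def rows_to_xml(csv_text):
    """Build an XML tree with one <article> element per CSV row."""
    root = ET.Element("articles")  # placeholder root element name
    for row in csv.DictReader(io.StringIO(csv_text)):
        article = ET.SubElement(root, "article")
        for column, value in row.items():
            # Each column becomes a child element named after the column.
            ET.SubElement(article, column).text = value
    return root


print(ET.tostring(rows_to_xml(SAMPLE_CSV), encoding="unicode"))
```

A real migration would map each column to the target schema's own elements and validate the result before import, which is where the metadata mapping and cleanup work described in the abstract comes in.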

Streamlined peer review and PubMed-ready XML: How Spartan Medical Research Journal is using Scholastica to grow

“When SMRJ was started, the editors used email and Word docs to track peer review, and they published all articles in PDF format. However, with the journal continuing to expand, the editors realized they were in need of an easier way to track submissions and a new publishing system to improve the journal’s online reading experience and chances of being added to relevant indexes. As a result, Chief Editor William Corser and Assistant Editor Sam Wisniewski began searching for publishing tools and services, focused on three key areas: streamlining peer review, modernizing the journal’s website, and producing XML for all articles.

After considering different options, Corser and Wisniewski chose to use Scholastica’s peer review and open access publishing software, as well as Scholastica’s typesetting service to produce PDF, HTML, and XML article files. Since making the switch, they’ve found that peer review is smoother for editors and authors and they’re making strides towards reaching their article discovery and indexing goals….”

An XML Repository of All bioRxiv Articles is Now Available for Text and Data Mining

“bioRxiv and medRxiv provide free and unrestricted access to all articles posted on their servers. We believe this should apply not only to human readers but also to machine analysis of the content. A growing variety of resources have been created to facilitate this access.

bioRxiv and medRxiv metadata are made available via a number of dedicated RSS feeds and APIs. Simplified summary statistics covering the content and usage are also available. For bioRxiv, this information is available here.

Bulk access to the full text of bioRxiv articles for the purposes of text and data mining (TDM) is available via a dedicated Amazon S3 resource. Click here for details of this TDM resource and how to access it….”

What is MEI?

“The Music Encoding Initiative (MEI) is a 21st century community-driven open-source effort to define guidelines for encoding musical documents in a machine-readable structure.

It brings together specialists from various music research communities, including technologists, librarians, historians, and theorists in a common effort to discuss and define best practices for representing a broad range of musical documents and structures. The results of these discussions are then formalized into the MEI schema, a core set of rules for recording physical and intellectual characteristics of music notation documents expressed as an eXtensible Markup Language (XML) schema. This schema is developed and maintained by the MEI Technical Team….”

New business models for the open research agenda | Research Information

“The rise of preprints and the move towards universal open access are potential threats to traditional business models in scholarly publishing, writes Phil Gooch

Publishers have started responding to the latter with transformative agreements[1], but if authors can simply upload their research to a preprint server for immediate dissemination, comment and review, why submit to a traditional journal at all? Some journals are addressing this by offering authors frictionless submission direct from the preprint server. This tackles two problems at once: easing authors’ frustrations with existing journal submission systems[2], and providing a more direct route from the raw preprint to the richly linked, multiformat version of record that readers demand and accessibility standards require….

Dissemination of early-stage research as mobile-unfriendly PDF is arguably a technological step backwards. If preprints are here to stay, the reading experience needs to be improved. A number of vendors have developed native XML or LaTeX authoring environments which enable dissemination in richer formats….”

DOAJ to add Crossref compatibility – News Service

“In a series of metadata improvements, publishers will be able to upload XML in the Crossref format to us from 18th February 2020.

In 2018, we asked our publishers what would make their interaction with DOAJ easier and 46% said that they would like us to accept Crossref XML. Today we only accept XML formatted to our proprietary DOAJ format….”