About Open Science

(All that comes with it, but not by itself)

Unlocking 100 years of scientific papers: How Scholarcy partnered with BMJ to further I4OC

Posted on May 12, 2019 by peter.suber's bookmarks

“Reference mining is fundamental to the creation of citation networks and rich, discoverable digital libraries. In recent years, a number of tools have been developed to address this need, but they are often limited by input format, infrastructure requirements and runtime performance. The most recent developments in this space have focused on reference mining PDFs from arts and humanities literature, but there’s a growing need for a fast, accurate way of extracting and parsing references from a wide range of documents and formats across the full research landscape….

From requirements gathering, algorithm refinement, to the process of extracting over 2 million citations as validated XML records in CrossRef, the entire project ran for 12 weeks. Publications which particularly benefited included the British Medical Journal itself (279,000 new records), Gut (177,000), Journal of Clinical Pathology (171,000) and Journal of Neurology, Neurosurgery and Psychiatry (168,000).

99.9% of the extracted records were fully valid XML. In only 0.1% of cases, the XML required some manual correction to meet CrossRef validation standards. The records were uploaded to CrossRef and are now available as open citations for anyone to reuse….”

This entry was posted in oa.bmj, oa.citations, oa.growth, oa.i4oc, oa.metadata, oa.mining, oa.new, oa.progress, oa.scholarcy, openaccess by peter.suber's bookmarks.