Evolving our support for text-and-data mining – Crossref

“Many researchers want to carry out analysis and extraction of information from large sets of data, such as journal articles and other scholarly content. Methods such as screen-scraping are error-prone, place too much strain on content sites and may be unrepeatable or break if site layouts change. Providing researchers with automated access to the full-text content via DOIs and Crossref metadata reduces these problems, allowing for easy deduplication and reproducibility. Supporting text and data mining echoes our mission to make research outputs easy to find, cite, link, assess, and reuse.

In 2013 Crossref embarked on a project to better support Crossref members and researchers with text and data mining (TDM) requests and access. The project had two main parts:

To collect and make available full-text links and publisher TDM license links in the metadata.

To provide a service (TDM click-through service) for Crossref members to post their additional TDM terms and conditions and for researchers to access, review and accept these terms….
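To illustrate the first part of the project, here is a minimal Python sketch of how a researcher might pick TDM full-text links and license URLs out of a single work's metadata. It assumes the JSON shape of a Crossref REST API works record: a `link` array whose entries carry `URL`, `content-type` and `intended-application`, and a `license` array whose entries carry `URL`. The sample record, its DOI and its URLs are made up, and the helper names `tdm_links` and `license_urls` are hypothetical.

```python
from typing import Any

# Assumption: `work` mirrors the "message" object of a Crossref REST API
# works record (GET https://api.crossref.org/works/{doi}).


def tdm_links(work: dict[str, Any]) -> list[dict[str, str]]:
    """Hypothetical helper: full-text links whose intended application is text mining."""
    return [
        {"url": link["URL"], "content_type": link.get("content-type", "unspecified")}
        for link in work.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]


def license_urls(work: dict[str, Any]) -> list[str]:
    """Hypothetical helper: every license URL on the record, in deposit order."""
    return [lic["URL"] for lic in work.get("license", [])]


# Made-up record, trimmed to the fields used above.
sample_work = {
    "DOI": "10.5555/example.123",
    "link": [
        {
            "URL": "https://publisher.example/fulltext/123.xml",
            "content-type": "application/xml",
            "intended-application": "text-mining",
        },
        {
            "URL": "https://publisher.example/fulltext/123.pdf",
            "content-type": "application/pdf",
            "intended-application": "similarity-checking",
        },
    ],
    "license": [
        {"URL": "https://publisher.example/tdm-license", "content-version": "tdm", "delay-in-days": 0},
    ],
}

if __name__ == "__main__":
    # Only the XML link is tagged for text mining, so only it is printed.
    for link in tdm_links(sample_work):
        print(link["content_type"], link["url"])
    print(license_urls(sample_work))
```

Filtering on `intended-application` rather than on file type lets a mining tool skip links a publisher registered for other purposes, such as similarity checking.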

To date, 37.5 million works registered with Crossref have both full-text links and TDM license information. We continue to encourage all members to include full-text links and license information in the metadata they register to assist researchers with TDM. You can see how each member is doing via its participation report (e.g. Wiley’s)….
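As far as we understand the Crossref REST API, coverage figures like this can be checked with its `has-full-text` and `has-license` filters, optionally narrowed with a `member` filter. The sketch below only builds the query URL and reads `total-results` from a parsed response. It makes no network call, the helper names are hypothetical, and the sample response is invented rather than a real count.

```python
from typing import Any, Optional
from urllib.parse import urlencode

API_BASE = "https://api.crossref.org/works"


def coverage_query(member_id: Optional[str] = None) -> str:
    """Hypothetical helper: URL counting works with both full-text links and license info.

    rows=0 asks for the count only, without any item records.
    """
    filters = ["has-full-text:true", "has-license:true"]
    if member_id is not None:
        filters.append(f"member:{member_id}")  # restrict to a single member
    params = {"filter": ",".join(filters), "rows": 0}
    return f"{API_BASE}?{urlencode(params)}"


def total_results(response: dict[str, Any]) -> int:
    """Read the result count from a parsed JSON response body."""
    return int(response["message"]["total-results"])


if __name__ == "__main__":
    print(coverage_query())
    # Invented response body, for illustration only.
    print(total_results({"status": "ok", "message": {"total-results": 42, "items": []}}))
```

Fetching the URL (for example with `urllib.request.urlopen` and `json.load`) and passing the result to `total_results` would give the live count.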

Members are also making subscription content available for text mining (temporarily or otherwise) for specific purposes, such as to help the research community with its response to COVID-19. Back in April we highlighted how this can be achieved by including:

A “free to read” element in the access indicators section of publisher metadata, indicating that the content is being made available free of charge (gratis)

An assertion element stating that the content is available free of charge….”
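A minimal sketch of the first element, assuming the Crossref Access Indicators deposit markup: an `ai:program name="AccessIndicators"` wrapper in the `http://www.crossref.org/AccessIndicators.xsd` namespace, containing `ai:free_to_read` with `start_date`/`end_date` attributes and an optional `ai:license_ref`. These names are our assumption and should be verified against the current Crossref deposit schema; the function name, dates and URL are hypothetical. Setting an `end_date` is one way to express the temporary availability described above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace for Crossref Access Indicators; check the current deposit schema.
AI_NS = "http://www.crossref.org/AccessIndicators.xsd"
ET.register_namespace("ai", AI_NS)


def access_indicators(
    free_from: str,
    free_until: Optional[str] = None,
    tdm_license_url: Optional[str] = None,
) -> ET.Element:
    """Hypothetical helper: build an ai:program block marking content free to read.

    Dates are ISO 8601 strings (YYYY-MM-DD); an end date models temporary access.
    """
    program = ET.Element(f"{{{AI_NS}}}program", {"name": "AccessIndicators"})
    free = ET.SubElement(program, f"{{{AI_NS}}}free_to_read", {"start_date": free_from})
    if free_until is not None:
        free.set("end_date", free_until)
    if tdm_license_url is not None:
        # Assumed: applies_to="tdm" marks the license governing text and data mining.
        lic = ET.SubElement(
            program,
            f"{{{AI_NS}}}license_ref",
            {"applies_to": "tdm", "start_date": free_from},
        )
        lic.text = tdm_license_url
    return program


if __name__ == "__main__":
    block = access_indicators("2020-04-01", "2020-12-31", "https://publisher.example/tdm-license")
    print(ET.tostring(block, encoding="unicode"))
```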