“Sorbonne University has been deeply committed to the promotion and the development of open science for many years. In keeping with its commitment to open research information, it has decided to discontinue its subscription to the Web of Science publication database and Clarivate bibliometric tools in 2024. By resolutely abandoning the use of proprietary bibliometric products, it is opening the way for open, free and participative tools.”
“In the scholarly communications environment, the evolution of a journal article can be traced by the relationships it has with its preprints. Those preprint–journal article relationships are an important component of the research nexus. Some of those relationships are provided by Crossref members (including publishers, universities, research groups, funders, etc.) when they deposit metadata with Crossref, but we know that a significant number of them are missing. To fill this gap, we developed a new automated strategy for discovering relationships between preprints and journal articles and applied it to all the preprints in the Crossref database. We made the resulting dataset, containing both publisher-asserted and automatically discovered relationships, publicly available for anyone to analyse.
We have developed a new, heuristic-based strategy for matching journal articles to their preprints. It achieved the following results on the evaluation dataset: precision 0.99, recall 0.95, F0.5 0.98. The code is available here.
We applied the strategy to all the preprints in the Crossref database. It discovered 627K preprint–journal article relationships.
We gathered all preprint–journal article relationships deposited by Crossref members, merged them with those discovered by the new strategy, and made everything available as a dataset. There are 642K relationships in the dataset, including:
296K provided by the publisher and discovered by the strategy,
331K new relationships discovered by the strategy only,
15K provided by the publisher only.
In the future, we plan to replace our current matching strategy with the new one and make all discovered relationships available through the Crossref REST API….”
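The reported F0.5 of 0.98 follows directly from the stated precision (0.99) and recall (0.95). As a quick check, here is a minimal sketch of the standard F-beta formula (this is the textbook definition, not code from the Crossref matching project):

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily, which suits a matching
    task where a wrong preprint-article link is costlier than a
    missed one.
    """
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Reported evaluation scores: precision 0.99, recall 0.95
score = f_beta(0.99, 0.95, beta=0.5)  # ~0.982, rounding to the reported 0.98
```

Choosing beta = 0.5 rather than the symmetric F1 reflects the authors' preference for precision over recall in automated relationship discovery.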
“Signatories of this statement recommend the following as best practice in research data sharing:
When publishing their results, researchers deposit related research data and outputs in a trustworthy data repository that assigns persistent identifiers (DOIs where available). Researchers link to research data using persistent identifiers.
When using research data created by others, researchers provide attribution by citing the datasets in the reference section using persistent identifiers.
Data repositories enable sharing of research outputs in a FAIR way, including support for metadata quality and completeness.
Publishers set appropriate journal data policies, describing the way in which data is to be shared alongside the published article.
Publishers set instructions for authors to include Data Citations with persistent identifiers in the references section of articles.
Publishers include Data Citations and links to data in Data Availability Statements with persistent identifiers (DOIs where available) in the article metadata registered with Crossref.
In addition to Data Citations, Data Availability Statements (human- and machine-readable) are included in published articles where appropriate.
Repositories and publishers connect articles and datasets through persistent identifier connections in the metadata and reference lists.
Funders and research organizations provide researchers with guidance on open science practices, track compliance with open science policies where possible, and promote and incentivize researchers to openly share, cite and link research data.
Funders, policymaking institutions, publishers and research organizations collaborate towards aligning FAIR research data policies and guidelines.
All stakeholders collaborate in the development of tools, processes, and incentives throughout the research cycle to enable sharing of high-quality research data, making all steps in the process clear, easy and efficient for researchers by providing support and guidance.
Stakeholders responsible for research assessment take into account data sharing and data citation in their reward and recognition system structures….”
“Join us for the third session of Better Together, a joint webinar series co-organized by Crossref, DataCite and ORCID. We are delighted to announce our featured speaker, Dr. Tiffany Straza, an Open Science Consultant in the Section of Science, Technology and Innovation Policy at UNESCO.
As addressed in the UNESCO Recommendation on Open Science and the UNESCO open science toolkit, Open Science infrastructures are key to the sustainability of Open Science. To make scientific research more accessible to everyone, the interoperability and reusability of research outputs associated with uniquely identified individuals are fundamental. This can be achieved by adopting PIDs across research workflows, improving permanent and unrestricted access for the research community. In this session, we will discuss Open Science, the UNESCO Recommendations, and how connections between research outputs, organizations, and individuals can benefit different research workflows and save costs….”
“The October 2023 Open Science Roundup is dedicated to International Open Access Week, a yearly celebration endorsing open access (OA) to scholarly output and creating a more equitable knowledge society. This month, we hear from Ginny Hendricks from Crossref on Digital Object Identifiers (DOIs)”.
“Following on the announcement that Crossref’s Open Funder Registry will be merging with ROR after 2024, we’d like to do a deep dive into the specifics of the evidence that ROR is ready to take on the important work that the Open Funder Registry has been doing: identifying research funders in a clean, consistent, comprehensive, and interoperable way. The main thing you need to know is that ROR’s data is up to the challenge. As of today, there is a corresponding ROR ID for over 94% of Funder ID assertions in both DataCite and Crossref DOI records….”
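ROR records already carry legacy Open Funder Registry identifiers under their `external_ids` field, which is what makes the 94% coverage figure possible. A sketch of building a Funder-ID-to-ROR-ID crosswalk from ROR data-dump-shaped records (field names follow the ROR v1 record schema; the sample record below is illustrative, pairing NSF's Funder ID with its ROR ID):

```python
def funder_to_ror(ror_records):
    """Build a lookup from legacy Open Funder Registry IDs to ROR IDs.

    Each record is a dict shaped like an entry in the ROR data dump,
    where Crossref Funder IDs appear under external_ids["FundRef"].
    """
    lookup = {}
    for rec in ror_records:
        fundref = rec.get("external_ids", {}).get("FundRef", {})
        for funder_id in fundref.get("all", []):
            lookup[funder_id] = rec["id"]
    return lookup

# Illustrative record: the U.S. National Science Foundation
records = [
    {"id": "https://ror.org/021nxhr62",
     "external_ids": {"FundRef": {"preferred": "100000001",
                                  "all": ["100000001"]}}},
]
crosswalk = funder_to_ror(records)
```

A lookup like this is how a publisher could translate existing Funder ID assertions to ROR IDs during the transition.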
“By embracing DataCite and Crossref DOIs, the global scholarly community is empowered, reducing financial barriers and fostering broad creation, dissemination, and recognition of research outputs and resources….”
Abstract: We report evidence of an undocumented method to manipulate citation counts involving ‘sneaked’ references. Sneaked references are registered as metadata for scientific articles in which they do not appear. This manipulation exploits trusted relationships between various actors: publishers, the Crossref metadata registration agency, digital libraries, and bibliometric platforms. By collecting metadata from various sources, we show that extra undue references are actually sneaked in at Digital Object Identifier (DOI) registration time, resulting in artificially inflated citation counts. As a case study, focusing on three journals from a given publisher, we identified at least 9% sneaked references (5,978/65,836) mainly benefiting two authors. Despite not existing in the articles, these sneaked references exist in metadata registries and inappropriately propagate to bibliometric dashboards. Furthermore, we discovered ‘lost’ references: the studied bibliometric platform failed to index at least 56% (36,939/65,836) of the references listed in the HTML version of the publications. The extent of the sneaked and lost references in the global literature remains unknown and requires further investigation. Bibliometric platforms producing citation counts should identify, quantify, and correct these flaws to provide accurate data to their patrons and prevent further citation gaming.
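The two flaw types in the abstract reduce to set differences between three reference lists for the same article: what the HTML contains, what was registered as DOI metadata, and what the bibliometric platform indexed. A schematic sketch of that audit (all DOIs below are hypothetical, and this is a simplification of the paper's method, not its code):

```python
def audit_references(in_html, in_metadata, on_platform):
    """Classify one article's references by where they appear.

    Sneaked: registered in DOI metadata but absent from the article.
    Lost: present in the article's HTML but missing from the
    bibliometric platform's index.
    """
    html, meta, platform = map(set, (in_html, in_metadata, on_platform))
    return {
        "sneaked": meta - html,
        "lost": html - platform,
    }

# Hypothetical reference lists for a single article
report = audit_references(
    in_html=["10.1000/a", "10.1000/b"],
    in_metadata=["10.1000/a", "10.1000/b", "10.1000/x"],  # "x" sneaked in
    on_platform=["10.1000/a"],                            # "b" never indexed
)
```

Run at corpus scale, the sizes of these two sets yield the 9% and 56% figures reported in the case study.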
“Around that time we realized the world lacked a comprehensive database of retractions. We saw how many were missing from sources researchers used, whether PubMed, Web of Science, Scopus, or others – including Crossref, more about which I will say in a moment. We were cataloging them in spreadsheets ourselves, but couldn’t keep up.
The three foundations all agreed to support our work, not just the journalism, but to create what became The Retraction Watch Database, officially launched in 2018. Part of that funding was a grant to create a strategic plan for sustainability and growth. One of the pillars of that plan was licensing the Database to organizations – commercial and nonprofit – who could use it in products that would help researchers know when what they were reading had been retracted, among other purposes.
Those license fees – along with other income, particularly individual donations and a subcontract from a grant from the Howard Hughes Medical Institute (HHMI) – have kept Retraction Watch and The Center for Scientific Integrity running for several years. We are deeply grateful for the support and show of confidence they represent.
But we also always wanted to make the Database available to as many people as possible, whether or not they had access to tools that licensed it, if we could find a financial model that did not rely on such fees. (We always provided the data free of charge to scholars studying retractions and related phenomena.)
Fast forward to today. We’re thrilled to announce that Crossref has acquired The Retraction Watch Database and will make it completely open and freely available….”
The Center for Scientific Integrity, the organisation behind the Retraction Watch blog and database, and Crossref, the global infrastructure underpinning research communications, both not-for-profits, announced today that the Retraction Watch database has been acquired by Crossref and made a public resource. An agreement between the two organisations will allow Retraction Watch to keep the data populated on an ongoing basis and always open, alongside publishers registering their retraction notices directly with Crossref.
Today’s big news is that Crossref has acquired the Retraction Watch database of expressions of concern and retractions and has made it openly accessible to anyone who wants to use it. I’m waiting for full confirmation of the license or public domain dedication under which it will be released, but this is still a great commitment of Crossref to the POSI principles. The liberation of this database is good for science and scholarship in general. It means that Crossref now knows about approximately 50,000 retractions.
Access to the database, for now, is via the Crossref Labs API, which I write and maintain. The Crossref Labs API sits in between the user and the Live API to inject new experimental metadata fields. The API can also serve files, so you can get the CSV of the latest Retraction Watch data at https://api.labs.crossref.org/data/retractionwatch?mailto=[YOUR@EMAIL.HERE].
Meanwhile, the Labs API will show you retraction data in the cr-labs-updates entry if you visit a work that has data, e.g. https://api.labs.crossref.org/works/10.2147/CMAR.S324920?mailto=[YOUR@EMAIL.HERE].
The obvious question (and one that was asked within an hour of our launch) is: how can I retrieve a list of all retractions via the API? This is what we call a “filter” in the API and, for Reasons™, it is very difficult to do in the Labs API. The Reason™ it’s so difficult is that the Labs API gets its data by pulling from the Live API, which knows nothing about retractions, and then injecting the data. So pulling out just the entries that have retractions means building a separate index of all items with retractions, then fetching these, and paging through them with the user’s requests. This gets HORRIBLY slow even under light load, and I don’t recommend it. We tried several ways of implementing this, but to no avail. So, for now, please download and parse the CSV if you want a full list of retractions. We’ll continue to work on an API solution, but it’s going to take us a bit longer.
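Since the full list currently comes from the CSV rather than an API filter, fetching it is a one-liner once the URL is built. A minimal sketch using only the standard library (the email address is a placeholder you must replace with your own, as the post asks):

```python
from urllib.parse import urlencode

LABS_BASE = "https://api.labs.crossref.org"

def retraction_csv_url(mailto: str) -> str:
    """URL for the full Retraction Watch CSV served by the Labs API.

    The mailto parameter identifies the caller to the service.
    """
    return f"{LABS_BASE}/data/retractionwatch?{urlencode({'mailto': mailto})}"

url = retraction_csv_url("you@example.org")
# Download and parse with any HTTP client, e.g.:
#   import csv, urllib.request
#   with urllib.request.urlopen(url) as resp:
#       rows = list(csv.DictReader(resp.read().decode("utf-8").splitlines()))
```

For per-work lookups, the `cr-labs-updates` entry on the Labs `/works/{doi}` route mentioned above remains the lighter-weight option.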
In the meantime, publishers: please continue to deposit retractions. It’s really important and we need to end the culture of shame (at least for publishers) around declaring retractions. It is vital for the progress of science that we accurately mark papers that have been shown to be defective.
The other important thing to note about all this is that, with any luck, it will help to make Retraction Watch itself sustainable. As they say in their announcement: “That means we have achieved sustainability – the highest priority goal for any nonprofit – for the database side of our operation. And the acquisition fee provides important unrestricted reserves that allow for breathing room and the potential for growth.” It’s really great that Crossref can contribute towards this sustainability, in the spirit of also improving the metadata that we can offer.
“Today, we are announcing a long-term plan to deprecate the Open Funder Registry. For some time, we have understood that there is significant overlap between the Funder Registry and the Research Organization Registry (ROR), and funders and publishers have been asking us whether they should use Funder IDs or ROR IDs to identify funders. It has therefore become clear that merging the two registries will make workflows more efficient and less confusing for all concerned. Crossref and ROR are therefore working together to ensure that Crossref members and funders can use ROR to simplify persistent identifier integrations, to register better metadata, and to help connect research outputs to research funders.
Just yesterday, we published a summary of a recent workshop between funders and publishers on funding metadata workflows that we convened with the Dutch Research Council (NWO) and Sesame Open Science. As the report notes, “open funding metadata is arguably the next big thing” [in Open Science]. That being the case, we think this is the ideal time to strengthen our support of open funding metadata by beginning this transition to ROR….”
by Hans de Jonge, Bianca Kramer, Fabienne Michaud, Ginny Hendricks
Ten years on from the launch of the Open Funder Registry (OFR, formerly FundRef), there is renewed interest in the potential of openly available funding metadata through Crossref. And with that: calls to improve the quality and completeness of that data. Currently, about 25% of Crossref records contain some kind of funding information. Over the years, this figure has grown steadily. A number of recent publications have shown, however, that there is considerable variation in the extent to which publishers deposit these data to Crossref. Both technical and business issues seem to lie at the root of this. Crossref – in close collaboration with the Dutch Research Council NWO and Sesame Open Science – brought together a group of 26 organizations from across the ecosystem to discuss the barriers and possible solutions. This blog post presents some anonymized lessons learned.
“Digital preservation is crucial to the “persistence” of persistent identifiers. Without a reliable archival solution, if a Crossref member ceases operations or there is a technical disaster, the identifier will no longer resolve. This is why the Crossref member terms insist that publishers make best efforts to ensure deposit in a reputable archive service. This means that, if there is a system failure, the DOI will continue to resolve and the content will remain accessible. This is how we protect the integrity of the scholarly record.
I will write another post, soon, on the reality of preservation of items with a Crossref DOI, but recent work in the Labs team has determined that we have a situation of drastic under-preservation of much scholarly material that has been assigned a persistent identifier. In particular, content from our smaller Crossref members, with limited financial resources, is often precariously preserved. Further, DOI URLs are not always updated, even when, for instance, the underlying domain has been registered by a different third party. This results in DOIs pointing to new, hijacked, and lapsed content that does not reflect the metadata that we hold.
We (Geoffrey) have (has) long-harboured ambitions to build a system that would allow for automatic deposit into an archive and then to present access options to the resolving user. This would ensure that all Crossref content had at least one archival solution backing it and greatly contribute to the improved persistent resolvability of our DOIs. We refer to this, internally, as “Project Op Cit”. And we’re now in a position to begin building it.
However, we need to get this right from the design phase onwards. We need input from librarians working in the digital preservation space. We need input from members on whether they would use such a service. We are not digital preservation experts and we are acutely aware that we need the expertise of those who are, particularly where we’ve had to take some shortcuts. For instance: we are aware that the Internet Archive is perhaps not the first choice of many digital preservation librarians and specialists, who opt for specific scholarly-communications solutions. However, it is easy, open, and free. Hence, we propose for the prototype to use IA, on the assumption that this will be a proof-of-concept only, which we will expand to other archives if there is demand and once it works….”
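One way a prototype like this could check whether a DOI's landing page already has an archival copy is the Internet Archive's public Wayback availability endpoint. A sketch of that check (this is an assumption about one possible building block, not the Op Cit design itself):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def wayback_query(target_url: str) -> str:
    """Availability-API query for the closest Wayback Machine snapshot."""
    return "https://archive.org/wayback/available?" + urlencode({"url": target_url})

def is_archived(target_url: str) -> bool:
    """True if the Wayback Machine reports at least one snapshot.

    The endpoint returns JSON whose "archived_snapshots" object is
    empty when no capture exists.
    """
    with urlopen(wayback_query(target_url)) as resp:
        data = json.load(resp)
    return bool(data.get("archived_snapshots"))
```

A batch run of `is_archived` over resolved DOI targets would give a rough lower bound on preservation coverage, though a production system would want the dedicated scholarly archives the post mentions.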
“Join us for the second of the joint webinar series co-organized by DataCite, Crossref and ORCID. We will talk in-depth about who we are, our global programs for equitable participation and access, and how our organizations work together for the benefit of the scholarly community. The webinar will be presented in English and will last 90 minutes including time for Q&A. The slides and recording will be shared afterwards with all who register.”