SPARC Europe to facilitate high-level European OS policymaker group CoNOSC

SPARC Europe is honoured to support the Council for National Open Science Coordination (CoNOSC) in their efforts to advance national European Open Science policies. The CoNOSC mission is to help countries […]


Open Science Conference 2022 (Newsletter – Leibniz Research Alliance Open Science)

The 9th International Open Science Conference will be held on 8-9 March 2022. Please save the date and stay tuned for more information here: https://www.open-science-conference.eu/

For this conference, we invite you to submit an abstract for one of the conference calls.

The abstract submission deadline is October 15, 2021.

The Open Science Conference 2022 is the 9th international conference of the Leibniz Research Alliance Open Science. The annual conference is dedicated to the Open Science movement.

It provides a unique forum for researchers, librarians, practitioners, infrastructure providers, policy makers, and other important stakeholders to discuss the latest and future developments in Open Science.

The conference offers insights into both practical and technical innovations that serve the implementation of open practices, as well as into current and pioneering developments in the global Open Science movement. Examples of such developments include the growing call for open practices as a lesson learned from global crises, and recent discussions about the relationship between Open Science and knowledge equity. The conference also offers many opportunities for networking and exchange.

Will there also be a Barcamp Open Science next year? Definitely, but please help us shape the future direction of the event via this quick survey.

We look forward to seeing you next year!


Data citations in context: We need disciplinary metadata to move forward

By Kathleen Gregory and Anton Ninkov

Data citations hold great promise for a variety of stakeholders. Unfortunately, due in part to a lack of metadata, particularly about disciplinary domains, many of those promises remain out of reach. Metadata providers – repositories, publishers and researchers – play a key role in improving the current situation.

The potential benefits of data citations are many. From the research perspective, citations to data can help researchers discover existing datasets and understand or verify claims made in the academic literature. Citations are also seen as a way to give credit for producing, managing and sharing data, as well as to provide legal attribution. Researchers, funders and repository managers also hope that data citations can provide a mechanism for tracking and understanding the use and ‘impact’ of research data [1]. Bibliometricians, who study patterns in scholarly communication by tracing publications, citations and related metadata, are also interested in using data citations to understand engagement and relationships between data and other forms of research output.

Figure 1. Perspectives on the potentials of data citation [2]

Realizing the potential of data citations relies on having complete, detailed and standardized metadata describing the who, what, when, where and how of data and their associated work. As we are discovering in the Meaningful Data Counts project, which brings together bibliometricians and members of the research data community as part of the broader Make Data Count initiative, the metadata needed to provide context for both data and data citations are often not provided in standardized ways…if they are provided at all. 

As a first step in this project, we have been mapping the current state of metadata, shared data, and data citations available in the DataCite corpus. Our openly available Jupyter notebook pulls real-time metadata about data in DataCite [3] and demonstrates both the evolving nature of the corpus and the lack of available metadata. In particular, our work highlights the current lack of information about a critical metadata element for providing context about data citations – the disciplinary domain where data were created.

For example, we find that the amount of data available in DataCite increased by more than 1.5 million individual datasets over the seven-month period from January to July 2021, when the corpus grew from 8,243,204 to 9,930,000 datasets. In January, only 5.7% of the available datasets had metadata describing their disciplinary domain according to the most commonly used subject classification system (see the treemap in Figure 2). In July, despite the increased number of datasets overall, the percentage with a disciplinary domain had dropped slightly to 5.63%.

Figure 2. Data with metadata describing disciplinary domain, according to the OECD Fields of Science classification, retrieved on July 9th, 2021. For an interactive version of this treemap, with the most current data, please see our Jupyter notebook [3].

These low percentages reflect the fact that the subject or disciplinary domain of data is not a required field in the DataCite metadata schema. For the nearly 6% of data that do have subject information, the corpus contains multiple classification schemes of differing granularity, ranging from the more general to the more specific. DataCite is currently working to map these classifications to one another automatically in order to improve disciplinary metadata. Organizations that submit their data to DataCite also have a role to play in improving these disciplinary descriptions, as this information underlies many of these mapping efforts.
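As a rough illustration of how such coverage can be checked, the sketch below samples dataset records from the public DataCite REST API and counts how many carry any subject metadata. This is a minimal sketch, not the project's actual notebook [3]; the endpoint and field names follow the DataCite REST API and metadata schema, and a real survey would page through the full corpus rather than inspecting a single page of records.

```python
import requests

# Public DataCite REST API (one page of dataset records as a sample)
API = "https://api.datacite.org/dois"
params = {"resource-type-id": "dataset", "page[size]": 100}

resp = requests.get(API, params=params, timeout=30)
resp.raise_for_status()
payload = resp.json()

total = payload["meta"]["total"]   # size of the full dataset corpus
records = payload["data"]          # the sampled page of DOI records

# 'subjects' is optional in the DataCite schema, hence often empty
with_subjects = sum(1 for rec in records if rec["attributes"].get("subjects"))

print(f"Dataset DOIs in DataCite: {total}")
print(f"Sampled records with subject metadata: {with_subjects}/{len(records)}")
```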

Subject or disciplinary classifications for data are typically created using three methods:

  • Intellectually, where researchers, data creators or data curators use their expertise to assign a relevant subject classification.
  • Automatically, where automated techniques are used to extract subject information from other data descriptions, e.g. the title or abstract (if available).
  • By proxy, where data are assigned the same subject classification as a related entity, e.g. the repository where they are stored. This can be done either automatically or manually.

Of these three methods, the intellectual method tends to be the most common; it is also the most accurate and the most time-consuming. This method is often carried out by those closest to the data, i.e. researchers/data creators or data curators, who have expert knowledge about the data’s subject or disciplinary context that may be difficult to determine either automatically or by proxy.
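To make the “automatic” method concrete, here is a toy sketch that assigns a broad field of science by matching keywords in a dataset’s title or abstract. The keyword map is invented for illustration; production classifiers typically rely on trained models or controlled vocabularies rather than a hand-written lookup like this.

```python
# Toy keyword -> field map (hypothetical, for illustration only)
KEYWORDS = {
    "genome": "Natural sciences",
    "survey": "Social sciences",
    "sensor": "Engineering and technology",
}

def classify(text: str) -> str:
    """Return the first matching field, or 'Unclassified'."""
    text = text.lower()
    for keyword, field in KEYWORDS.items():
        if keyword in text:
            return field
    return "Unclassified"

print(classify("A genome assembly of Arabidopsis thaliana"))   # Natural sciences
print(classify("Pressure sensor readings, bridge monitoring")) # Engineering and technology
```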

While our work also exposes other examples of missing or incomplete metadata [4], we highlight here the current lack of information about disciplinary domains, as disciplinary information is important across all the perspectives shown in Figure 1. For example, disciplinary norms influence how data are shared, how they are made available, how they are understood and how they are reused. Information about disciplines is important for discovering data and is typically used by funders and research evaluators to place academic work in context. Disciplinary analyses are also a critical step in contextualizing citation practices in bibliometric studies, as citation behaviours have repeatedly been shown to follow discipline-specific patterns. Without disciplinary metadata, placing data citations into context will remain elusive and meaningful data metrics cannot be developed. 

In order to move forward with understanding data citations in context, we need better metadata – metadata about disciplinary domains, but also metadata describing other aspects of data creation and use. Metadata providers, from publishers to researchers to data repositories, can help to improve the current situation by working to create complete metadata records describing their data. Only with such metadata can the potentials of data citation be achieved. 

References

[1] These perspectives are visible, for example, in the Joint Declaration of Data Citation Principles:

Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.). San Diego, CA: FORCE11; 2014. https://doi.org/10.25490/a97f-egyk

[2] Gregory, K. (2021, July). Bringing data in sight: Data citations in research. Presentation. Presented at Forum Bibliometrie 2021, Technical University of Munich, online. https://www.ub.tum.de/forum-bibliometrie-2021 

[3] Ninkov, A. (2021). antonninkov/ISSI2021: Datasets on DataCite – an Initial Bibliometric Investigation (1.0) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.5092301

[4] Ninkov, A., Gregory, K., Peters, I., & Haustein, S. (2021). Datasets on DataCite – An initial bibliometric investigation. Proceedings of the 18th International Conference of the International Society for Scientometrics and Informetrics, Leuven, Belgium (virtual). Preprint: https://doi.org/10.5281/zenodo.4730857

Newsletter – Leibniz Research Alliance Open Science (Nr. 1 / 2021)

Around the research alliance and its partners

Barcamp Open Science on Tour

Two more barcamps around Open Science will be organized by our partners this year. Even though we would like to meet the Open Science community on site again, both events will take place online, and participation is of course free of charge.

The first barcamp will take place this Friday and Saturday (25/26 June), and there are still places available. It is aimed at anyone interested in Open Science, but especially at Open Science officers. The event is organized by our alliance partner ZB MED in cooperation with TH Köln. Further information and registration.

The second barcamp will take place on 21 September 2021 (save the date; registration will open soon). This event has a thematic focus on the opportunities, challenges, and requirements of research with available data in the social, educational and economic sciences. The barcamp is organized by the alliance partner DIPF. Further information.


New Research Projects in the Alliance

In the research project “Incorporation of infection data on SARS-CoV-2 and other zoonotic viruses into the ORKG”, content from selected publications in the field of virology will be systematically added to the Open Research Knowledge Graph (ORKG). The focus will be on publications on the influence of mutations on pandemic events of SARS-CoV-2 and other zoonotic viruses. In addition, the suitability of the ORKG as an infrastructure for creating subject-specific knowledge graphs will be evaluated, using virology as an example. Read more.

The research project “Reusing research data in a time of crisis: A change in research practices in the COVID-19 pandemic?” addresses the reuse of available research data. The project investigates a potential change in research practices, from a focus on primary research to an increased acceptance of secondary research, caused by crises like the current COVID-19 pandemic. In the current crisis, primary research in the social and economic sciences (including the educational and behavioural sciences) is constrained by social distancing. The hypothesis is therefore that these constraints should lead researchers to reuse available research data as a replacement for primary data collection, which has become much more difficult to conduct. Read more.


Wrap-Up: Open Science Conference and Barcamp Open Science

In case you missed it, reports and further resources from both events may be of interest.


GenR – Latest Blogposts

Decolonizing Scholarly Communications through Bibliodiversity

Diversity is an important characteristic of any healthy ecosystem. In the field of scholarly communications, diversity in services and platforms, funding mechanisms and evaluation measures will allow the ecosystem to accommodate the different workflows, languages, publication outputs and research topics that support the needs of different research communities. Read more.

Launch of Translate Science

Translate Science is an open volunteer group interested in improving the translation of the scholarly literature. The group has come together to support work on tools and services and to advocate for translating science. Read more.


Leaving No One Behind – On the Intersection of Open Science, Knowledge (In-)Equity and Inclusive Education in the North-South Divide

“From West to the Rest” (Grech 2011, 88) – this is what is said in the context of inclusive education from a postcolonial perspective. Elsewhere, it has been said that inclusive education can be seen as a form of (‘western’) cultural imperialism (Haskell 1998). And indeed, looking behind the curtain of the globally understood concept of inclusive education, it becomes clear that all that glitters is not gold. Read more.


Librarians in Action for Open Education: Strategy just out

The European Network of Open Education Librarians (ENOEL) is helping to implement the UNESCO Open Educational Resources (OER) Recommendation as an ambassador and facilitator of Open Education. How it plans to support it […]


Data Citation: Let’s Choose Adoption Over Perfection

Daniella Lowenberg, Rachael Lammey, Matthew B. Jones, John Chodacki, Martin Fenner

DOI: 10.5281/zenodo.4701079 

In the last decade, attitudes towards open data publishing have continued to shift, including a rising interest in data citation as well as in incorporating open data in research assessment (see Parsons et al. for an overview). This growing emphasis on data citation is driving incentives and evaluation systems for researchers publishing their data. While increased efforts and interest in data citation are a move in the right direction for understanding research data impact and assessment, there are clear difficulties and roadblocks to universal and accessible data citation across all research disciplines. But these roadblocks can be mitigated and do not need to keep us in limbo.

The unique properties of data as a citable object have attracted much needed attention, although they have also created an unhelpful perception that data citation is a challenge requiring uniquely burdensome processes to implement. This perception of difficulty begins with defining a ‘citation’ for data. The reality is that all citations are relationships between scholarly objects. A ‘data citation’ can be as simple as a journal article or another dataset declaring that a dataset was important to the creation of that work. This is not a unique challenge. However, many publishers and funders have elevated the relationship of data that “underlies the research” into a Data Availability Statement (DAS). This has helped address some issues publishers have found with typesetting or production techniques that stripped citations to non-article objects from reference lists. However, because of this segmentation of data from typical citation lists, and the exclusion of data citations from article metadata, many communities have felt they are in a stalemate about how to move forward.

Data citations have been targeted as an area to explore for research assessment. However, we do not have a clear understanding of how many data citations exist or how often data are reused. In the last few years, the majority of data citation conversations have been facilitated through groups at the Research Data Alliance (via Scholix), Earth Science Information Partners (ESIP), the EMBL European Bioinformatics Institute (EMBL-EBI), the American Geophysical Union (AGU), and FORCE11. These conversations have focused primarily on datasets and articles that have DOIs from DataCite and Crossref, respectively, emphasizing the relationship between datasets and published articles. While those relationships are areas that need broad uptake from repositories and publishers alike, they do not illustrate the full picture. Many citations are not being accounted for, namely biomedical datasets with accession numbers and compact identifiers that are not registered through DataCite but are readily accessible through resolvers like identifiers.org. There is also a lack of understanding around the citation of datasets in other scholarly and non-scholarly (e.g., government documents, policy papers) outputs.
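To ground the idea that a citation is simply a relationship between two scholarly objects: link aggregators such as Scholix exchange dataset-literature links as small structured records. The sketch below shows such a link as a Python dict, with field names modelled loosely on the Scholix link-information schema (simplified here; see scholix.org for the authoritative schema). The DOIs and provider name are hypothetical placeholders.

```python
# A dataset-literature link in the spirit of the Scholix schema
# (simplified; identifiers and provider are hypothetical placeholders)
scholix_link = {
    "LinkPublicationDate": "2021-07-09",
    "LinkProvider": [{"Name": "Example Aggregator"}],
    "RelationshipType": {"Name": "References"},  # the article cites the dataset
    "Source": {
        "Identifier": {"ID": "10.1234/example-article", "IDScheme": "doi"},
        "Type": "publication",
    },
    "Target": {
        "Identifier": {"ID": "10.5061/example-dataset", "IDScheme": "doi"},
        "Type": "dataset",
    },
}

print(scholix_link["Source"]["Identifier"]["ID"],
      "cites",
      scholix_link["Target"]["Identifier"]["ID"])
```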

For these reasons, we have tried to ensure that conversations about data citation are not framed solely around the notion of assigning “credit” or around assigning any specific meaning to citations, for that matter. Without a full picture of how many citations exist, how datasets are composed across disciplines, how citation behavior varies across disciplines, and what context the citations are used in, it is impossible and inappropriate to use citations as a shorthand for credit. The community is working towards a better understanding of citation behavior—and we believe we will get there—but we need to be careful and considered in doing so to avoid repeating previous mistakes (e.g., creating another impact factor).  

Why data citation is perceived as difficult

Data are complex. As mentioned in our 2019 book, data are nuanced. This means data citations are complex too, and understanding these nuances is essential for arriving at true measures of data reuse. For instance, there is work to be done to understand the role of provenance, dataset-to-dataset reuse, data aggregation and derivation, and other ways of measuring usage of datasets without a formal “citation”.

Data citations are complex. There is a well-established concept of scholarly citations, using reference lists of citations formatted in a defined citation style. The main challenges with the current approach center on making citations machine-readable using standardized metadata instead of citation styles meant for human readers, and on making citations machine-accessible using open APIs and aggregators. These are general challenges for citation, but there are additional questions specific to handling data citations: Is there a DOI or other persistent identifier, and basic citation metadata, for the dataset? Is there tooling to bring this information into the citation style used by the publisher? Should data citations go into the article reference list? What should happen when the number of datasets cited in a publication runs into the thousands or more? Should datasets in reference lists be called out as data?
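Part of the tooling question is already answered by existing infrastructure: DOI registration agencies support content negotiation, so a human-readable reference for a dataset or software DOI can be generated automatically in a chosen citation style. A minimal sketch, using this piece's own Zenodo DOI; the Accept header and style parameter follow the DOI content-negotiation conventions supported by DataCite and Crossref.

```python
import requests

# Ask the DOI resolver for an APA-formatted bibliography entry
doi = "10.5281/zenodo.4701079"  # the DOI of this piece
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "text/x-bibliography; style=apa"},
    timeout=30,
)
resp.raise_for_status()
print(resp.text)  # a ready-to-paste APA citation string
```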

There’s a lack of consistency in guidance. Despite the growing interest among various stakeholders (researchers, journals, repositories, preprint servers, and others) in supporting data citation, there is no consistent guidance for and across these groups. The inconsistencies concern how citations should be formatted, how they should be marked up and indexed, and what the role of each stakeholder should be (especially repositories). Some of this can be attributed to a constant reinvent-the-wheel approach, as well as to the wide variety of stakeholder groups and hubs for this information: understandably, people are confused about how Scholix fits with OpenAIRE, Crossref, and DataCite, never mind the profusion of other overlapping initiatives and projects in this space that can make it even more difficult to navigate. It is clear that the best way forward is not to keep reinventing the wheel, spawning new groups and initiatives, but rather to build on existing work, leveraging the successes of the last decade of investment in data citations and finding solutions for the more advanced issues at hand. In short: let’s focus on developing the most basic, clear guidance, and work upwards from there.

There’s a tension between data availability statements and data citations. In the last decade, publishers and funders have focused heavily on requiring data availability statements, treating them as the way to designate that an article has an associated published dataset. These statements are rarely marked up as a relation in the article metadata or as a note of reuse (outside of self-citation). If we continue to focus solely on data availability statements as a required first step, when they have yet to solve the “machine readability problem”, we will squander slim resources that would be better spent on how each journal publisher can designate data reuse and citations.

Guidance and decision points

Understanding the many intricacies of data, citations, and data citation, we propose the following path forward for our communities to work effectively towards widespread implementation of data citation and data reuse. This path begins with shifting away from a “decision-pending” attitude and moving forward with clear recommendations on the following:

  • Best practices for citing datasets in articles, preprints, and books. We already have multiple sets of best practices. We don’t need more guidance documents; we need consolidation and rationalization of the guidance that already exists.
  • Simplifying relationship type complexity. The complexity of ontologies for relationships is causing unnecessary churn and delays in implementation. Providers should simplify this; however, the community shouldn’t wait. We can and should implement viable solutions now. We should be promoting datasets in reference lists as a first viable solution. 
  • How non-DOIs are cited. We have too many conversations happening about DOIs and not enough about citation in other identifier communities. These communities need to reach some simple conventions around putting data citations into reference lists with globally unique PIDs and citation metadata, in order to avoid requiring massive text-mining efforts looking for string matches to, for example, “PDB:6VXX”, the identifier for the SARS-CoV-2 spike protein structure (see the sketch after this list).
  • Publisher support for those who are not working with Crossref. Not all publishers use Crossref services or have the ability to implement Crossref’s approaches to data citations. We need to focus attention on accessible methods for reference extraction (e.g., from PDFs) and larger support for smaller publishers that do not have the resources to retool to fit current guidance. 
  • The role for data repositories. Publishers are key to implementing data citation but data repositories must also focus on declaring relationships to articles and other outputs in their metadata. Data repositories should focus on making their datasets citable through PIDs and declaring robust metadata as well as reporting all known citations and linkages publicly so they can be used for aggregation.
  • Researchers should cite data despite these infrastructure hold-ups. Regardless of the hurdles to implementing all of the established best practices, the basic fact remains that researchers can cite data today, and they should, using the approaches already available.
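On the non-DOI point above: compact identifiers such as “pdb:6VXX” already resolve through identifiers.org, so they can stand in reference lists as globally unique PIDs without any string-matching text mining. A minimal sketch of resolving one; it simply follows the resolver's redirect to the hosting database's landing page.

```python
import requests

# Resolve a compact identifier via identifiers.org by following redirects
compact_id = "pdb:6VXX"  # SARS-CoV-2 spike protein structure
resp = requests.get(f"https://identifiers.org/{compact_id}", timeout=30)
resp.raise_for_status()

print(resp.url)  # the landing page at the database hosting the record
```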

Choosing adoption over perfection

Perfection is the enemy of the good, and finding solutions for every complexity of data citation does not need to be a roadblock to getting started. We can use a phased approach to begin implementing best practices for data citations right now:

Phase I: basic implementation

Align as much as possible with existing community practices and workflows (e.g., using reference lists)

Phase II: advanced implementation

Address special use cases (e.g., relation types, machine-readable data availability statements, dynamic data, dataset-dataset provenance)

Phase III: beyond data citation

Build infrastructure for other indicators assessing data reuse

While we have already dabbled in all three of these phases, we are still largely stuck in Phase I, constantly reinventing the same basic wheel, which keeps spinning in the same place.

Our focus should be on how to scale these best practices across all publishers and repositories, supporting the diverse research landscape. This includes advancing the conversation beyond the DOI-based focus. Once that happens we can really move forward with building mechanisms for credit and understanding data re-use for research assessment.

Despite the agenda ahead, there are many steps that can be taken right now to keep moving towards the dream state. The community should not wait for infrastructure to be perfect before engaging in data citation support.

This is important, so let’s say it again! The community should not wait for infrastructure to be perfect before engaging in data citation support. 

Data citations become harder when we act as if the adoption hurdles are insurmountable, so let’s simplify. Our infrastructure for data citations will continue to improve, use cases will continue to be defined and evolve, and we need as many stakeholders as possible to hop on board now and work with us towards comprehensive support for data citation.

A Fork in the Road: Peter Suber on the importance of supporting Open Science Infrastructure

On a gloomy Friday afternoon back in March, SPARC Europe had the pleasure of interviewing a very special guest. We sat down (in our respective living rooms, in front of our […]


Let’s Align International and National Copyright OA Policy Action

“For so many years researchers have been confused about what they can and can’t do with respect to copyright …We can make the life of the author easier!”: these were some […]


University of Ottawa signs agreement with PeerJ for innovative new Institutional Author Membership model to fund Open Access

“We are delighted to announce that University of Ottawa have signed up to an innovative new approach to fund Open Access publishing. Funded by the University of Ottawa Library, authors affiliated with the University of Ottawa may publish in PeerJ journals using a new Three-Year Membership; the Membership allows authors to publish up to three articles at any time within a three-year period….”

Newsletter – Leibniz Research Alliance Open Science (Nr. 6 / 2020)

Barcamp Open Science & Open Science Conference

Registration for our Barcamp Open Science and Open Science Conference is now open. Both events will take place entirely online.

Get tickets here: https://www.open-science-conference.eu

Barcamp Open Science // 16 February 2021 // #oscibar

The Barcamp Open Science, a pre-event of the Open Science Conference, is open to everybody interested in discussing, learning more about, and sharing experiences on practices in Open Science. We would like to invite researchers and practitioners from various backgrounds to contribute their experience and ideas to the discussion. The Barcamp will bring together both novices and experts, and its open format supports lively discussions, interesting presentations, the development of new ideas, and knowledge exchange. Previous knowledge of Open Science is not required. The Barcamp is open to all topics around Open Science that participants would like to discuss.

Open Science Conference // 17-19 February 2021 // #osc2021

The Open Science Conference 2021 is the 8th international conference of the Leibniz Research Alliance Open Science. The annual conference is dedicated to the Open Science movement and provides a unique forum for researchers, librarians, practitioners, infrastructure providers, policy makers, and other important stakeholders to discuss the latest and future developments in Open Science.

This year’s conference will focus especially on the effects and impact of (global) crises and associated societal challenges, such as the coronavirus pandemic or climate change, on open research practices and science communication in the context of the digitisation of science, and, vice versa, on how open practices help to cope with crises. You can look forward to the following speakers:

  • Dr Danielle Cooper, Ithaka S+R (USA)
  • Hilary Hanahoe, Research Data Alliance (Italy/UK)
  • Dr Céline Heinl, Federal Institute for Risk Assessment (BfR) (Germany)
  • Marte Sybil Kessler, Stifterverband (Germany)
  • Dr Alina Loth, Berlin School of Public Engagement and Open Science / Humboldt University of Berlin (Germany)
  • Vanessa Proudman, SPARC Europe (Netherlands)
  • Clifford Tatum, Leiden University (Netherlands)
  • Leonhard Volz, Journal of European Psychology Students / University of Amsterdam (Netherlands)
  • Dr Lilly Winfree, Open Knowledge Foundation (USA/UK)
