European Open Science Cloud: small projects, big plans and 1 billion EUR

by Claudia Sittner

Prof. Dr Klaus Tochtermann is Director of the ZBW – Leibniz Information Centre for Economics, Member of the German Council for Scientific Information Infrastructures (RfII) and board member of the recently established European Open Science Cloud Association (EOSC Association). He was a member of the EOSC’s High Level Expert Group and the EOSC working group for sustainability for many years. He also founded, in 2012, the Leibniz Research Alliance Open Science, the international Open Science Conference and the associated Barcamp Open Science.

Recently, he was interviewed by host Dr Doreen Siegfried (ZBW) in the ZBW podcast “The Future is Open Science” on the future of the European Open Science Cloud and the complexity of the landscape for research data. This blog post is a shortened version of the podcast episode “European Open Science Cloud – Internet of FAIR Data and Services” with Klaus Tochtermann. You can listen to the entire episode (35 minutes) here (German).

Why the name European Open Science Cloud never fitted

Something that will surprise many people: “The terminology of the EOSC was never appropriate – even in 2015”, according to Tochtermann. Back then – as the initial ideas for the EOSC were being developed and small projects were commencing – it was neither European, nor Open, nor Science nor a Cloud:

“It isn’t European – because research doesn’t stop at the regional borders of Europe, but instead many research groups are internationally networked. It isn’t open – because even in science there is data that requires protection such as patient data. It isn’t science – because many scientific research projects also use data from industry. And it isn’t cloud – because the point is not to deposit all data centrally in a cloud solution”, explains Klaus Tochtermann. The term was specified by the European Commission at the time and is now established. Among experts, the term “Internet of FAIR Data and Services” (IFDS) is preferred, says Tochtermann.

Preparatory phase 2015 to 2020

The EOSC started in 2015 with the aim “to provide European researchers, innovators, companies and citizens with a federated and open multi-disciplinary environment where they can publish, find and re-use data, tools and services for research, innovation and educational purposes.” (European Commission).

Since then, 320 million EUR have been deployed to fund 50 projects relating to research data management. These have, however, only shed light on individual aspects of the EOSC. “In fact, we are still a long way from being able to offer EOSC operationally in the scientific system”, says Tochtermann.

The funds were integrated into a research framework programme that finances only smaller projects at any one time – this is owing to the way the European Commission functions and how it funds research. That’s why there was never one big EOSC project, but many small individual projects. These examined issues such as: “What would a search engine for research data look like? How can identifiers for research data be managed?”, explains the ZBW director.

Large projects EOSC Secretariat and EOSC Future

Then the EOSC went into the next phase with two large projects: EOSC Secretariat and EOSC Future. Running time: 30 months. Budget: 41 million EUR. Both are intended to bring together all previous projects in the direction of EOSC, i.e. to enable convergence and actually draw up a “System EOSC”. All puzzle parts from earlier small projects are now being put together to form a large EOSC blueprint.

Founding of the EOSC Association

The EOSC Association was founded in 2020. It is a formal institution and a foundation under Belgian law. It is headquartered in Brussels and will consolidate all activities. A board of directors has been appointed to coordinate the activities, made up of the president Karl Luyben and a further eight members, including Klaus Tochtermann.

In February 2021, the Strategic Research and Innovation Agenda (SRIA, PDF) laid down what the EOSC Association should achieve over the next few years. From now on, all EOSC projects must be aligned with these SRIA guidelines.

Initial time plan for the European Open Science Cloud

The Strategic Research and Innovation Agenda anticipates various development stages with precisely defined timetables. Basic functionalities are classified as “EOSC Core”, a level that should be implemented by 2023. Here, elements such as search, storage or a log-in function will be realised. This will be followed by the launch of “EOSC Exchange”, which deals with more complex functionalities and services for special data analyses of research datasets.

Collaboration between the EOSC Association and the European Commission

On the question of how the European Open Science Cloud Association and the European Commission cooperate with each other, Tochtermann emphasises the good relationship with the Commission. The so-called partnership model, which is new for everyone and first needs to be experienced, forms the framework for this. However, sometimes the time windows in which the Commission wants reactions from the EOSC Association are very narrow. “I’m glad we have a very strong president of the EOSC Association, who also has the backbone to ensure that we are not always confronted with such short time windows, where reactions are sometimes simply not possible because the subject matter is too complex. But overall it works well”, Tochtermann sums up.

Financing the EOSC Association: 1 billion EUR

For the next ten years, 1 billion EUR is being made available for the development of the EOSC – half from the European Commission and half from the 27 member states of the EU. This was negotiated between the European Commission and the EOSC Association from December 2020 to July 2021 and laid down in an agreement (PDF), the Memorandum of Understanding for the Co-programmed European Partnership on the European Open Science Cloud.

The EOSC Association also raises further funds through membership fees. According to Klaus Tochtermann: “Members are not individuals, but organisations such as the ZBW or the NFDI Association in Germany. (…) Members can choose between full membership, meaning they can take part in all votes and currently pay a contribution of 10,000 EUR per year. Or they can be an observer, where (…) they have a less active role and are not allowed to vote in the annual general meeting. As an observer, you pay 2,000 EUR.” The contributions of the 200 members currently generate a budget of around 1.5 million EUR for the EOSC Association. This is being utilised to build up staff in the office, among other things.

EOSC, NFDI and Gaia-X: a confusing mishmash?

As well as the EOSC, there are further projects in Germany and Europe aimed at implementing large research data infrastructures. The most well-known from a German perspective are the National Research Data Infrastructure (NFDI) and Gaia-X. All three projects – EOSC, NFDI and Gaia-X – are technically linked. They are all technical infrastructures. But how do they differ?

  • National Research Data Infrastructure

    As well as the European EOSC, there is the NFDI (German) in Germany, which was founded by the German Council for Scientific Information Infrastructures (RfII).

    The NFDI – similarly to the EOSC – deals with the technical infrastructure for research data, but is also concerned with networking people, i.e. the scientific community, says Tochtermann. The NFDI thereby focusses on individual disciplines such as economics, social sciences, material sciences or chemistry.

    The NFDI directorate, a central coordinating body, brings the individual NFDI initiatives together, so that they interact. This takes place through working groups and applies above all to cross-discipline or discipline-independent topics. Klaus Tochtermann gives the following examples:

    • digital long-term archiving of research data,
    • allocation of unique identifiers for a data set,
    • single login or single sign-in for the research data infrastructure NFDI,
    • interoperability of systems,
    • uniform metadata standards and
    • uniform protocols.
  • Gaia-X

    On the other hand, there is Gaia-X: “Gaia-X is an initiative which aims to offer companies in Germany and Europe a European infrastructure for the management, i.e. storage of their data, for example, because many of them opt for services from America or China”, explains Tochtermann. As well as in its target group (including industry, companies), Gaia-X also differs from the EOSC and the NFDI in relation to the major role that the topic of data sovereignty plays in the project. Klaus Tochtermann summarises this as follows: “Data sovereignty means that when I generate data, I can follow who is using my data for what purposes at any time. And if I don’t want this, then I can also say, ’I don’t want my data to go there.’”

How can you learn more about the EOSC?

The EOSC Portal is an information platform that gives details about the services that will play a role in the EOSC at a later date. These include services such as European research data repositories. It’s a good place to start if you want to find out more about the EOSC.

Take part in the development of the EOSC

Anyone who wants to get involved in the EOSC can do so in the Advisory Groups. Six of these have been set up initially, to explore topics such as curricula in the field of research data, FAIR data and metadata standards. There was an open call to participate in these groups, for which around 500 applications were received. Most of them came from France (18 percent) and Germany (17 percent), which shows how much the EOSC has already caught on in both countries, says Tochtermann. A selection from these 500 applications will now be used to fill the six groups.

On the website of the EOSC Association, you will also find regular “Calls and Grants”, which people can apply for, as well as job vacancies. For up-to-date information, you can subscribe to the monthly newsletter or follow the EOSC Association on Twitter @eoscassociation.

This blogpost is a translation from German.

Related Links

This might also interest you:

  • Episode 12 of the ZBW podcast “The Future is Open Science” with Prof. Dr Klaus Tochtermann on the European Open Science Cloud (German)
  • The post European Open Science Cloud: small projects, big plans and 1 billion EUR first appeared on ZBW MediaTalk.

    Open Economics Guide: New Open Science Support for Economics Researchers

    by Birgit Fingerle and Guido Scherp

    Open Science represents the best practice for academic work and is a toolkit for “good scientific practice”. In addition to the general benefits of Open Science for the scholarly system and society, Open Science offers many individual benefits for researchers. Among them are a higher visibility of research work and a greater impact in research and society.

    Nevertheless, many researchers in economics and business studies see hurdles and are discouraged from practicing Open Science: A lack of time and of appropriate support are the main reasons for their hesitation. This was revealed by the 2019/2020 study “Die Bedeutung von Open Science in den Wirtschaftswissenschaften – Ergebnisbericht einer Online-Befragung unter Forschenden der Wirtschaftswissenschaften an deutschen Hochschulen 2019” (“The Importance of Open Science in Economics – Result Report of an Online Survey among Researchers in Economics at German Universities 2019”) conducted by the ZBW. See our blog post Open Economics: Study on Open Science Principles and Practice in Economics, which reports the study’s main findings. Furthermore, respondents to the survey on which the study was based expressed a strong desire for support in the form of online materials, especially with regard to Open Science platforms, tools and applications.

    With the new Open Economics Guide (German), the ZBW aims to address these wishes and to support economics and business studies researchers in implementing open practices.

    Support for open science practice

    The Open Economics Guide addresses the challenges and support needs identified in the study. It is based on the perspective and the needs of economics and business studies researchers. It takes into account, for example, that for them lack of time is the top obstacle to Open Science. This is why the texts of the Guide are concise and clear. For the same reason, the Open Economics Guide starts with concrete benefits for researchers, for example by recommending first steps into Open Science that are easy and quick to implement.

    Accordingly, where necessary, the content reflects the specifics of economics and business studies research. The Open Economics Guide is also based on systematically reviewed existing content, which it picks up or refers to and recommends where necessary. Since the range of information, tutorials and tools related to Open Science is constantly growing, the Open Economics Guide offers good orientation for researchers and takes up current developments.

    The ZBW has thus designed the Open Economics Guide as the central entry point specifically for Open Science in economics and business studies, initially for German-speaking countries. In the Open Economics Guide, economists can discover how openness enriches their research and how they can benefit from the advantages of open research.

    Quick start, tool overview and knowledge base

    The Open Economics Guide supports economics and business studies researchers with practical tips, methods and tools to practice Open Science independently and successfully and thus to promote their academic career. To this end, the Guide contains, among other things:

    • easy-to-understand quick-start guides to Open Science topics (currently Open Science, Open Access, Open Data and Open Tools),
    • a comprehensive overview of more than 70 tools (German), subdivided by the phases of the research workflow,
    • a growing knowledge database with currently about 100 entries (German) with extensive background information and practical tips on how to proceed,
    • a clear glossary (German), which answers comprehension questions about the most important terms related to open research at a glance.

    Content under open license and further expansion

    The content of the Open Economics Guide is offered under an open license as far as possible. Thus, it can be reused in other contexts according to the principles of Open Science, for example by other libraries for their researchers.

    The Open Economics Guide will be continuously expanded. For instance, further focal points, such as Open Educational Resources and Open Research Software, will be added. All aspects of Open Science relevant to economics and business studies research will be covered. In doing so, the ZBW will seek close communication and cooperation with researchers in economics and business studies, so that new content can also be developed jointly. In addition, the guide will aim at an international target group in the future.

    Visit the Open Economics Guide now

    Featured Image: Mockup created by freepik

    The post Open Economics Guide: New Open Science Support for Economics Researchers first appeared on ZBW MediaTalk.

    ESCAIDE 2019 – A Smörgåsbord of Infectious Disease Epidemiology

    ESCAIDE 2019 [Credit: ECDC](The title is an homage to the host country of the conference – Sweden – where the smörgåsbord is a buffet-style meal served on a large table. In English, the term has also adopted a

    Open data, [open] access: linking data sharing and article sharing in the Earth Sciences

    “INTRODUCTION The norms of a research community influence practice, and norms of openness and sharing can be shaped to encourage researchers who share in one aspect of their research cycle to share in another. Different sets of mandates have evolved to require that research data be made public, but not necessarily articles resulting from that collected data. In this paper, I ask to what extent publications in the Earth Sciences are more likely to be open access (in all of its definitions) when researchers open their data through the Pangaea repository. METHODS Citations from Pangaea data sets were studied to determine the level of open access for each article. RESULTS This study finds that the proportion of gold open access articles linked to the repository increased 25% from 2010 to 2015 and 75% of articles were available from multiple open sources. DISCUSSION The context for increased preference for gold open access is considered and future work linking researchers’ decisions to open their work to the adoption of open access mandates is proposed.”

    Council of the European Union calls for full open access to scientific research by 2020 – Creative Commons blog – Creative Commons

    Science! by Alexandro Lacadena, CC BY-NC-ND 2.0 A few weeks ago we wrote about how the European Union is pushing ahead its support for open access to EU-funded scientific research and data. Today at the meeting of the Council of the European Union, the Council reinforced the commitment to making all scientific articles and data […]

    The post Council of the European Union calls for full open access to scientific research by 2020 appeared first on Creative Commons blog.

    Open Research Glossary

    This glossary is designed to be a resource to inform people about the culture of ‘open scholarship’.

    This resource was written by the community, and depends on the community to stay current. To update this resource please make changes here, and periodically this resource and associated PDF/XML will be updated.

    • Get this as a PDF

    • Get this in XML 

    Core Definitions

    Types of Open Access

    Deprecated terms

    Declarations And Principles


    Journal Types

    Peer Review

    Assessment And Metrics

    Tools And Technology

    Data Repositories

    Funders And Policy-Related

    Open Research Infrastructure


    About this resource

    Core Definitions

    • Open Access (OA) – making peer reviewed scholarly manuscripts freely available via the Internet, permitting any user to read, download, copy, distribute, print, search, or link to the full text of these articles, crawl them for indexing, pass them as data to software, or use them for any lawful purpose, without financial, legal or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. May also refer to theses, books, book chapters, monographs and other content. (BOAI)

    • Open Data – making data freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. (Panton Principles)

    • Open Educational Resources (OER) – high quality, openly licensed, online educational materials for sharing, use, and reuse. They act as a mechanism for instructional innovation as networks of teachers and learners share best practices. (Source)

    • Open Source Software (OSS) – availability of source code for a piece of software, along with an open source license permitting reuse, adaptation, and further distribution. (Wikipedia)

    • Article Processing Charge (APC) – a fee charged to the author, creator, or institution to cover the cost of an article, rather than charging the potential reader of the article. APCs may apply to both commercial and Open Access publications. APCs are sometimes charged to authors in order to cover the cost of publishing and disseminating an article in an Open Access scholarly journal. (Source)

    • Repository (article) – an archive to deposit manuscripts. These can be personal, institutional, on websites such as ResearchGate, or subject-based such as arXiv.

    • Repository (software) – a collection of files managed with version control software (e.g., bzr, hg, git, cvs, svn). Can be hosted by a third party (e.g., GitHub, Bitbucket, SourceForge), by an institution, or self-hosted locally.

    • Institutional Repository – An online database designed to collect the intellectual output of a particular institution or university, including digital collections such as electronic theses and dissertations (ETDs), pre-prints, or faculty scholarship, and to present associated metadata regarding these items. (Source)

    • Embargo period – a length of time imposed on a research output for users who have not paid for access, or do not have institutional access, before it is made freely available.

    • Reproducibility – the similarity between results of a study or experiment and independent results obtained with the same methods but under different conditions (i.e., pertains to results).

    • Repeatability – the similarity between results of a study or experiment and independent results obtained with the same methods and under identical conditions (i.e., pertains to methods and analysis).

    • Publishing – to make a research output available to the public. Commonly refers to the release of works by publishers, irrespective of whether public access is granted or not.

    • Sharing – the joint use of a resource or space. A fundamental aspect of collaborative research. As most research is digitally-authored & digitally-published, the resulting digital content is non-rivalrous and can be shared without any loss to the original creator.

    • Paywall – restriction via a financial barrier to research, often implemented by legacy publishers. Can be removed by personal or institutional subscription. See Loginwall for a barrier that prevents access, without asking for money to unlock access.

    • Funder – an institute, corporation or government body that provides financial assistance for research.

    • Publisher – a company whose purpose is to make the outputs of research publicly available.

    • Creative work – An original, identifiable piece of content, such as an academic paper, a diagram, a photograph, or a video clip. Owners of creative works have rights, such as copyright, that they might reserve to keep control of the content, or relinquish to allow others to share and reuse that content.

    • Intellectual property (IP) – a legal term that refers to creations of the mind. Examples of intellectual property include music, literature, and other artistic works; discoveries and inventions; and phrases, symbols, and designs.

    • Intellectual Property Rights (IPR) – the rights given to the owners of intellectual property. IPR is protected either automatically (e.g. copyright, design rights) or by registering or applying for it (e.g. trademarks, patents). Protecting your intellectual property makes it easier to take legal action against anyone who steals or copies it. IPR can be legally sold, assigned or licensed by the creator to other parties, or jointly owned.

    • Copyright – The aspect of Intellectual property that gives creators the right to permit (or not permit) what happens to their creations, as opposed to trademark rights or moral rights.

    • Copyleft – a form of licensing that makes a creative work freely available to be modified, and requiring all modified and extended versions of the creative work to be free as well. Open Access does not require works to be copyleft, nor does it necessarily exclude copyleft works from being open access. The recommended licence (CC-BY) for academic publishing is not copyleft.

    • Subscription – a form of business model whereby a fee is paid in order to gain access to a product or service – in this case, the outputs of scholarly research.

    • Toll access – whereby a fee is required to pass a paywall to access research.

    • Legacy publisher – a publisher that historically has operated on a paywall-based business model.

    • Open access publisher – a publisher that publishes all research articles as open access articles. Most legacy publishers have options to make journals at least partially open access.

    • Open access journal – a journal that exclusively comprises open access articles.

    • Impact – the scale of use of research outputs both inside and outside of academia.

    • Self-archiving – making a copy of a manuscript available through a personal website, institutional repository, or other repository.

    • Scholarly Communication – The creation, transformation, dissemination, and preservation of knowledge related to teaching, research, and scholarly endeavors; the process of academics, scholars and researchers sharing and publishing their research findings so that they are available to the wider academic community. (Sources: Wikipedia, University of Pittsburgh)


    Types of Open Access

    • Pre-print – a manuscript draft that has not yet been subject to formal peer review, distributed to receive early feedback on research from peers.

    • Post-print – a manuscript draft after it has been peer reviewed.

    • Version of Record (VOR) – the final version of a manuscript, after peer review and processing by a publisher.

    • Hybrid – a type of journal in which certain articles are made open access for typically a significantly higher price (relative to full OA journals), while others remain toll access.

    • Accepted author manuscript – the version of a manuscript that has been accepted by a publisher for publication.

    • Eprint – a digital version of a research document available online from a repository.


    Deprecated terms

    Terms whose use is not encouraged, as they are typically poorly understood:

    • Green OA – making a version of the manuscript freely available in a repository.

    • Gold OA – making the final version of manuscript freely available immediately upon publication by the publisher.

    • Gratis OA – the paper is available to read free-of-charge, though its reuse is still restricted, for example by ‘All Rights Reserved’ copyright. (source)

    • Libre OA – the paper is made available under an open licence, allowing it to be shared and reused, depending on which licence is used. (source) (Libre and Gratis refer to copyright and licensing restrictions)

    • Diamond OA – a form of gold open access in which there is no author fee (APC).


    Declarations And Principles

    Taken from:(source)



    • Creative Commons – A suite of licences that set out the rights of authors and users, providing alternatives to the standard copyright. CC licences are widely used, simple to state, machine readable and have been created by legal experts. There are a variety of CC licences, each of which uses one or more clauses, examples of which are given below. Some licences are compatible with Open Access in the Budapest sense, and some are not. (Source) (Choosing a license)

    • CC Attribution (BY) – a licence clause that allows the reuse, sharing, and remixing of materials providing the original author is appropriately attributed. Aside from attribution the CC-BY licence has no other restrictions on copying. Compatible with free cultural works.

    • CC NonCommercial (NC) – a licence clause allowing the reuse, sharing, and remixing of materials providing that it is for non-commercial purposes. Not compatible with free cultural works.

    • CC NoDerivatives (ND) – a licence clause requiring that derivatives are not made of the original works. Not compatible with free cultural works.

    • CC ShareAlike (SA) – a licence clause requiring that derivative works have the same licence as the original. Compatible with free cultural works.

    • CC 0 – waiver of copyright; no rights reserved. Places content as openly as possible in the public domain. (Source)

    • BSD (Berkeley Software Distribution) – A family of UNIX-like operating systems. (Wikipedia)

    • GNU GPL (General Public License) – A free copyleft license for software and other kinds of works (Source)

    • Apache License – A free software license by the Apache Software Foundation. (Source)

    • MIT License – An open and permissive software license. (Source)

    • Author Addendum – An author addendum is a supplemental or added agreement to a publishing contract that defines or changes the terms of the contract, often focusing on the transfer of copyright ownership. For authors of scholarly works, an author addendum to a publisher’s standard publication contract may be necessary to help ensure that authors protect important rights, such as the right to post their articles online to a personal website or in a digital repository; the right to use their works within a classroom setting; or the right to use their works as the foundation for future research. (Source)
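The clause entries above each state whether a clause is compatible with free cultural works. As a minimal sketch, assuming that a licence built from several clauses is compatible only if every one of its clauses is, the check could look like this. The function names and the parsing of licence names are hypothetical and for illustration only; CC 0 is a waiver rather than a clause combination and is not handled here.

```python
# Compatibility of each clause with free cultural works,
# copied from the glossary entries above.
FREE_CULTURE_COMPATIBLE = {"BY": True, "NC": False, "ND": False, "SA": True}


def clauses(licence):
    """Split a name such as 'CC BY-NC-ND 2.0' into clause codes (hypothetical helper)."""
    name = licence.upper().replace("CC", "", 1).strip(" -")
    tokens = name.split()
    if not tokens:
        raise ValueError("no clauses found in licence name")
    # The first token holds the clauses; a trailing version number is ignored.
    return [part for part in tokens[0].split("-") if part]


def is_free_culture_compatible(licence):
    """Assumption: a licence is compatible only if all of its clauses are."""
    parts = clauses(licence)
    unknown = [p for p in parts if p not in FREE_CULTURE_COMPATIBLE]
    if unknown:
        raise ValueError(f"unknown clause(s): {unknown}")
    return all(FREE_CULTURE_COMPATIBLE[p] for p in parts)


for name in ("CC BY", "CC BY-SA 4.0", "CC BY-NC-ND 2.0"):
    print(name, is_free_culture_compatible(name))
```

Under this assumption, a single NC or ND clause is enough to make the whole licence incompatible, which is why CC BY-NC-ND fails the check while CC BY-SA passes.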


    Journal Types

    • Megajournal – a journal with editorial criteria based on scientific soundness instead of a priori estimated newsworthiness or ‘impact’.

    • Journal – an aggregation of published research articles. Historically divided into volumes and issues.

    • Overlay journals – An open access, electronic journal that does not produce its own content, but selects and curates groups of articles that are already freely available online. An example of this is an ‘Epijournal’. (Wikipedia)

    • Epub – A free and open e-book standard by the International Digital Publishing Forum.

    • Hybrid journal – Some traditional journals offer an option for authors to make their individual articles freely accessible to anyone worldwide, for an additional fee. Other articles in the journal remain accessible only through subscription. Such journals are known as “hybrid journals.” (Source: MIT)

    • Library-based publishing – Many academic libraries are now beginning to act as publishers for scholarly works produced in their institutions and elsewhere. In some cases, the library works with the university scholarly press to publish works. In other cases, the library publishes works independently or separately from the academic press. Library-based publishers are often strongly in favor of Open Access. (Library Publishing Coalition)


    Peer Review

    • Peer review – a process by which a research article is vetted by experts in the community before publication. (Sense About Science)

    • Post publication peer review – standard peer review, but after a research article has been formally published.

    • Transferable peer review – reviews that travel with a paper if it is rejected from a journal. (Wiley pilot)

    • Open review – when reviews are made openly available, typically alongside the article.

    • Signed peer review – when the individual reviews are publicly signed by those who conducted them.

    • Portable peer review – independent peer review that travels with a manuscript that is submitted to subsequent different journals, designed to combat redundancy in the peer review process. (Rubriq)

    • Double blind peer review – when the reviewers don’t know who the authors are, and vice versa.

    • Registered Reports – A type of publication in which peer review of the suggested method is completed prior to data collection and analysis. Accepted papers are then guaranteed publication in the journal if the authors follow through with the registered methodology. (Source)


    Assessment And Metrics

    • Altmetrics – Altmetrics are alternative ways of recording and measuring the use and impact of scholarship. Rather than solely counting the number of times a work is cited in scholarly literature, alternative metrics also measure and analyze social media (e.g., Facebook, Twitter, blogs, wikis, etc.), document downloads, links to published and unpublished research, and other uses of research literature, in order to provide a more comprehensive measurement of scholarship’s reach and impact. (Source)

    • Article-level metrics – all types of article-level metrics including download and usage statistics, citations, and article-level altmetrics (Source).

    • Bibliometrics – Bibliometrics is the branch of library and information science concerned with the application of mathematical and statistical analysis to bibliography. Bibliometrics involves the statistical analysis of books, articles, or other publications.

    • Impact factor – a numerical measure indicating the average number of citations received in a given year by the articles a journal published over the previous two years; frequently used as a proxy for a journal’s relative importance.

    • H-index – a personal metric that relates the number of citations to the number of published papers for an academic: the largest number h such that the academic has h papers, each cited at least h times. (Wikipedia)

    • Journal level metrics – metrics that apply to all papers published within a journal. A common example is Thomson Reuters’ journal impact factor.
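The two citation-based metrics above reduce to simple arithmetic. The following is a minimal sketch, not an official implementation of either metric: it assumes the usual definition of the h-index (the largest h such that h papers each have at least h citations) and the two-year impact factor (citations in a given year divided by the number of citable items from the previous two years). The function names and the sample citation counts are made up for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def impact_factor(citations_this_year, citable_items_prev_two_years):
    """Two-year impact factor: citations received this year to items from the
    previous two years, divided by the number of those citable items."""
    if citable_items_prev_two_years <= 0:
        raise ValueError("the journal needs at least one citable item")
    return citations_this_year / citable_items_prev_two_years


# Illustrative numbers only, not data from any real author or journal.
example_h = h_index([10, 8, 5, 4, 3])
example_if = impact_factor(200, 100)
```

Because the h-index only counts papers above a threshold, one very highly cited paper moves it no more than a moderately cited one, which is part of why it is usually read alongside other metrics.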


    Tools And Technology

    • Extensible Markup Language (XML) – A language that defines a set of rules for encoding documents in a format that is readable by both machines and humans. (Wikipedia)

    • Machine readable – data or metadata in a format that can be understood by a computer.

    • Machine Readable Cataloguing (MARC) – a set of digital formats for the description of items catalogued by libraries. (Wikipedia)

    • Data mining – an analytic process designed to explore data in search of consistent patterns and/or systematic relationships between variables, and then to transform this information into content for future use. (Wikipedia)

    • Content mining – large-scale extraction of information from content (e.g., photographs, videos, audio, metadata), usually involving thousands of items. (The ContentMine)

    • Comma-Separated Values, or Character-Separated Values (CSV) – a plain-text (non-binary) format for tabular data.

    • Hypertext Markup Language (HTML) – the set of markup symbols or codes inserted in a file intended for display on a browser page. (Wikipedia)

    • LaTeX – a markup language for typesetting documents, particularly common in mathematics and the sciences. Many academic journals accept submissions in LaTeX. (Source)

    • Digital Object Identifier (DOI) – a unique text string that is used to identify digital objects such as journal articles or open source software releases. (Source)

    • Journal Article Tag Suite (JATS) – a common XML format in which publishers and archives can exchange journal content. (Source)

    • Uniform Resource Identifier (URI) – a string of characters used to identify a name of a resource to enable its digital and networked representation and interaction. (Wikipedia)

    • GitHub – a web-based hosting service for source code repositories managed with the Git version control system. (Source)

    • Git – an open-source, distributed revision control system. (Source)

    • Bitbucket – Free source code hosting site. (Source)

    • IPython notebook – a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document. (Source)

    • AnnotatorJS / – A framework and application for annotating resources online according to an emerging W3C standard for web annotations. Focus is on scholarly applications. (Source Annotator / Source)

    • DSpace – software for open digital repositories, launched by the Massachusetts Institute of Technology (MIT) in 2002. (Source)

    • Flexible Extensible Digital Object and Repository Architecture (FEDORA) – software for digital repositories, launched by Cornell University and the University of Virginia in 2003. (Source)

    • Eprints – software for open digital repositories and self-archiving, launched by the University of Southampton in 2000. (Source)

    • OAI Media Importer Bot – A computer program, run by Daniel Mietchen, that takes figures and video clips from Open Access articles in PubMed, and copies them to Wikimedia Commons with full attribution of the original paper. This facilitates the reuse of those files in educational materials or Wikipedia articles.

    • Scraping – a computing technique to extract information from websites. (Wikipedia)

    • Scalable Vector Graphics (SVG) – a format for images that is open rather than tied to particular software, resolution-independent (unlike GIF, PNG and JPG), and structured so that with appropriate software it is relatively easy, for example, to translate labels into different languages.

    • Open Journal Systems (OJS) – a journal management and publishing system. (Source)

    • Open Monograph Press – an open source software platform for managing the editorial workflow required to see monographs, edited volumes, and scholarly editions through internal and external review, editing, cataloguing, production and publication. (Source)

    • Open Conference Systems (OCS) – a free Web publishing tool that will create a complete Web presence for scholarly conferences. (Source)

    • Open Harvester Systems – a free metadata indexing system. (Source)

    • ResearcherID – assigns a unique identifier to researchers to manage publication lists, track citations, and avoid author mis-identification. (Source)

    • ORCID – a persistent digital identifier that distinguishes individual researchers. Also supports integration in research workflows. (Source)

    • ProtocolsIO – Up-to-date crowdsourced protocol repository (Source)

    • Publish or Perish – software for retrieving and analysing academic citations. (Source)

    • Open lab notebooks – a concept of blogging about research on a regular basis, such that research notes and data are accumulated and published online as soon as they are obtained. (Wikipedia)

    • Stack Overflow – A Question and Answer site for programming issues. (Source)

    • Markdown – a syntax for adding formatting to documents allowing correctly formatted articles to be written in plain text. (Wikipedia)

    • Etherpad – An online, open source collaborative writing/editing tool operating in real time. (Source)

    • The Open Access Button – Tracks global encounters with paywalls, and helps provide access to papers through a ‘wishlist’. (Source)

    • Open Archives Initiative – Supplies a common framework to web communities that allows them to gain access to content in a standard manner by means of metadata harvesting. (Source)


    Data Repositories

    • Dryad – a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. (Source)

    • Global Biodiversity Information Facility (GBIF) – contains data about all types of life on Earth, published according to common data standards. (Source)

    • Knowledge Network for Biocomplexity (KNB) – a network for the discoverability, access, and interpretation of complex ecological data. (Source)

    • DataONE – a framework and infrastructure for Earth observational data. (Source)

    • figshare – a repository where users can make all of their research outputs available in a citable, shareable, and discoverable manner. (Source)

    • Morphbank – an image database documenting a range of specimen-based research, including comparative anatomy and taxonomy. Funded by the National Science Foundation. (Source)

    • Morphobank – a web application for collaborative evolutionary research, specifically phylogenetic systematics or cladistics, involving morphology. (Source)

    • GenBank – the NIH sequence database comprising an annotated collection of all publicly available DNA sequences. Part of the International Nucleotide Sequence Database Collaboration. (Source)

    • UniProt – Central repository of protein sequence and annotation data. (Source)

    • Worldwide Protein Data Bank (wwPDB) – Publicly available repository of macromolecular structural data. (Source)

    • A list of data repositories approved by F1000Research.

    • Map of Open Educational Resource Repositories – Map containing locations for items in the directory of Open Educational Resource Repositories. (Source)

    • Zenodo – An all-purpose, free-to-use repository for all research outputs, offering DOIs and flexible licensing. (Source)

    • Open Science Framework – A tool created by the Center for Open Science for scientists. It is both a research and workflow management tool and open repository. Their goal is to link up the entire research ecosystem, from conception through publication. They give the user full control over the openness of their work and allow for the creation of registrations, which can be used when submitting registered reports. (Source)

    • – a global registry of research data repositories from different academic disciplines. (Source)

    • Databib – a searchable registry of research data repositories (Source) [Note that the Databib and registries will merge by the end of 2015]


    Funders And Policy-Related

    • Publicly funded research – refers to research which is, at least in part, funded by Governments, often through Research Councils.

    • Research Councils UK (RCUK) – The primary government research funding body in the UK. (Open Access policy)

    • Joint Information Systems Committee (JISC) – A UK educational charity, formerly the of HEFCE but now independent. Provides expertise to universities, colleges and cultural institutions on the use of technology to support research, including publication models, repositories, licensing, and infrastructure. (Source)

    • National Institutes of Health (NIH) – The national medical research agency in the USA. (Public Access policy)

    • National Science Foundation (NSF) – an independent federal agency in the USA for the funding of research. (Public Access policy)

    • Higher Education Funding Council for England (HEFCE) – a funding body for higher education, universities and colleges in England. (Open Access policy)

    • Wellcome Trust – A life sciences funding body in the UK (Open Access policy)

    • The Research Excellence Framework (REF) – An initiative to assess researchers in the UK. Coordinated by HEFCE.

    • Gates Foundation – A funding body co-ordinated by Melinda and Bill Gates. (Open Access policy)

    • Max Planck Society – a German research organisation with 82 Institutes worldwide. (Open Access policy)

    • CrossRef – an association of scholarly publishers that develops shared infrastructure to support more effective scholarly communication. (Source)

    • Public Knowledge Project (PKP) – a multi-university initiative developing free, open source software and conducting research to improve the quality and reach of scholarly publishing. (Source)

    • Scholarly Publishing and Academic Resources Coalition (SPARC) – an international alliance of academic and research libraries working to create a more open system of scholarly communication. (Source)

    • Open Access Scholarly Publishers Association (OASPA) – represents the interests of open access journal and book publishers in all scientific, technical, and scholarly disciplines. (Source)

    • Mandate – an authority to carry out a policy. In this context, largely to conform to open access policies.

    • OpenAIRE – a pan-European infrastructure that supports the EC’s Open Access Mandate in Horizon2020. All publications funded by the EC should be made available in Open Access, and OpenAIRE harvests from a range of data sources, namely repositories and OA publishers. (Source)

    • Department of Energy (DOE) – A federal agency addressing US energy, environment, and nuclear challenges. (Public Access Policy)


    Open Research Infrastructure

    • Google Scholar – a freely accessible search engine for indexing the scholarly literature across an array of publishing formats and disciplines. (Source)

    • Directory of Open Access Repositories (OpenDOAR) – a directory of academic open access repositories. Also has a search function for repositories and repository contents. (Source)

    • Directory of Open Access Journals (DOAJ) – a directory indexing open access peer-reviewed journals (Source)

    • Registry of Open Access Repositories (ROAR) – a registry for open access repositories, hosted by the University of Southampton, UK. (Source)

    • Registry of Research Data Repositories – An open science tool that serves as a global registry of research data repositories. (Source)

    • PubMed – a repository comprising more than 24 million citations for the biomedical literature. (Source)

    • PubMed Central (PMC) – a free full-text archive of biomedical and life sciences journal literature at the US National Institutes of Health’s Library of Medicine. (Source)

    • Europe PubMed Central (EuroPMC) – Based on PubMed Central, and part of a network of repositories supported by funders of life sciences and biomedical research. (Source)

    • Repository 66 – a mashup of data from ROAR and OpenDOAR overlaid onto Google Maps. (Source)

    • Scientific Electronic Library Online (SciELO) – a programme started in Brazil in 1998 which has now expanded to 15 other countries, developed by FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) and BIREME (Centro Latino-americano e do Caribe em Informação em Ciências da Saúde). The objectives are to develop a common methodology for the preparation, storage and dissemination of scientific literature, including standardised evaluation and quality control processes. This comprises a model for cooperative electronic publication of scientific periodicals on the internet using organised bibliographic databases with full text access. (Source (Portuguese Language) (English))

    • Securing a Hybrid Environment for Research Preservation and Access – Rights of MEtadata for Open archiving (SHERPA-RoMEO) – a tool to check what the self-archiving policies for individual journals are. (Source)

    • Connecting Repositories (CORE) – a collection of open access repositories. (Source)

    • Paperity – a multidisciplinary aggregator of open access journals and papers, Gold and Hybrid. Aims to include ultimately 100% of open access literature. (Source)


    • Open Access Movement (OAM) – a global movement started in the late 1990s and early 2000s fuelled by the widespread public access to the World Wide Web. Its prime objective is the free and unrestricted access and reuse of the world’s knowledge.

    • Open Archives Initiative (OAI) – develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. (Source)

    • Openwashing – having an appearance of open-source and open-licensing for marketing purposes, while continuing proprietary practices. Coined by Audrey Watters.

    • Curation – the selection, preservation, maintenance, collection and archiving of [digital] assets. Curation establishes, maintains, and adds value to repositories of digital data for present and future use. (Wikipedia)

    • Typesetting – the composition of text by arranging physical pieces of type or by using software to prepare a version of the text suitable for printing. Stored letters and other symbols are retrieved and ordered according to a language’s orthography (conventional spelling system of a language) for visual display.

    • Copy editing – a type of editing designed to improve the formatting, style, and accuracy of text. It usually does not involve changing the content of the original text.

    • Annotation – a comment with specific location and context, either inline or in the margin of a text document, or within a region of an image or video, or located within a specific row or cell of data in a data set.  

    • Citation – a reference to a published or unpublished source embedded in content, for the purposes of acknowledging the work and relevance of others to the topic of discussion where the citation appears.

    • References – defines a relationship between one object, a designator, and a second object, a source. Usually takes the form of a bibliography of academic papers at the end of a research manuscript.

    • Submission fee – a fee levied by some publishers for submitting a manuscript to their journals.

    • Accessibility – refers to the degree of access. Defined on an end-user basis, depending on their ability to understand or reuse content.

    • Mixed citation – a textual, bibliographic description of a work that is cited within text.

    • Data archiving – the process of moving data to a storage device for long-term preservation. (Wikipedia)

    • Computational reproducibility – when publishing computational findings, include details and access to the underlying code, data, and implementation.

    • Empirical reproducibility – the reproduction of results to obtain ‘verifiable facts’, through improving existing communication standards and reporting.

    • Statistical reproducibility – validating the statistical results, errors, and confidence measures in research. Also the statistical assessment of repeated results for validation purposes. (post from Victoria Stodden)

    • Loginwall – the requirement to log in to a system in order to access content.

    • Shibboleth – a single sign-in system for computer networks and services on the open Internet. (Wikipedia)

    • Athens – A sign-in system that provides access to library resources. (Source)

    • Symplectic – A products and services company specialising in research information management. Their flagship system, Elements, is used by a number of the world’s research institutions. (Source)

    • Journal to Wiki publication (J2W) – Copying text from a published paper to a wiki (such as Wikipedia or Wikibooks), with attribution: legally possible if the licence of the paper is less restrictive than the licence of the wiki.

    • Wiki to Journal publication (W2J) – Creating a paper on a wiki, using its features for collaboration and informal review, for submission to a journal for formal peer review. Might involve a public wiki such as Wikipedia or Wikiversity, or a specially-created wiki.

    • Fee waiver – If an institution, research funder or author cannot pay an Article Processing Charge, many publishers or journals will offer a partial or total waiver of fees.

    • Derivative work – A work based upon one or more pre-existing works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. (Source)

    • Double-dipping – In the context of Open Access, double-dipping occurs when a journal has an article processing charge (APC) for publishing an author’s work, as well as requiring payment (usually through a subscription fee) by the potential user of the work. This model makes the institution or author pay twice to access the work. (Source)

    About this resource

    This resource is lightly edited from an original created by Jon Tennant and Ross Mounce. This material is licensed under a CC-0 license. We strongly encourage the distribution and re-use of this material.

    Version 2.0: Released [12 July 2015]

    Additional contributors:

    Richard Iannone, Chealsye Bowley, Martin Poulter, Matt Hall, Priscilla Ulguim, Lou Woodley, Sibele Fausto, Nazeefa Fatima, Karen Cranston, Lauren B. Collister, Alasdair Taylor, Matt Menzenski, Patricia Herterich.

    Please note: The Right to Research Coalition does not endorse the accuracy or completeness of this material.

    To the extent possible under law, Jon Tennant has waived all copyright and related or neighboring rights to Open Research Glossary. This work is published from: United Kingdom.












    OpenCon 2015 Applications are Open!

    This was originally posted at:

    Applications to attend OpenCon 2015 on November 14-16 in Brussels, Belgium are now open! The application is available on the OpenCon website and includes the opportunity to apply for a travel scholarship to cover the cost of travel and accommodations. Applications will close on June 22nd at 11:59pm PDT.

    OpenCon seeks to bring together the most capable, motivated students and early career academic professionals from around the world to advance Open Access, Open Education, and Open Data—regardless of their ability to cover travel costs.  In 2014, more than 80% of attendees received support.  Due to this, attendance at OpenCon is by application only.

    Students and early career academic professionals of all experience levels are encouraged to apply.  We want to support those who have ideas for new projects and initiatives in addition to those who are already leading them.  The most important thing is an interest in advancing Open Access, Open Education, and Open Data and a commitment to taking action. We also hope to use applications to connect applicants with opportunities for collaboration, local events in your area, and scholarship opportunities to attend other relevant conferences.

    OpenCon is equal parts conference and community.  The meeting in Brussels serves as the centerpiece of a much larger network to foster initiatives and collaboration among the next generation across OpenCon’s issue areas.  Become an active part of the community by joining our discussion list, tuning in for our monthly community calls and webcasts, or hosting an OpenCon satellite event in your community.

    Apply now, and join the OpenCon community today!

    About OpenCon:

    Hosted by the Right to Research Coalition and SPARC, OpenCon 2015 will bring together students and early career academic professionals from across the world to learn about the issues, develop critical skills, and return home ready to catalyze action toward a more open system for sharing the world’s information — from scholarly and scientific research, to educational materials, to digital data.  OpenCon 2015 will be held on November 14-16 in Brussels, Belgium.

    OpenCon 2015’s three day program will begin with two days of conference-style keynotes, panels, and interactive workshops, drawing both on the expertise of leaders in the Open Access, Open Education and Open Data movements and the experience of participants who have already led successful projects.

    The third day will take advantage of the location in Brussels by providing a half-day of advocacy training followed by the opportunity for in-person meetings with relevant policy makers, ranging from the European Parliament and European Commission to embassies and key NGOs. Participants will leave with a deeper understanding of the conference’s three issue areas, stronger skills in organizing local and national projects, and connections with policymakers and prominent leaders across the three issue areas.

    OpenCon 2015 builds on the success of the first-ever OpenCon meeting last year, which convened 115 students and early career academic professionals from 39 countries in Washington, DC.

    Speakers at OpenCon 2014 included the Deputy Assistant to the President of the United States for Legislative Affairs, the Chief Commons Officer of Sage Bionetworks, the Associate Director for Data Science for the U.S. National Institutes of Health, and more than 15 students and early career academic professionals leading successful initiatives. OpenCon 2015 will again feature leading experts, and the program will be announced in the coming months.






    New PLOS Open data policy

    PLOS has announced some changes to their publishing policies, and these changes are great news.  The new PLOS policies will go a significant way towards encouraging open data and open source.  Although the announcement itself is somewhat vague on the subject of source code, the actual PLOS One Sharing Policy is excellent:

    …if new software or a new algorithm is central to a PLOS paper, the authors must confirm that the software conforms to the Open Source Definition, have deposited the following three items in an open software archive, and included in the submission as Supporting Information:

    • The associated source code of the software described by the paper. This should, as far as possible, follow accepted community standards and be licensed under a suitable license such as BSD, LGPL, or MIT (see for a full list). Dependency on commercial software such as Mathematica and MATLAB does not preclude a paper from consideration, although complete open source solutions are preferred.
    • Documentation for running and installing the software. For end-user applications, instructions for installing and using the software are prerequisite; for software libraries, instructions for using the application program interface are prerequisite.
    • A test dataset with associated control parameter settings. Where feasible, results from standard test sets should be included. Where possible, test data should not have any dependencies — for example, a database dump.

    However, the one loophole is that they allow for code that runs on closed source platforms in “common use by the readership”  (e.g. MATLAB), although it must run without dependencies on proprietary or otherwise unobtainable ancillary software.  That “common use” loophole could potentially be a mile wide in some fields.  Is Gaussian a common use platform in computational chemistry and therefore exempt from this new policy?   If so, the policy is a bit toothless.  I’d like to see the limits and bounds of the “common use” loophole more clearly stated.

    The announcement makes PLOS ONE a much more attractive place to send our next paper.

    OpenScience comes of age

    In 1998, Open Science seemed like a pretty obvious projection of basic scientific principles into the digital age.  I didn’t think the ideas would meet much, if any, resistance from the scientific community.   And in October 1999, Brookhaven National Lab sponsored a meeting called Open Source / Open Science that, in retrospect, was a pretty utopian gathering.  There were a lot of the current OpenScience community members present at the meeting (notably Brian Glanz and Greg Wilson).   It felt like everyone would be convinced to do Open Source & Open Data science in short order.

    The past 14 years have been instructive in just how long it can take to make cultural changes in the scientific community.

    So, it was an amazing experience to be present when the Office of Science and Technology Policy (OSTP) announced the Champions of Change for Open Science.  These are 13 incredible individuals and organizations with great stories about sharing their science.  It feels like we’ve made significant motion on implementing policies that are friendly to Open Science.   I should note that we’re particularly happy to see OSTP use the phrase Open Science, and not the more narrow terms: Open Data or Open Access.  I’m hopeful that Open Source will also be part of science policy going forward.

    There was a second group who got the opportunity to present at this event at a poster session later that day.  I haven’t seen the list publicized elsewhere, but these are some sharp folks who deserve recognition for their work.  I’m going to highlight some of these in the coming week.  Here’s the list of posters:

    1. Richard Judson & Ann Richard from the National Center for Computational Toxicology presented on “ACToR & DSSTox: EPA Open Information Tools for Chemicals in the Environment”
    2. Tom Bleier, Clark Dunson & Michael Lencioni from the QuakeFinder project presented on “Electromagnetic Earthquake Forecasting Research”
    3. David C. Van Essen from WUSTL presented on the “Human Connectome Project”
    4. Heather Piwowar & Jason Priem presented a poster on “ImpactStory: Open Carrots for Open Science”
    5. Jean-Claude Bradley (Drexel) and Andrew Lang (Oral Roberts University) presented a poster on “Open Notebook Science”.
    6. Dan Gezelter (that’s me) presented on “The OpenScience Project”.
    7. John Wilbanks from Sage Bionetworks presented on “Portable Legal Consent – Let Patients Donate Data to Science”
    8. Matt Martin from the National Center for Computational Toxicology presented on “ToxRefDB & ToxCastDB: High-Throughput Toxicology Resources”
    9. Brian Athey and Christoph Brockel presented on “The tranSMART Platform: Accelerating Open Science, Data Analytics and Data Sharing”
    10. Alexander Wait Zaranek, Ward Vandewege & Jonathan Sheffi from Clinical Future, Inc. presented on “Transparent Informatics: A Foundation for Precision Medicine”

    It was an intense day, and I’m delighted that Open Science has finally come of age.

    OpenAPIs for scientific instrumentation?

    An interesting question from Dale Smith:  Are there OpenAPIs for remote sensing and monitoring of scientific instruments?  Dale pointed us at this very cool RSOE EDIS alert map as an example of what could be possible with distributed consumer-grade sensors that had OpenAPIs.   I can imagine a number of very cool things that could be done with distributed weather or earth motion sensors.  Are there software tools out there that make querying these sensors easy?

    (One suggestion,  however, would be for the RSOE EDIS to look for a slightly less ominous-sounding motto).

    OpenScience poster


    I’m giving a poster in a few days, and it has been a very long time since I’ve had to make a poster.  This one turned out quite text-heavy, but I wanted to make a few arguments that seemed difficult or impossible to translate into graphics.   A PDF (9.3 MB) of the draft is available by clicking the image on the right…

    Comments and suggestions, as always, are quite welcome.