NOAA Open Data Dissemination: Petabyte-scale Earth system data in the cloud | Science Advances

Abstract:  NOAA Open Data Dissemination (NODD) makes NOAA environmental data publicly and freely available on Amazon Web Services (AWS), Microsoft Azure (Azure), and Google Cloud Platform (GCP). These data can be accessed by anyone with an internet connection and span key datasets across the Earth system including satellite imagery, radar, weather models and observations, ocean databases, and climate data records. Since its inception, NODD has grown to provide public access to more than 24 PB of NOAA data and can support billions of requests and petabytes of access daily. Stakeholders routinely access more than 5 PB of NODD data every month. NODD continues to grow to support open petabyte-scale Earth system data science in the cloud by onboarding additional NOAA data and exploring performant data formats. Here, we document how this program works with a focus on provenance, key datasets, and use. We also highlight how to access these data with the goal of accelerating use of NOAA resources in the cloud.

Open Source Infrastructure Engineer: Site Reliability Engineering and Cloud Infrastructure | 2i2c

“2i2c manages, supports, and builds community-centric infrastructure for interactive computing in the cloud with partner communities in research and education.

We’re looking for an Open Source Infrastructure Engineer that will join our Site Reliability Engineering team and make our cloud infrastructure more reliable, scalable, and efficient. It will help build a future of data-intensive scientific research and democratize the design and access to cloud-based resources for research and education purposes….”

New project: Open science cloud infrastructure and training for communities in Latin America and Africa

“We are excited to share that the grant proposal that the IOI team contributed to, titled “A Collaborative Interactive Computing Service Model for Global Communities”, has been awarded funding by the Chan Zuckerberg Initiative….

The goal of this proposal is to create a collaborative cloud infrastructure service that enables community-based cloud-native workflows in the biosciences. Together with our collaborators, we will promote values of open and inclusive community practices, infrastructure that enables these practices, and a “train-the-trainers” approach that empowers community leaders to share expertise in cloud infrastructure with others in their communities. Our focus will be on communities in Latin America and Africa, and we hope to learn how this model could be extended to other global communities that are historically marginalized from large-scale scientific infrastructure projects….”


Abstract: Open Science is the trend that aims to make scientific research and its dissemination available to all levels of the society, amateur or professional. It is considered the future of Scientific Research, and all the research institutions are transitioning towards Open Science. In Europe, European Open Science Cloud is fostering the transition towards it by creating a set of rules and guidelines to be followed to make Research Data accessible by all EU researchers through interoperable FAIR data. From all the European Countries transitioning towards Open Science, Balkan countries are newly joining the transition. This is especially true for Albanian Institutions. In this dissertation, Open Science is analyzed in detail. Starting from what is Open Science and moving on to Open Access, all key components of Open Science are explained. Open Data is also described and compared to FAIR data. Open Source is also described and the concept of Digital Repositories is detailed, as a key element in storing research data. Storing research data is an important task for Open Science. Next, the European view on Open Science is introduced with the European Open Science Cloud (EOSC), moving to the most used Research Data Digital Repositories. EOSC defines the guidelines to be followed in Europe to be compliant for future collaboration in Open Science, thus a complete understanding of EOSC is necessary to continue the work. After that, the current state of Open Science in Balkan countries is captured, focusing on identifying the stages in the transition to Open Science and main problems faced by Balkan universities. To do so, a questionnaire is distributed to all relevant institutions and analyzed in detail. Key findings are found and the next steps are planned. The careful analysis illustrates the need for the Albanian Open Science Cloud (AOSC). The goal of AOSC is to make Albanian Science open and to help Albania join the EOSC initiative. 
A prototype deployed in Albania is presented, following the European standards set by the European Open Science Cloud. Albanian-CRIS is the repository that will help build the Albanian Open Science Cloud, support the transition to Open Science, and enable Albania to join EOSC. The data structure of the repository is illustrated. The analysis also indicates a need for an Open Science Policy to be implemented in Albanian Universities. The policy is presented and is taken into consideration by Albanian Research Institutions for implementation. The Open Access Mandate aspires to an Open Science transition and considers it a critical component in enhancing the relevance of research to the Albanian community. The intention of Albania’s first Open-Science Policy and Open Access Digital Repository is to make research data FAIR and make knowledge publicly open to all Albanian researchers. In conclusion, this dissertation describes the detailed transition of Albanian Universities into Open Science and the next steps taken to foster the future of the research.

Nightingale Open Science

“Nightingale Open Science is a platform that connects researchers with world-class medical data. We work closely with health systems around the world to create and curate datasets of medical images linked to ground-truth labels. We carefully deidentify the data and make it available for non-profit research on our cloud infrastructure….

Unfortunately, existing medical data with the potential to shed light on these patterns have historically been siloed. By making this data accessible to broad groups of interdisciplinary researchers, we can begin to unlock discoveries that save lives, surfacing previously unknown patterns of disease….”

EOSC activities update and UK engagement – Research

“The European Open Science Cloud (EOSC) is a European Commission (EC) initiative to support the development of open science and the digital transformation of research in Europe and further afield. Now in its implementation phase, it aims to develop a “web” of FAIR data and services, providing a multi-disciplinary environment where researchers can publish, find and re-use data, tools and services. The EOSC is complementary to UK efforts to define and adopt open science policies and practices, and the UK contributes to development of the EOSC through participation in implementation projects and in the EOSC Association, a legal entity established to govern the European Open Science Cloud.

As part of its Tech 2 Tech series, Jisc held an EOSC webinar in March 2021 which helped to confirm strong interest in the EOSC across the UK research community. Another Jisc webinar about EOSC will be held on 15 December. This blog provides an update on the numerous activities which have been taking place as part of the ongoing development of the EOSC, and UK engagement with them….”

An update on developments in the European Open Science Cloud | Jisc

“This online European Open Science Cloud (EOSC) event will provide an update on developments since our previous EOSC online event and expand on information in our recent blog post.

You’ll get information about:

Developments in the EOSC Association
The work of the new EOSC Advisory Groups and Task Forces
What’s happening in some of the EOSC implementation projects
Ways you can become involved in EOSC….”

Global Open Science: Virtual SciDataCon 2021 Strand – CODATA, The Committee on Data for Science and Technology

“Virtual SciDataCon 2021 is organised around a number of thematic strands. This is the third of a series of announcements presenting these strands to the global data community. Please note that registration is free, but participants must register for each session they wish to attend.

For some time there has been recognition of the need for investment in domain specific research infrastructures at a national and sometimes regional level. In recent years, in some countries and regions, there has been a move towards research infrastructures that are both vertically and horizontally integrated: vertically, in the sense that they aim to bring generic e-infrastructure closer to research communities’ needs; horizontally, in the sense that they explicitly aim, by embracing principles of Open Science and FAIR data, to better facilitate interdisciplinary research. Examples include, but are not limited to, the European Open Science Cloud (EOSC), the China Science and Technology Cloud (CSTCloud), the Australian Research Data Commons (ARDC), the Malaysian Open Science Platform, the African Open Science Platform, the planned broadening of LA Referencia in Latin America, as well as Canada’s NDRIO and Germany’s NFDI. The major international data organisations that collaborate in Data Together have complementary activities to define a model for Open Research Commons and to encourage cooperation, alignment and interoperability between Open Science Clouds….”

Developing Cooperation and Alignment Between Open Science Clouds | 27 October 2021 | SciDataCon session

Date: Oct. 27, 2021

Time: 11:00 – 12:30

“Session Title: Developing Cooperation and Alignment Between Open Science Clouds: governance and sustainability, policy and legal, technical infrastructure, data interoperability

Session Organisers: Simon Hodson

Session Description:

This interactive workshop session will provide an overview of the activities of four thematic working groups established by the Global Open Science Cloud project. Each Working Group will give a short presentation, focusing on the areas which it has identified to share information, develop cooperation and to explore alignment. The presentations will be followed by structured discussion. We invite participants to make recommendations for this work and to help identify areas where cooperation can be supported by the Working Groups….”

Agreements on Croatian Open Science Cloud Initiative Awarded to Institutions

“The agreements regulate resources and services necessary for the collection, processing, storage, dissemination and availability of research data.

This initiative is the result of years-long joint efforts by many stakeholders from science and tertiary education in the open science movement, and it was launched with the support of the Ministry of Science and the Croatian Science Foundation.

It creates preconditions for developing the Croatian open science cloud that will enable coordinated development of the country’s e-infrastructure.

The initiative will bring together relevant stakeholders in creating required preconditions for the implementation, realisation, and promotion of open science….”

In a First, Global Climate Models Will Be Made Available Via the Cloud

“Until now, the most advanced climate models have mostly been available to researchers in the wealthiest countries.

New program will see Amazon Web Services’ advanced cloud technologies host 30 climate model simulations and make them available to researchers around the globe….

The resulting free, open access dataset will allow research teams internationally to skirt one of the major barriers to specialized climate modeling, even for those who have the computing capacity to make it happen: cost. Wanser said running the 30 simultaneous simulations would normally cost roughly $700,000, and take two months to run. 

The AWS program will cover all costs associated with hosting and sharing data from the cloud, and accessing and downloading it will be free. Grants will be available to users who choose to analyze or run additional models on AWS.”

Renowned Digital Humanities Researchers Begin Computing Cultural Heritage in the Cloud | Library of Congress

“Three renowned researchers in digital humanities and computer science are joining forces with the Library of Congress on three inaugural Computing Cultural Heritage in the Cloud projects, exploring how biblical quotations, photographic styles and “fuzzy searches” reveal more about the collections in the world’s largest Library than first meets the eye.

Supported by a $1 million grant from the Andrew W. Mellon Foundation awarded in 2019, the initiative combines cutting edge technology with the Library’s vast collections to support digital humanities research at scale. These three outside researchers will collaborate with subject matter experts and technology specialists at the Library of Congress to experiment in pursuit of answers that can only be achieved with collections and data at scale. These collaborations will enable research on questions previously difficult to address due to technical and data constraints. Expanding the skills and knowledge necessary for this work will enable the Library to support emerging methods in cloud-based computing research such as machine learning, computer vision, interactive data visualization, and other areas of digital humanities and computer science research. As a result, the Library and other cultural heritage institutions may build upon or adapt these approaches for their own use in improving access to text and image collections….”

Dockstore: enhancing a community platform for sharing reproducible and accessible computational protocols | Nucleic Acids Research | Oxford Academic

Abstract: Dockstore is an open source platform for publishing, sharing, and finding bioinformatics tools and workflows. The platform has facilitated large-scale biomedical research collaborations by using cloud technologies to increase the Findability, Accessibility, Interoperability and Reusability (FAIR) of computational resources, thereby promoting the reproducibility of complex bioinformatics analyses. Dockstore supports a variety of source repositories, analysis frameworks, and language technologies to provide a seamless publishing platform for authors to create a centralized catalogue of scientific software. The ready-to-use packaging of hundreds of tools and workflows, combined with the implementation of interoperability standards, enables users to launch analyses across multiple environments. Dockstore is widely used: more than twenty-five high-profile organizations share analysis collections through the platform in a variety of workflow languages, including the Broad Institute’s GATK best practice and COVID-19 workflows (WDL), nf-core workflows (Nextflow), the Intergalactic Workflow Commission tools (Galaxy), and workflows from Seven Bridges (CWL), to highlight just a few. Here we describe the improvements made over the last four years, including the expansion of system integrations supporting authors, the addition of collaboration features and analysis platform integrations supporting users, and other enhancements that improve the overall scientific reproducibility of Dockstore content.