Pangeo Forge: Crowdsourcing Open Data in the Cloud :: FOSS4G 2022 general tracks :: pretalx

“Geospatial datacubes–large, complex, interrelated multidimensional arrays with rich metadata–arise in analysis-ready geospatial imagery, level 3/4 satellite products, and especially in ocean / weather / climate simulations and [re]analyses, where they can reach petabytes in size. The scientific Python community has developed a powerful stack for flexible, high-performance analytics of datacubes in the cloud. Xarray provides a core data model and API for analysis of such multidimensional array data. Combined with Zarr or TileDB for efficient storage in object stores (e.g. S3) and Dask for scaling out compute, these tools allow organizations to deploy analytics and machine learning solutions for both exploratory research and production in any cloud platform. Within the geosciences, the Pangeo open science community has advanced this architecture as the “Pangeo platform”.

However, there is a major barrier preventing the community from easily transitioning to this cloud-native way of working: the difficulty of bringing existing data into the cloud in analysis-ready, cloud-optimized (ARCO) format. Typical workflows for moving data to the cloud currently consist of either bulk transfers of files into object storage (with a major performance penalty on subsequent analytics) or bespoke, case-by-case conversions to cloud-optimized formats such as TileDB or Zarr. The high cost of this toil is preventing the scientific community from realizing the full benefits of cloud computing. More generally, the outputs of the toil of preparing scientific data for efficient analysis are rarely shared in an open, collaborative way.

To address these challenges, we are building Pangeo Forge, the first open-source cloud-native ETL (extract / transform / load) platform focused on multidimensional scientific data. Pangeo Forge consists of two main elements. An open-source Python package–pangeo_forge_recipes–makes it simple for users to define “recipes” for extracting many individual files, combining them along arbitrary dimensions, and depositing ARCO datasets into object storage. These recipes can be “compiled” to run on many different distributed execution engines, including Dask, Prefect, and Apache Beam. The second element of Pangeo Forge is an orchestration backend which integrates tightly with GitHub as a continuous-integration-style service….”

Planet Research Data Commons Consultation Roundtables Tickets, Multiple Dates | Eventbrite

“The ARDC would like to invite environmental researchers and decision makers to a consultation roundtable for the Planet Research Data Commons.

The Planet Research Data Commons will deliver shared, accessible data and digital research tools that will help researchers and decision makers tackle the big challenges facing our environment, which include adapting to climate change, saving threatened species, and reversing ecosystem deterioration.

We invite environmental researchers and decision makers to get involved in the consultations for the Planet Research Data Commons to help guide the development of the new digital research infrastructure.

The Planet Research Data Commons is the second of 2 pilot Thematic Research Data Commons launching in the 2022-23 financial year with an initial budget of $15.8m. The first pilot, the People Research Data Commons, is focused on digital research infrastructure for health research. The Planet Research Data Commons will explore the digital research infrastructure needs for research challenges set out in the 2021 National Research Infrastructure Roadmap, including environment and climate resilience.

The Planet Research Data Commons will support environmental researchers to develop cross-sector and multi-disciplinary data collaborations on a national scale. It will integrate underpinning compute, storage infrastructure and services with analysis platforms and tools that are supported by expertise, standards and best practices. And it will bring together data from a range of sources to tackle the big questions….”

Resolving the location of small intracontinental earthquakes using Open Access seismic and geodetic data: lessons from the 2017 January 18 mb 4.3, Ténéré, Niger, earthquake – NASA/ADS

Abstract:  A low-magnitude earthquake was recorded on 2017 January 18, in the Ténéré desert in northern Niger. This intraplate region is exceptionally sparsely covered with seismic stations, and the closest open seismic station, G.TAM in Algeria at a distance of approximately 600 km, was unusually and unfortunately not operational at the time of the event. Body-wave magnitude estimates range from mb 4.2 to mb 4.7, and both seismic location and magnitude constraints are dominated by stations at teleseismic distances. The seismic constraints are strengthened considerably by array stations of the International Monitoring System for verifying compliance with the Comprehensive Nuclear-Test-Ban Treaty. This event, with magnitude relevant to low-yield nuclear tests, provides a valuable validation of the detection and location procedure for small land-based seismic disturbances at significant distances. For seismologists not in the CTBT system, the event is problematic as data from many of the key stations are not openly available. We examine the uncertainty in published routinely determined epicentres by performing multiple Bayesloc location estimates, considering both all published arrival times and those from open stations only. This location exercise confirms lateral uncertainties in the seismologically derived location no smaller than 10 km. Coherence for interferometric synthetic aperture radar in this region is exceptionally high, allowing us to confidently detect a displacement of the order of 6 mm in the time frame containing the earthquake, consistent with the seismic location estimates and with a lateral length scale consistent with an earthquake of this size. This allows location constraint to within one rupture length (~5 km), significantly reducing the lateral uncertainty compared with relying on seismological data only. Combining Open Access-only seismological and geodetic data, we precisely constrain the source location, and conclude that this earthquake likely had a shallow source. We then discuss potential ways to continue the integration of geodetic data in the calibration of seismological earthquake location.


Frontiers | Rethinking the A in FAIR Data: Issues of Data Access and Accessibility in Research

“The FAIR data principles are rapidly becoming a standard through which to assess responsible and reproducible research. In contrast to the requirements associated with the Interoperability principle, the requirements associated with the Accessibility principle are often assumed to be relatively straightforward to implement. Indeed, a variety of different tools assessing FAIR rely on the data being deposited in a trustworthy digital repository. In this paper we note that there is an implicit assumption that access to a repository is independent of where the user is geographically located. Using a virtual private network (VPN) service we find that access to a set of web sites that underpin Open Science is variable from a set of 14 countries; either through connectivity issues (i.e., connections to download HTML being dropped) or through direct blocking (i.e., web servers sending 403 error codes). Many of the countries included in this study are already marginalized from Open Science discussions due to political issues or infrastructural challenges. This study clearly indicates that access to FAIR data resources is influenced by a range of geo-political factors. Given the volatile nature of politics and the slow pace of infrastructural investment, this is likely to continue to be an issue and indeed may grow. We propose that it is essential for discussions and implementations of FAIR to include awareness of these issues of accessibility. Without this awareness, the expansion of FAIR data may unintentionally reinforce current access inequities and research inequalities around the globe.”
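The accessibility check the study describes (a request that either succeeds, is dropped, or is refused with an HTTP 403) can be sketched with the Python standard library. This is a simplified stand-in for the paper's methodology; the classification labels are invented for the example.

```python
from urllib import error, request

def probe(url: str, timeout: float = 10.0) -> str:
    # Classify access to a repository landing page: reachable, blocked
    # outright by the server (403), some other HTTP error, or a dropped /
    # failed connection.
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return f"ok:{resp.status}"
    except error.HTTPError as e:        # server answered with an error code
        return "blocked:403" if e.code == 403 else f"http-error:{e.code}"
    except (error.URLError, OSError):   # connection dropped, refused, or timed out
        return "connection-failed"
```

Repeating such a probe from vantage points in different countries (the study used a VPN with exit nodes in 14 countries) turns the single check into a map of geographic accessibility.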



Modern geoscience publishing – GEOSCIENTIST

“The preprint is the initial version of a research article, often (but not always) before submission to a journal and before formal peer-review. Preprints help modernise geoscience by removing barriers that inhibit broad participation in the scientific process, and which are slowing progress towards a more open and transparent research culture. …

Preprints have many well-documented benefits for both researchers and the public (e.g., Bourne et al., 2017; Sarabipour et al., 2019; Pourret et al., 2020). For example, preprints enable:


• Rapid sharing of research results, which can be critical for time-sensitive studies (such as after disasters), as well as for early career researchers applying for jobs, or any academic applying for grants or a promotion, given that journal-led peer review can take many months to years;
• Greater visibility and accessibility for research outputs, given there is no charge for posting or reading a preprint, especially for those who do not have access to pay-walled journals, or limited access due to remote working (such as during lockdowns);
• Additional peer feedback beyond that provided by journal-led peer review, enhancing the possibility of collaboration via community input and discussion;
• Researchers to establish priority (or a precedent) on their results, mitigating the chance of being ‘scooped’;
• Breakdown of the silos that traditional journals uphold, by exposing us to broader research than we might encounter otherwise, and giving a home to works that do not have a clear destination in a traditional publication;
• Research to be more open and transparent, with the intention of improving the overall quality, integrity, and reproducibility of results. …”

Participatory Mapping: A Systematic Review and Open Science Framework for Future Research | Research Explorer | The University of Manchester

Abstract:  Participatory Mapping emerged from a need for more inclusive methods of collecting spatial data with the intention of democratising the decision-making process. It encompasses a range of methods including mental mapping, sketch mapping and Participatory GIS. Whilst there has been a rapid increase in uptake of Participatory Mapping, the multidisciplinary nature of the field has resulted in a lack of consistency in the conducting and reporting of research, limiting further development. In this paper we argue that an Open Science approach is required to enable the field to advance, increasing transparency and replicability in the way Participatory Mapping research is both conducted and reported. This argument is supported by the first large-scale systematic review of the field, which identifies specific areas within Participatory Mapping that would benefit from an Open Science approach. Four questions are used to explore the sample: (1) How are different Participatory Mapping methods being used and reported? (2) What information is given on the data collected through Participatory Mapping? (3) How are participant demographics being recorded? (4) Who is conducting the research and where is it being published? From a total of 578 academic research articles, we analysed a stratified sample of 117. The review reveals a significant lack of reporting on key details in the data collection process, restricting the transparency, replicability, and transferability of Participatory Mapping research and demonstrating the urgent need for an Open Science approach. Recommendations are then drawn from the results to guide the design of future Participatory Mapping research.


Transform to Open Science (TOPS) Curriculum Development Team

“Open science  —  opening up the scientific process from idea inception to result — increases access to knowledge and expands opportunities for new voices to participate. Sharing the data, code, and knowledge associated with the scientific process lowers barriers to entry, enables findings to be more easily reproduced, generates new knowledge at scale, and allows and facilitates diverse societal uses.

AGU and NASA have made a commitment to advancing the principles of open science to build a more inclusive and open community at NASA, AGU and beyond. This is a resolution to work towards a more transparent and collaborative scientific process, opening data and results to the broader public whenever possible, and incentivizing researchers around the globe to do the same.

To help catalyze and support the cultural change necessary for such an opening of scientific knowledge, NASA has launched the Open-Source Science Initiative (OSSI), a long-term commitment to open science. To spark change and inspire open science engagement, OSSI has created the Transform to Open Science (TOPS) mission and declared 2023 as the Year Of Open Science.

A key goal of TOPS is to engage thousands of researchers in open science leading practices.

Launching a program such as TOPS is possible thanks to the open science communities’ work over the last couple of decades. TOPS would like to leverage this work in developing a five-part curriculum on open science.  We seek participation from individuals actively engaging with open science communities, open software and data, and related practices to serve on a TOPS Curriculum Development Team. This will include participation in a series of virtual meetings and sprints this year. For those selected to lead module development, there will also be in-person working sessions at AGU’s headquarters in Washington, DC. AGU, in partnership with NASA and experts in curriculum development, will coordinate this effort.  All content will be openly shared….”

Data tools for achieving disaster risk reduction: An analysis of open-access georeferenced disaster risk datasets – World | ReliefWeb

“The priorities of the Sendai Framework are to (1) understand disaster risk; (2) strengthen disaster risk governance to manage risk; (3) invest in disaster risk reduction and resilience; and (4) enhance the capacity to recover from disasters (UNDRR, 2015). This study advances our knowledge of implementing the Sendai Framework from publications that have utilized open-access spatial data and issues common to Framework implementation. The findings from a literature review reveal that many of the problems cited by recent work are data-related.

This study engages with these issues and discusses how they could be addressed by those who have a vested interest in disaster risk reduction, from policymakers to community members.”

Panel discussion: Building geospatial data capacity at the municipal level Tickets, Wed, 18 May 2022 at 12:00 PM | Eventbrite

“Municipalities are the level of government closest to residents. Geospatial data is critical in planning the infrastructure and delivering the services that residents interact with daily. More broadly, sharing geospatial capacity can enable municipalities to collectively address challenges extending beyond any community’s borders.

Yet, the ability to fully leverage geospatial data varies significantly between communities. Collaboration – that is, sharing data assets, infrastructure, and knowledge – can help municipalities to gain capacity they would not otherwise be able to access in order to improve internal data practices; share collective intelligence and make mutual decisions on issues of regional importance; unlock geospatial information for community-based economic, social, and environmental initiatives; and present a united ask for resources from higher levels of government.

Join Open North for a virtual panel discussion where we will address questions raised in our recent report such as:

What issues can most benefit from greater collaboration and sharing of geospatial resources between municipalities?
What are the barriers to forming and sustaining collaborations?
What can we learn from successful existing collaborations?
How can provincial governments, civil society, and the private sector better support collaborations? …”

UChicago Library awarded grant to digitize Chicagoland’s historical maps | University of Chicago News

“The National Endowment for the Humanities has awarded the University of Chicago Library, in partnership with the Newberry Library and the Chicago History Museum, a grant to digitize historical maps of Chicago from the 19th century through 1940.

The grant of $348,930 to fund their proposal, “Mapping Chicagoland,” will also support the enrichment of the digital images with geographic information for use in spatial overlays and analyses, as well as the work to make them open to the public on the UChicago Library website. The maps will also be available through the BTAA (Big Ten Academic Alliance) Geoportal and Chicago Collections platforms….”

A golden era for volcanic gas geochemistry?

Abstract:  […] We argue that the recent advent of automated, continuous geochemical monitoring at volcanoes now allows us to track activity from unrest to eruption, thus providing valuable insights into the behavior of volatiles throughout the entire sequence. In the next 10 years, the research community stands to benefit from the expansion of geochemical monitoring networks to many more active volcanoes. This, along with technical advances in instrumentation, and in particular the increasing role that unoccupied aircraft systems (UAS) and satellite-based observations are likely to play in collecting volcanic gas measurements, will provide a rich dataset for testing hypotheses and developing diagnostic tools for eruption forecasts. The use of consistent, well-documented analytical methods and ensuring free, public access to the collected data with few restrictions will be most beneficial to the advancement of volcanic gas science.


Goodbye, world! OER World Map Blog

The North-Rhine Westphalian Library Service Centre (hbz) will cease operating the OER World Map on 2022-04-29. We would like to thank all those who have supported and promoted the project in recent years. hbz will provide an appropriate solution for archiving the collected data. The software and data are openly licensed, so it is possible to continue operating the platform. If you are interested in continuing to operate the OER World Map, please do not hesitate to contact us at  

Frontiers | Toward More Inclusive Metrics and Open Science to Measure Research Assessment in Earth and Natural Sciences | Research Metrics and Analytics

“Diversity, equity and inclusion are key components of Open Science. In achieving them, we can hope that we can reach a true Open Access of scientific resources, one that encompasses both (i) open access to the files (uploading them to a public repository) and (ii) open access to the contents (including language). Until we decide to move away from profit-driven journal-based criteria to evaluate researchers, it is likely that high author-levied publication costs will continue to maintain inequities to the disadvantage of researchers from non-English speaking and least developed countries. As quoted from Bernard Rentier, “the universal consensus should focus on the research itself, not where it was published.” ”

Position Statement on Earth and Space Science Data | AGU

“Earth and space science data are a world heritage, and an essential part of the science ecosystem. All players in the science ecosystem—researchers, repositories, publishers, funders, institutions, etc.—should work to ensure that relevant scientific evidence is processed, shared, and used ethically, and is available, preserved, documented, and fairly credited. To achieve this legacy, all AGU members and stakeholders must have a clear understanding of the culture of responsible research, and take action to support, enable, and nurture that culture.

The Challenge

Preserving data as a world heritage requires a culture of data use, sharing, curation, and attribution that is equitable, accessible, and ethical, all of which are essential for scientific research to be transparent, trusted, and valued. Data and other research artefacts, such as physical samples, software, models, methods, and algorithms, are all part of the science ecosystem and essential for research. Data and other research artefacts must be discoverable, accessible, verifiable, trustworthy, and usable, and those responsible for their acquisition or creation should receive due credit for their contribution to scientific advancement. Trustworthy, robust, verifiable, reproducible, and open science is our responsibility and legacy for future generations. To achieve this legacy, policy makers, AGU members, and other stakeholders must recognize that the science ecosystem should be flexible enough to adapt to a changing landscape of research practices, technology innovation, and demonstrations of impact. They must also have a clear understanding of the culture of responsible research, and take action to support, enable, and nurture that culture. This statement, in alignment with other AGU position statements, helps form the foundation to support data as a world heritage.

The Solution

I. Championing Open and Transparent Data

Robust, verifiable, and reproducible science requires that evidence behind an assertion be accessible for evaluation. Researchers have a responsibility to collect, develop, and share this evidence in an ethical manner that is as open and transparent as possible. Most Earth and space science data can and should be openly available except in cases where human subjects are involved, where other legal restrictions apply, or where data release could cause harm (e.g. where data could lead to identification of specific people, or could publicly reveal locations of endangered species). Even where data are not publicly available, transparency of collection and processing methods, data quality, inherent assumptions, and known sources of bias is essential. Building transparency and ethical behavior into the entire science ecosystem, even as technology and scientific practice evolve, is a vital component of responsible research.

Data and other research artefacts are useful to the broader scientific community only insofar as they can be shared, examined, and reused. Working within discipline communities to develop, share, and adopt best practices, standards, clear documentation and appropriate licensing will facilitate sharing and interoperability. …

Statement adopted by the American Geophysical Union 29 May 1997; Reaffirmed May 2001, May 2005, May 2006; Revised and Reaffirmed May 2009, February 2012, September 2015; November 2019.”