Pangeo Forge: Crowdsourcing Open Data in the Cloud :: FOSS4G 2022 general tracks :: pretalx

“Geospatial datacubes–large, complex, interrelated multidimensional arrays with rich metadata–arise in analysis-ready geospatial imagery, level 3/4 satellite products, and especially in ocean / weather / climate simulations and [re]analyses, where they can reach petabytes in size. The scientific Python community has developed a powerful stack for flexible, high-performance analytics of datacubes in the cloud. Xarray provides a core data model and API for analysis of such multidimensional array data. Combined with Zarr or TileDB for efficient storage in object stores (e.g. S3) and Dask for scaling out compute, these tools allow organizations to deploy analytics and machine learning solutions for both exploratory research and production on any cloud platform. Within the geosciences, the Pangeo open science community has advanced this architecture as the “Pangeo platform” (http://pangeo.io/).

However, there is a major barrier preventing the community from easily transitioning to this cloud-native way of working: the difficulty of bringing existing data into the cloud in analysis-ready, cloud-optimized (ARCO) format. Typical workflows for moving data to the cloud currently consist of either bulk transfers of files into object storage (with a major performance penalty on subsequent analytics) or bespoke, case-by-case conversions to cloud-optimized formats such as TileDB or Zarr. The high cost of this toil is preventing the scientific community from realizing the full benefits of cloud computing. More generally, the outputs of the toil of preparing scientific data for efficient analysis are rarely shared in an open, collaborative way.

To address these challenges, we are building Pangeo Forge (https://pangeo-forge.org/), the first open-source cloud-native ETL (extract / transform / load) platform focused on multidimensional scientific data. Pangeo Forge consists of two main elements. An open-source Python package–pangeo_forge_recipes–makes it simple for users to define “recipes” for extracting many individual files, combining them along arbitrary dimensions, and depositing ARCO datasets into object storage. These recipes can be “compiled” to run on many different distributed execution engines, including Dask, Prefect, and Apache Beam. The second element of Pangeo Forge is an orchestration backend which integrates tightly with GitHub as a continuous-integration-style service….”
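
The recipe idea described above (enumerate many source files, then combine them along named dimensions) can be illustrated with a minimal standard-library sketch. It deliberately does not use the pangeo_forge_recipes API. The URL template, the example.org host, and the names make_url, build_pattern, and concat_order are all hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical URL template; example.org is a placeholder, not a real data server.
def make_url(variable: str, time: str) -> str:
    return f"https://data.example.org/{variable}/{variable}_{time}.nc"

def build_pattern(variables, times):
    """Map every (variable, time) key to the source file that holds it."""
    return {(v, t): make_url(v, t) for v, t in product(variables, times)}

def concat_order(pattern, variable):
    """List one variable's files in the order they would be joined along time.

    Assumes time labels sort correctly as strings (true for YYYY-MM labels).
    """
    keys = sorted(k for k in pattern if k[0] == variable)
    return [pattern[k] for k in keys]

pattern = build_pattern(["sst", "ice"], ["2020-01", "2020-02", "2020-03"])
sst_files = concat_order(pattern, "sst")
print(len(pattern), "source files;", len(sst_files), "along time for sst")
```

A real recipe would go on to open each file, concatenate the arrays, and write the result to object storage; this sketch stops at the bookkeeping step of deciding which file belongs at which position.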

Open Science Stories

“Is there an open science story that made you say, “I need to get involved?” Do you remember a story about the impact of open data, open-source software, open publications on science? How has open science enabled scientific breakthroughs? Tell us about it! 

NASA’s Transform to Open Science (TOPS) mission is seeking compelling stories about open science in practice. To help transform NASA scientific processes to open science, we need to provide compelling and relatable examples that show how open science creates more impactful, efficient, inclusive science. By collecting these stories, we can start showing scientists how open science can help them.

We are looking for big, awe-inspiring stories about open data, open-source software, open results and open access, and the use of openly available tools for scientific practice. These stories could be about projects that utilize large amounts of data, about utilizing code on an open-source repository, about citizen scientists working together to identify constellations or clouds….”

[2208.08426] “We Need a Woman in Music”: Exploring Wikipedia’s Values on Article Priority

Abstract:  Wikipedia — like most peer production communities — suffers from a basic problem: the amount of work that needs to be done (articles to be created and improved) exceeds the available resources (editor effort). Recommender systems have been deployed to address this problem, but they have tended to recommend work tasks that match individuals’ personal interests, ignoring more global community values. In English Wikipedia, discussion about Vital articles constitutes a proxy for community values about the types of articles that are most important, and should therefore be prioritized for improvement. We first analyzed these discussions, finding that an article’s priority is considered a function of 1) its inherent importance and 2) its effects on Wikipedia’s global composition. One important example of the second consideration is balance, including along the dimensions of gender and geography. We then conducted a quantitative analysis evaluating how four different article prioritization methods — two from prior research — would affect Wikipedia’s overall balance on these two dimensions; we found significant differences among the methods. We discuss the implications of our results, including particularly how they can guide the design of recommender systems that take into account community values, not just individuals’ interests.

 

delightful open science

“This Open Science list is open, just like Open Science itself. What is delightful is rather subjective, because of the background of the initiators the list has started quite nerdy and focussed on infrastructure and scholarly communication. Please help and add more information by adding an “issue” or making a pull request (both options in menu above), especially on topics around reproducibility, meta-science and outreach, where this list is weaker….”

A Possible Fix For Scientific (and Academic) Publishing | Peer Review – News and Blog

“This is a proposal for a software platform that may help the academic community solve these problems, and more….

Peer Review [the proposed platform] allows scholars, scientists, academics, and researchers to self organize their own peer review and refereeing, without needing journal editors to manually mediate it. The platform allows review and refereeing to be crowdsourced, using a reputation system tied to academic fields to determine who should be able to offer review and to referee.

The platform splits pre-publish peer review from post-publish refereeing. Pre-publish review then becomes completely about helping authors polish their work and decide if their articles are ready to publish. Refereeing happens post-publish, and in a way which is easily understandable to the lay reader, helping the general public sort solid studies from shaky ones.

 

Peer Review is being developed open source. The hope is to form a non-profit to develop it which would be governed by the community of academics who use the platform in collaboration with the team of software professionals who build it (a multi-stakeholder cooperative)….”

Supporting public preprint review through collaborative reviews – an update on ASAPbio’s crowd preprint review – ASAPbio

“Through our crowd preprint review activities we seek to draw on the collective input of a group of commenters who each can comment on the preprint according to their level of expertise and interest. We are midway through our activities for 2022 and we wanted to share an update on our progress.

What have we accomplished so far?

We had a great response from the community with over 120 crowd reviewers signed up so far, with strong representation of early career researchers. We have three groups which complete reviews of preprints in each of the disciplines below:

Cell biology – a crowd of 70 members reviews preprints posted on bioRxiv 
Biochemistry – a crowd of 35 researchers reviews preprints from bioRxiv 
Infectious diseases preprints in Portuguese – a crowd of 30 researchers provide reviews in Portuguese for preprints posted in SciELO Preprints

For each group, a team of ASAPbio Fellows and partners from SciELO Preprints is involved in selecting preprints to review and summarizing the comments received. They also provide regular feedback on aspects of the process that can be adjusted or improved.

We circulate a new preprint to each group every week and invite comments via a Google document. We have seen a great level of engagement from reviewers, and are particularly pleased to see the interactions among reviewers in the collaborative documents, where they provide comments and feedback to each other, not only about the preprints but also about queries that may arise during their review….”

We’ve passed 100,000,000 verifiable observations on iNaturalist! · iNaturalist

“If you made 1,000 observations a day, every day, it would take you 274 years to generate 100 million observations. This milestone shows what people can do by working together. The iNaturalist dataset is something we’ve all made together, but it’s larger than any one of us. We hope everyone is as proud of this accomplishment as we are. Together, the iNaturalist community has created a unique window into life on Earth and hundreds of thousands of species with whom we share the planet. Thank you!

We know that even more potential for iNaturalist lies ahead. To fulfill our mission of connecting people to nature and advancing science and conservation, we’re working on a strategy to reach 100 million naturalists by 2030. This requires investing in technology improvements, so we’re now searching for two new software engineers to join the iNat team. Please spread the word to help us find great candidates….”
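
The 274-year figure quoted above can be checked with a few lines of arithmetic. This is only a sanity check of the stated numbers; the average year length of 365.25 days is an assumption, not something the post specifies.

```python
TARGET = 100_000_000    # observations, from the milestone
PER_DAY = 1_000         # observations per day, from the quote
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years

days = TARGET / PER_DAY
years = days / DAYS_PER_YEAR
print(f"{days:,.0f} days, about {years:.0f} years")
```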

The LOTUS initiative for open knowledge management in natural products research | eLife

Abstract:  Contemporary bioinformatic and chemoinformatic capabilities hold promise to reshape knowledge management, analysis and interpretation of data in natural products research. Currently, reliance on a disparate set of non-standardized, insular, and specialized databases presents a series of challenges for data access, both within the discipline and for integration and interoperability between related fields. The fundamental elements of exchange are referenced structure-organism pairs that establish relationships between distinct molecular structures and the living organisms from which they were identified. Consolidating and sharing such information via an open platform has strong transformative potential for natural products research and beyond. This is the ultimate goal of the newly established LOTUS initiative, which has now completed the first steps toward the harmonization, curation, validation and open dissemination of 750,000+ referenced structure-organism pairs. LOTUS data is hosted on Wikidata and regularly mirrored on https://lotus.naturalproducts.net. Data sharing within the Wikidata framework broadens data access and interoperability, opening new possibilities for community curation and evolving publication models. Furthermore, embedding LOTUS data into the vast Wikidata knowledge graph will facilitate new biological and chemical insights. The LOTUS initiative represents an important advancement in the design and deployment of a comprehensive and collaborative natural products knowledge base.

 

Supporting Ukrainian Editorial Staff: Crowdfunding Campaign

The invasion of Ukraine on 24 February 2022 and the expansion of the war zone across the country have had a significant impact on the country’s scientific activity. Much civilian infrastructure has been destroyed, including higher education and research institutions.

Through a number of programmes, such as Science for Ukraine, support is being provided to Ukrainian researchers, but this support has not been extended to staff working alongside researchers in knowledge generation: the librarians, editors, technicians, and administrative staff at universities, research institutes, and other infrastructures.

Yet preserving the knowledge, expertise, and knowledge-sharing capabilities of these scientific communities is of vital importance.

What can we do to help?

Supporting Ukrainian Editorial Staff (SUES) is an initiative by various European institutions, infrastructures, and organizations (Institute of Literary Research of the Polish Academy of Sciences [IBL-PAN], OPERAS, Directory of Open Access Journals [DOAJ], Directory of Open Access Books [DOAB], Electronic Information for Libraries [EIFL], Association of European University Presses [AEUP]), as well as a number of French scientific publishers, aimed at supporting scientific communication in Ukraine and helping scholarly journals and academic publishers to continue their publishing activities.

Did you know that there are more than 1,000 academic journals in Ukraine? Over 700 of these are open access journals published via the URAN platform. The publication of academic books is also extensive, with more than 20 Ukrainian university presses currently distributed via the CEEOL portal. These publications, in fields ranging from physics to literature via history, sociology, and biology, are key vehicles for the communication of knowledge generated by Ukrainian researchers. The editors, reviewers, typesetters, proofreaders, translators, and technical and administrative staff working in the various publishing centres need your support to continue their mission: to share and disseminate knowledge.

A questionnaire is being circulated around Ukrainian journals and publishers to help accurately identify their needs in terms of financial and technical support. The requests received so far relate primarily to remuneration for editorial work, to enable them to continue their work and to publish the next issue of their journal or their next book. The purpose of this campaign is to help 10 journals or publishers to keep publishing. In the long term, the project is also aimed at strengthening relationships and exchanging knowledge to ensure the international presence and visibility of Ukrainian academic publishers. Thanks to your contribution, Ukrainian scholarly journals and scientific publishers will be able to continue sharing knowledge.

A crowdfunding campaign is being run from Wednesday, 4 May to Monday, 6 June 2022, to raise money to help Ukrainian journals that have requested assistance from the coalition. Unique rewards will be offered in return for any financial support.

Link to the crowdfunding webpage: https://wemakeit.com/projects/support-to-ukrainian-editors

The Open Access Tracking Project – OATP – TIB-Blog

“In a recent meta-study for the German Federal Ministry of Education and Research (abbreviated BMBF), TIB investigated the current state of research on the effects of Open Access. The report resulting from this study has also recently been published (“Wirkungen von Open Access”; https://doi.org/10.34657/7666), and I have summarised the results of the study here in the blog. The study relied on the Open Access Tracking Project (OATP) as a control instrument: Using the collection of Open Access references on OATP, we were able to systematically expand the literature on all of the impacts we examined and make sure that we did not overlook any significant studies. After completing the study, we supplemented OATP with the small amount of literature that had not been already recorded there. We use this opportunity to introduce this important resource for information on Open Access to the audience of the TIB blog.

The OATP is dedicated to collecting and making available all news and commentary on OA topics in one place. The platform was founded in 2009 by Peter Suber. Different from existing channels such as blogs, OATP was designed to provide a comprehensive collection of the growing number of contributions on OA topics via crowdsourcing. For this purpose, OATP relies on the open source software TagTeam, which was specially developed for OATP by the Berkman Klein Center for Internet & Society. Using TagTeam, users can link items on OATP and tag them in order to categorize their contents: For example, oa.benefit refers to entries on the benefits of Open Access; the tag oa.germany identifies entries on Open Access in Germany….”
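
The tagging scheme described above (items carrying tags such as oa.benefit and oa.germany) can be sketched as a simple tag index. This is an illustration only and is not TagTeam's data model or API. The example items and the function name index_by_tag are hypothetical; only the two tag names come from the text.

```python
from collections import defaultdict

# Hypothetical items; only the tag names oa.benefit and oa.germany come from the text.
items = [
    {"title": "Example item A", "tags": ["oa.benefit"]},
    {"title": "Example item B", "tags": ["oa.germany", "oa.benefit"]},
    {"title": "Example item C", "tags": ["oa.germany"]},
]

def index_by_tag(items):
    """Build a mapping from each tag to the titles of the items that carry it."""
    index = defaultdict(list)
    for item in items:
        for tag in item["tags"]:
            index[tag].append(item["title"])
    return dict(index)

tag_index = index_by_tag(items)
print(tag_index["oa.germany"])
```

Because one item can carry several tags, it appears under each of them, which is what lets a reader browse the same collection by topic, by country, or by any other tag.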

ASAPbio Crowd preprint review 2022 sign-up form

“Following our trial last year, ASAPbio is running further preprint crowd review activities in 2022. Our goal is to provide an engaging environment for researchers to participate in providing feedback on preprints and support public reviews for preprints.

In 2022, we will be coordinating public reviews for different disciplines. We are pleased to say that we are collaborating with SciELO Preprints to also coordinate the review of preprints in Portuguese. This year we will cover the following disciplines:

– Cell biology preprints from bioRxiv (English)
– Biochemistry preprints from bioRxiv (English)
– Infectious diseases preprints from SciELO Preprints (Portuguese)

This form is for reviewers who will participate in the review of preprints from bioRxiv. To sign up for the review of SciELO Preprints in Portuguese, please complete this form: https://docs.google.com/forms/d/e/1FAIpQLSd0wrAa7FLrw8I1j5p9mysWrstehPqDqsn9UPjUbqrwRnQU-A/viewform

We invite researchers in the disciplines above to join our crowd preprint review activities, and particularly encourage early career researchers to participate. The activities will run for three months, from mid-May to August 2022….”

Citizen seismology helps decipher the 2021 Haiti earthquake

Abstract:  The August 14, 2021, Mw 7.2 Nippes earthquake in Haiti occurred within the same fault zone as its devastating Mw 7.0 predecessor of 2010, but struck the country when field access was limited by insecurity and conventional seismometers from the national network were inoperative. A network of citizen seismometers installed in 2019 provided near-field data critical to rapidly understanding the mechanism of the mainshock and monitoring its aftershock sequence. Their real-time data define two aftershock clusters that coincide with two areas of coseismic slip derived from inversions of conventional seismological and geodetic data. Machine learning applied to data from the citizen seismometer closest to the mainshock allows us to forecast aftershocks as accurately as with the network-derived catalog. This shows the utility of citizen science contributing to the understanding of a major earthquake.