NASA, ESA Partnership Releases Platform for Open-source Science in the Cloud
“Virtual SciDataCon 2021 is organised around a number of thematic strands. This is the third of a series of announcements presenting these strands to the global data community. Please note that registration is free, but participants must register for each session they wish to attend.
For some time there has been recognition of the need for investment in domain-specific research infrastructures at a national and sometimes regional level. In recent years, in some countries and regions, there has been a move towards research infrastructures that are both vertically and horizontally integrated: vertically, in the sense that they aim to bring generic e-infrastructure closer to research communities’ needs; horizontally, in the sense that they explicitly aim, by embracing principles of Open Science and FAIR data, to better facilitate interdisciplinary research. Examples include, but are not limited to, the European Open Science Cloud (EOSC), the China Science and Technology Cloud (CSTCloud), the Australian Research Data Commons (ARDC), the Malaysian Open Science Platform, the African Open Science Platform, the planned broadening of LA Referencia in Latin America, as well as Canada’s NDRIO and Germany’s NFDI. The major international data organisations that collaborate in Data Together have complementary activities to define a model for Open Research Commons and to encourage cooperation, alignment and interoperability between Open Science Clouds….”
Date: Oct. 27, 2021
Time: 11:00 – 12:30
“Session Title: Developing Cooperation and Alignment Between Open Science Clouds: governance and sustainability, policy and legal, technical infrastructure, data interoperability
Session Organiser: Simon Hodson
Register for the session: https://us02web.zoom.us/meeting/register/tZ0lf-CpqzojG9OqmXdz54QFxeU639vMCrzo
This interactive workshop session will provide an overview of the activities of four thematic working groups established by the Global Open Science Cloud project. Each Working Group will give a short presentation, focusing on the areas it has identified for sharing information, developing cooperation and exploring alignment. The presentations will be followed by structured discussion. We invite participants to make recommendations for this work and to help identify areas where cooperation can be supported by the Working Groups….”
“The agreements regulate resources and services necessary for the collection, processing, storage, dissemination and availability of research data.
This initiative is the result of years of joint effort by many stakeholders from science and tertiary education in the open science movement, and it was launched with the support of the Ministry of Science and the Croatian Science Foundation.
It creates preconditions for developing the Croatian open science cloud that will enable coordinated development of the country’s e-infrastructure.
The initiative will bring together relevant stakeholders in creating required preconditions for the implementation, realisation, and promotion of open science….”
“Until now, the most advanced climate models have mostly been available to researchers in the wealthiest countries.
New program will see Amazon Web Services’ advanced cloud technologies host 30 climate model simulations and make them available to researchers around the globe….
The resulting free, open-access dataset will allow research teams internationally to skirt one of the major barriers to specialized climate modeling, even for those who have the computing capacity to make it happen: cost. Wanser said the 30 simultaneous simulations would normally cost roughly $700,000 and take two months to run.
The AWS program will cover all costs associated with hosting and sharing data from the cloud, and accessing and downloading it will be free. Grants will be available to users who choose to analyze or run additional models on AWS.”
The EOSC Future project is looking for 200+ science champions to help co-design and fine-tune EOSC services and products.
“Three renowned researchers in digital humanities and computer science are joining forces with the Library of Congress on three inaugural Computing Cultural Heritage in the Cloud projects, exploring how biblical quotations, photographic styles and “fuzzy searches” reveal more about the collections in the world’s largest Library than first meets the eye.
Supported by a $1 million grant from the Andrew W. Mellon Foundation awarded in 2019, the initiative combines cutting-edge technology with the Library’s vast collections to support digital humanities research at scale. These three outside researchers will collaborate with subject matter experts and technology specialists at the Library of Congress to experiment in pursuit of answers that can only be achieved with collections and data at scale. These collaborations will enable research on questions previously difficult to address due to technical and data constraints. Expanding the skills and knowledge necessary for this work will enable the Library to support emerging methods in cloud-based computing research such as machine learning, computer vision, interactive data visualization, and other areas of digital humanities and computer science research. As a result, the Library and other cultural heritage institutions may build upon or adapt these approaches for their own use in improving access to text and image collections….”
Abstract: Dockstore (https://dockstore.org/) is an open source platform for publishing, sharing, and finding bioinformatics tools and workflows. The platform has facilitated large-scale biomedical research collaborations by using cloud technologies to increase the Findability, Accessibility, Interoperability and Reusability (FAIR) of computational resources, thereby promoting the reproducibility of complex bioinformatics analyses. Dockstore supports a variety of source repositories, analysis frameworks, and language technologies to provide a seamless publishing platform for authors to create a centralized catalogue of scientific software. The ready-to-use packaging of hundreds of tools and workflows, combined with the implementation of interoperability standards, enables users to launch analyses across multiple environments. Dockstore is widely used: more than twenty-five high-profile organizations share analysis collections through the platform in a variety of workflow languages, including the Broad Institute’s GATK best-practices and COVID-19 workflows (WDL), nf-core workflows (Nextflow), the Intergalactic Workflow Commission tools (Galaxy), and workflows from Seven Bridges (CWL), to highlight just a few. Here we describe the improvements made over the last four years, including the expansion of system integrations supporting authors, the addition of collaboration features and analysis platform integrations supporting users, and other enhancements that improve the overall scientific reproducibility of Dockstore content.
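The "interoperability standards" the abstract mentions include the GA4GH Tool Registry Service (TRS) API, through which Dockstore's catalogue can be queried programmatically. The sketch below is illustrative only, not a verified client: the base path is an assumption about Dockstore's public TRS endpoint, and the response it parses is a hand-written sample in the TRS shape (the `name`, `versions`, and `descriptor_type` field names are taken from the TRS schema as assumptions).

```python
from urllib.parse import urlencode

# Assumed base path for Dockstore's GA4GH Tool Registry Service (TRS) API.
TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2"

def trs_tools_url(name=None, limit=10):
    """Build a TRS /tools query URL for searching the catalogue."""
    params = {"limit": limit}
    if name:
        params["name"] = name
    return f"{TRS_BASE}/tools?{urlencode(params)}"

def summarize_tools(trs_response):
    """Reduce a TRS /tools JSON response to (name, languages) pairs.

    Field names follow the TRS schema ("name", "versions",
    "descriptor_type"); treat them as assumptions here, not a
    verified contract with the live service.
    """
    summary = []
    for tool in trs_response:
        langs = sorted({dt
                        for version in tool.get("versions", [])
                        for dt in version.get("descriptor_type", [])})
        summary.append((tool.get("name"), langs))
    return summary

# Hand-written sample response in the TRS shape (not real catalogue data).
sample = [
    {"name": "gatk-germline", "versions": [{"descriptor_type": ["WDL"]}]},
    {"name": "rnaseq", "versions": [{"descriptor_type": ["NFL", "CWL"]}]},
]
print(trs_tools_url(name="gatk", limit=5))
print(summarize_tools(sample))
```

Because TRS is a GA4GH standard rather than a Dockstore-specific API, the same client code would in principle work against any registry that implements it, which is exactly the interoperability argument the abstract makes.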
“In the race to harness the power of cloud computing, and further develop artificial intelligence, academics have a new concern: falling behind a fast-moving tech industry. In the US, 22 higher education institutions, including Stanford and Carnegie Mellon, have signed up to a National Research Cloud initiative seeking access to the computational power they need to keep up. It is one of several cloud projects being called for by academics globally, and is being explored by the US Congress, given the potential of the technology to deliver breakthroughs in healthcare and climate change….”
“In alignment with RDA’s core mission to ‘set international Research Data and Protocol agreements and standards’, the RDA Global Open Research Commons Interest Group (GORC IG) is helping to support coordination amongst regional, national, pan-national and domain-specific organizations. Those organizations are developing the interoperable resources necessary to enable researchers to address societal grand challenges across disciplines, technologies and countries….
The Global Open Science Cloud (GOSC) initiative has its roots in the same series of meetings. It was proposed in 2019 at the CODATA conference in Beijing with the objective of assisting the alignment and interoperation of open science cloud activities. GOSC aims to co-design and build a cross-continental, federated e-infrastructure and virtual research environment for global cooperation and open science using harmonized policies, interoperable protocols and transparent services. Network connectivity, secure AAI (Authentication and Authorization Infrastructure), computing federation, FAIR data, and policy alignment are the key components….
While the GORC initiative focuses on a roadmap for commons integration, the GOSC is creating a cooperation mechanism and testbed implementations for science clouds that arise from that roadmap. Developing and sustaining collaboration between GORC and GOSC, through the Data Together partnership will enhance the impact of each initiative and result in sustainable benefits for the wider research community. In addition, members of the Data Together group are working with the various platforms to convene a roundtable of senior representatives from the organizations to facilitate these efforts.”
Abstract: This paper introduces the Archives Unleashed Cloud, a web-based interface for working with web archives at scale. Current access paradigms, largely driven by the scope and scale of web archives, generally involve using the command line and writing code. This access gap means that subject-matter experts, as opposed to developers and programmers, have few options to directly work with web archives beyond the page-by-page paradigm of the Wayback Machine. Drawing on first-hand research and analysis of how scholars use web archives, we present the interface design and underpinning architecture of the Archives Unleashed Cloud. We also discuss the sustainability implications of providing a cloud-based service for researchers to analyze their collections at scale.
“Big bibliographic datasets hold promise for revolutionizing the scientific enterprise when combined with state-of-the-science computational capabilities. Yet, hosting proprietary and open big bibliographic datasets poses significant difficulties for libraries, both large and small. Libraries face significant barriers to hosting such assets, including cost and expertise, which has limited their ability to provide stewardship for big datasets, and thus has hampered researchers’ access to them. What is needed is a solution to address the libraries’ and researchers’ joint needs. This article outlines the theoretical framework that underpins the Collaborative Archive and Data Research Environment project. We recommend a shared cloud-based infrastructure to address this need built on five pillars: 1) Community–a community of libraries and industry partners who support and maintain the platform and a community of researchers who use it; 2) Access–the sharing platform should be accessible and affordable to both proprietary data customers and the general public; 3) Data-Centric–the platform is optimized for efficient and high-quality bibliographic data services, satisfying diverse data needs; 4) Reproducibility–the platform should be designed to foster and encourage reproducible research; 5) Empowerment—the platform should empower researchers to perform big data analytics on the hosted datasets. In this article, we describe the many facets of the problem faced by American academic libraries and researchers wanting to work with big datasets. We propose a practical solution based on the five pillars: The Collaborative Archive and Data Research Environment. Finally, we address potential barriers to implementing this solution and strategies for overcoming them.”
“EOSC-Life has launched its first Digital Life Sciences Open Call: A European Open Science Cloud (EOSC-Life) call for projects sharing data, tools and workflows in the cloud. This call offers financial support for projects, to enable life science researchers to connect their research to the cloud, alongside training, advice and assistance from data experts, tool developers and cloud specialists. Proposals should align with the goals of EOSC (European Open Science Cloud) – ie. enabling data sharing for the purpose of furthering scientific research. The project’s overarching aim is to make life science research data publicly available in a FAIR (Findable, Accessible, Interoperable, Reusable) way in the EOSC….”
“The Library of Congress is the largest library in the world, with millions of books, recordings, photographs, newspapers, maps and manuscripts in its collections. One of the missions of LC Labs at the Library of Congress is to enable transformational experiences between the Library’s digital collections and the American people.
LC Labs (Labs), a division in the Digital Strategy Directorate in the Office of the Chief Information Officer of the Library of Congress, was awarded an Andrew W. Mellon Foundation grant titled “Computing Cultural Heritage in the Cloud” to test a cloud-based approach for interacting with digital collections as data, supporting those researchers who are creatively applying emerging styles of research to Library material. In collaboration with subject matter experts and IT specialists at the Library, the Library is seeking to award contracts to up to four research experts (Research Experts) to experiment with solutions to problems that can only be explored at scale. See attached BAA for details about this opportunity….”
“When big data intersects with highly sensitive data, both opportunities for society and risks abound. Traditional approaches for sharing sensitive data are known to be ineffective in protecting privacy. Differential Privacy, deriving from roots in cryptography, is a strong mathematical criterion for privacy preservation that also allows for rich statistical analysis of sensitive data. Differentially private algorithms are constructed by carefully introducing “random noise” into statistical analyses so as to obscure the effect of each individual data subject. OpenDP is an open-source project for the differential privacy community to develop general-purpose, vetted, usable, and scalable tools for differential privacy, which users can simply, robustly and confidently deploy.
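The "random noise" idea can be made concrete with the classic Laplace mechanism: a count query changes by at most 1 when any one person is added or removed (sensitivity 1), so adding Laplace noise with scale 1/ε yields an ε-differentially-private release. This is a minimal textbook sketch for intuition only, not OpenDP's actual API; one point of the OpenDP project is precisely that hand-rolled noise like this should be replaced by vetted implementations.

```python
import math
import random

def sample_laplace(scale, rng=random):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon, rng=random):
    """Release a count using the Laplace mechanism.

    A counting query has sensitivity 1 (one person changes the count
    by at most 1), so noise with scale 1/epsilon gives epsilon-DP
    for this single release.  Repeated releases consume more budget.
    """
    return true_count + sample_laplace(1.0 / epsilon, rng)

rng = random.Random(0)
# With epsilon = 0.5 the noise has scale 2: the analyst sees roughly
# the right count, but no individual's presence is identifiable.
print(round(private_count(1000, epsilon=0.5, rng=rng), 2))
```

Note the privacy/utility trade-off in the scale: smaller ε means stronger privacy but noisier answers, which is why the session's "privacy-preserving access to sensitive data" still supports meaningful statistics on large counts.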
Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others’ work more easily. Researchers, journals, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility. A Dataverse repository is the software installation, which then hosts multiple virtual archives called Dataverses. Each dataverse contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data).
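The hierarchy described above (a repository installation hosting dataverses, each holding datasets made of metadata plus files) can be sketched as a small data model. This is an illustrative structure only, not Dataverse's actual classes or API; the names and the example DOI are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """A dataset: descriptive metadata plus data files (docs and code too)."""
    title: str
    metadata: dict
    files: list = field(default_factory=list)

@dataclass
class Dataverse:
    """A virtual archive hosted inside one repository installation."""
    name: str
    datasets: list = field(default_factory=list)

@dataclass
class Repository:
    """The software installation, which hosts multiple dataverses."""
    url: str
    dataverses: list = field(default_factory=list)

# One hypothetical installation hosting one virtual archive with one dataset.
repo = Repository(url="https://demo.example.org")
dv = Dataverse(name="climate-lab")
ds = Dataset(
    title="Surface temperature runs",
    metadata={"author": "A. Researcher", "doi": "doi:10.xxxx/placeholder"},
    files=["runs.csv", "README.md", "analysis.py"],
)
dv.datasets.append(ds)
repo.dataverses.append(dv)
print(sum(len(d.files) for v in repo.dataverses for d in v.datasets))  # 3
```

Keeping metadata attached at the dataset level, rather than per file, is what makes each dataset independently citable and discoverable, which is the design choice the paragraph's mention of academic credit depends on.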
This session examines ongoing efforts to realize a combined use case for these projects that will offer academic researchers privacy-preserving access to sensitive data. This would allow both novel secondary reuse and replication access to data that otherwise is commonly locked away in archives. The session will also explore the potential impact of this work outside the academic world.”