Abstract: Dockstore (https://dockstore.org/) is an open source platform for publishing, sharing, and finding bioinformatics tools and workflows. The platform has facilitated large-scale biomedical research collaborations by using cloud technologies to increase the Findability, Accessibility, Interoperability and Reusability (FAIR) of computational resources, thereby promoting the reproducibility of complex bioinformatics analyses. Dockstore supports a variety of source repositories, analysis frameworks, and language technologies to provide a seamless publishing platform for authors to create a centralized catalogue of scientific software. The ready-to-use packaging of hundreds of tools and workflows, combined with the implementation of interoperability standards, enables users to launch analyses across multiple environments. Dockstore is widely used: more than twenty-five high-profile organizations share analysis collections through the platform in a variety of workflow languages, including the Broad Institute’s GATK best practice and COVID-19 workflows (WDL), nf-core workflows (Nextflow), the Intergalactic Workflow Commission tools (Galaxy), and workflows from Seven Bridges (CWL), to highlight just a few. Here we describe the improvements made over the last four years, including the expansion of system integrations supporting authors, the addition of collaboration features and analysis platform integrations supporting users, and other enhancements that improve the overall scientific reproducibility of Dockstore content.
“In the race to harness the power of cloud computing, and further develop artificial intelligence, academics have a new concern: falling behind a fast-moving tech industry. In the US, 22 higher education institutions, including Stanford and Carnegie Mellon, have signed up to a National Research Cloud initiative seeking access to the computational power they need to keep up. It is one of several cloud projects being called for by academics globally, and is being explored by the US Congress, given the potential of the technology to deliver breakthroughs in healthcare and climate change….”
“In alignment with RDA’s core mission to ‘set international Research Data and Protocol agreements and standards’, the RDA Global Open Research Commons Interest Group (GORC IG) is helping to support coordination amongst regional, national, pan-national and domain-specific organizations. Those organizations are developing the interoperable resources necessary to enable researchers to address societal grand challenges across disciplines, technologies and countries….
The Global Open Science Cloud (GOSC) initiative has its roots in the same series of meetings. It was proposed in 2019 at the CODATA conference in Beijing with the objective to assist the alignment and interoperation of open science cloud activities. GOSC aims to co-design and build a cross-continental, federated e-infrastructure and virtual research environment for global cooperation and open science using harmonized policies, interoperable protocols and transparent services. Network connectivity, secure AAI (Authentication and Authorization Infrastructure), computing federation, FAIR data, and policy alignment are the key components….
While the GORC initiative focuses on a roadmap for commons integration, the GOSC is creating a cooperation mechanism and testbed implementations for science clouds that arise from that roadmap. Developing and sustaining collaboration between GORC and GOSC, through the Data Together partnership, will enhance the impact of each initiative and result in sustainable benefits for the wider research community. In addition, members of the Data Together group are working with the various platforms to convene a roundtable of senior representatives from the organizations to facilitate these efforts.”
Abstract: This paper introduces the Archives Unleashed Cloud, a web-based interface for working with web archives at scale. Current access paradigms, largely driven by the scope and scale of web archives, generally involve using the command line and writing code. This access gap means that subject-matter experts, as opposed to developers and programmers, have few options to directly work with web archives beyond the page-by-page paradigm of the Wayback Machine. Drawing on first-hand research and analysis of how scholars use web archives, we present the interface design and underpinning architecture of the Archives Unleashed Cloud. We also discuss the sustainability implications of providing a cloud-based service for researchers to analyze their collections at scale.
“Big bibliographic datasets hold promise for revolutionizing the scientific enterprise when combined with state-of-the-science computational capabilities. Yet, hosting proprietary and open big bibliographic datasets poses significant difficulties for libraries, both large and small. Libraries face significant barriers to hosting such assets, including cost and expertise, which has limited their ability to provide stewardship for big datasets, and thus has hampered researchers’ access to them. What is needed is a solution to address the libraries’ and researchers’ joint needs. This article outlines the theoretical framework that underpins the Collaborative Archive and Data Research Environment project. We recommend a shared cloud-based infrastructure to address this need built on five pillars: 1) Community–a community of libraries and industry partners who support and maintain the platform and a community of researchers who use it; 2) Access–the sharing platform should be accessible and affordable to both proprietary data customers and the general public; 3) Data-Centric–the platform is optimized for efficient and high-quality bibliographic data services, satisfying diverse data needs; 4) Reproducibility–the platform should be designed to foster and encourage reproducible research; 5) Empowerment—the platform should empower researchers to perform big data analytics on the hosted datasets. In this article, we describe the many facets of the problem faced by American academic libraries and researchers wanting to work with big datasets. We propose a practical solution based on the five pillars: The Collaborative Archive and Data Research Environment. Finally, we address potential barriers to implementing this solution and strategies for overcoming them.”
“EOSC-Life has launched its first Digital Life Sciences Open Call: A European Open Science Cloud (EOSC-Life) call for projects sharing data, tools and workflows in the cloud. This call offers financial support for projects, to enable life science researchers to connect their research to the cloud, alongside training, advice and assistance from data experts, tool developers and cloud specialists. Proposals should align with the goals of EOSC (European Open Science Cloud), i.e., enabling data sharing for the purpose of furthering scientific research. The project’s overarching aim is to make life science research data publicly available in a FAIR (Findable, Accessible, Interoperable, Reusable) way in the EOSC….”
“The Library of Congress is the largest library in the world, with millions of books, recordings, photographs, newspapers, maps and manuscripts in its collections. One of the missions of Library of Congress’ Labs (Labs) at the Library of Congress (Library) is to enable transformational experiences between the Library’s digital collections and the American people.
LC Labs (Labs), a division in the Digital Strategy Directorate in the Office of the Chief Information Officer of the Library of Congress, was awarded an Andrew W. Mellon Foundation grant titled “Computing Cultural Heritage in the Cloud” to test a cloud-based approach for interacting with digital collections as data, supporting those researchers who are creatively applying emerging styles of research to Library material. In collaboration with subject matter experts and IT specialists at the Library, the Library is seeking to award contracts to up to four research experts (Research Experts) to experiment with solutions to problems that can only be explored at scale. See attached BAA for details about this opportunity….”
“When big data intersects with highly sensitive data, both opportunity to society and risks abound. Traditional approaches for sharing sensitive data are known to be ineffective in protecting privacy. Differential Privacy, deriving from roots in cryptography, is a strong mathematical criterion for privacy preservation that also allows for rich statistical analysis of sensitive data. Differentially private algorithms are constructed by carefully introducing “random noise” into statistical analyses so as to obscure the effect of each individual data subject. OpenDP is an open-source project for the differential privacy community to develop general-purpose, vetted, usable, and scalable tools for differential privacy, which users can simply, robustly and confidently deploy.
Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others’ work more easily. Researchers, journals, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility. A Dataverse repository is the software installation, which then hosts multiple virtual archives called Dataverses. Each dataverse contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data).
This session examines ongoing efforts to realize a combined use case for these projects that will offer academic researchers privacy-preserving access to sensitive data. This would allow both novel secondary reuse and replication access to data that otherwise is commonly locked away in archives. The session will also explore the potential impact of this work outside the academic world.”
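The "random noise" idea described in the excerpt above can be illustrated with the classic Laplace mechanism applied to a counting query. The following is a minimal sketch in plain Python, not OpenDP's or Dataverse's API: the function names `laplace_noise` and `private_count`, the sample data, and the choice of epsilon are all assumptions made for illustration. It also uses Python's general-purpose `random` module and ordinary floating-point arithmetic, which would not be adequate for a vetted production tool.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Hypothetical helper: a counting query changes by at most 1 when one
    person is added or removed (sensitivity 1), so Laplace noise with
    scale 1/epsilon is the standard calibration for epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29, 48]  # made-up illustrative data
    noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
    print(f"noisy count of people aged 40+: {noisy:.2f}")
```

A smaller epsilon means a larger noise scale and therefore stronger privacy at the cost of accuracy. Each released answer is perturbed, but the noise averages out to zero over many hypothetical releases, which is what allows useful statistics to survive while the contribution of any one data subject stays obscured.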
“This document sets the draft general framework for future strategic research, development and innovation (RDI) activities to be further defined in the context of the candidate EOSC European Partnership proposed under the Horizon Europe Programme.
• It uses elements of the candidate Partnership document proposed by the EOSC governing bodies as well as further work by the Executive Board, in order to develop, by October 2020, a first full version of the Strategic Research and Innovation Agenda (SRIA) for EOSC.
• With the consultation launched on 20 July, the EOSC governing bodies are seeking the views and contributions of different stakeholders on the content of this document through the accompanying questionnaire. The consultation will remain open until 31 August 2020.
• The feedback obtained in the consultation process will serve as input for the SRIA. The draft SRIA will be presented at the EOSC Governance Board meeting on 1 October 2020….”
“Welcome to the EOSC Strategic Research and Innovation Agenda (SRIA) Open Consultation page.
The European Open Science Cloud (EOSC) is the envisioned federation of research (data) infrastructures that will enable the Web of FAIR Data and Services, help researchers to perform Open Science, and open up and exploit their data, publications and code.
The Strategic Research and Innovation Agenda (SRIA) provides general guidelines to help develop the work programmes for EOSC in Horizon Europe. The SRIA is open for public consultation until 31 August 2020, involving stakeholders from inside and outside the EOSC Community. Research infrastructures, universities, researchers, industry, national and international initiatives, policymakers, and citizen scientists are all invited to take part in this collective effort across countries and disciplines.
The consultation takes the form of an online questionnaire (below) where respondents can give their views on topics such as EOSC’s guiding principles, action areas and priorities. This includes information relating to rewarding Open Science practices and skills; standards, tools and services to find, access and reuse results; and shared and federated infrastructures to enable open sharing of scientific results.
Before completing the consultation questionnaire below, respondents are advised to first read the SRIA consultation document which sets out a general framework and provides key information. The questionnaire refers directly to the consultation document.
Complete the online questionnaire below and help shape the content of the Agenda….”