sci2sci – “GitHub for scientists” – AI-friendly research data management and publishing platform | Grants | Gitcoin

“At sci2sci, we are building an electronic lab notebook and a publishing platform in one interface. This will allow researchers to store all experimental data and metadata in one place, and to quickly release it to public access with one click.

In a nutshell, we offer full-stack data publishing – from experiment planning through raw data acquisition and analysis to the final research report – all in a single platform, with a number of benefits that cannot be offered by a current journal PDF manuscript:…”

Institute of Network Cultures | Christopher Kelty: The Internet We Could Have Had

“And “openness” today has become boring but essential to the massive economy of social media, which has monetized engagement based on the use of open source software and a sophisticated system of data tracking and transaction processing. Today “openness” is more likely to be experienced as part of the neoliberal managerial borg than it is the more radical liberation of knowledge for the people. Today, libgen and scihub are the open access we could have had. …

There were many who would have liked to make the internet more like the dreams of Ted Nelson or Douglas Engelbart, many who would build the Victorian Web or the Perseus Digital Library of ancient Greek and Roman texts. The internet would be epochal like the printing press and the invention of writing; it was the end of the book, as no shortage of breathless books was paradoxically announced. In the 1990s, we talked about how, once upon a time, the internet was a military project run by ARPA, but now that the National Science Foundation was in charge, it would be instead the culmination of Vannevar Bush’s imagination of the Memex, organizing the world’s knowledge for all to access and navigate, like a vast memory palace. …

But even this capitalist enthusiasm was tempered by the many things the internet still could have been. Even doused in lubricant, it was still an artistic medium, a hive mind, a multiplayer game, a playing field leveller, and a destroyer of old Idols, whether of the market, the university or the government. The internet we could have had was a haven for hackers and activists, legal scholars and (digital) anthropologists, net.artists and music pirates, cultural critics and journalists, meme-makers and Anonymous….”

Pangeo Forge: Crowdsourcing Open Data in the Cloud :: FOSS4G 2022 general tracks :: pretalx

“Geospatial datacubes–large, complex, interrelated multidimensional arrays with rich metadata–arise in analysis-ready geospatial imagery, level 3/4 satellite products, and especially in ocean / weather / climate simulations and [re]analyses, where they can reach Petabytes in size. The scientific python community has developed a powerful stack for flexible, high-performance analytics of datacubes in the cloud. Xarray provides a core data model and API for analysis of such multidimensional array data. Combined with Zarr or TileDB for efficient storage in object stores (e.g. S3) and Dask for scaling out compute, these tools allow organizations to deploy analytics and machine learning solutions for both exploratory research and production in any cloud platform. Within the geosciences, the Pangeo open science community has advanced this architecture as the “Pangeo platform” (http://pangeo.io/).

However, there is a major barrier preventing the community from easily transitioning to this cloud-native way of working: the difficulty of bringing existing data into the cloud in analysis-ready, cloud-optimized (ARCO) format. Typical workflows for moving data to the cloud currently consist of either bulk transfers of files into object storage (with a major performance penalty on subsequent analytics) or bespoke, case-by-case conversions to cloud optimized formats such as TileDB or Zarr. The high cost of this toil is preventing the scientific community from realizing the full benefits of cloud computing. More generally, the outputs of the toil of preparing scientific data for efficient analysis are rarely shared in an open, collaborative way.

To address these challenges, we are building Pangeo Forge ( https://pangeo-forge.org/), the first open-source cloud-native ETL (extract / transform / load) platform focused on multidimensional scientific data. Pangeo Forge consists of two main elements. An open-source python package–pangeo_forge_recipes–makes it simple for users to define “recipes” for extracting many individual files, combining them along arbitrary dimensions, and depositing ARCO datasets into object storage. These recipes can be “compiled” to run on many different distributed execution engines, including Dask, Prefect, and Apache Beam. The second element of Pangeo Forge is an orchestration backend which integrates tightly with GitHub as a continuous-integration-style service….”

PLOS partners with DataSeer to develop Open Science Indicators – The Official PLOS Blog

“To provide richer and more transparent information on how PLOS journals support best practice in Open Science, we’re going to begin publishing data on ‘Open Science Indicators’ observed in PLOS articles. These Open Science Indicators will initially include (i) sharing of research data in repositories, (ii) public sharing of code, and (iii) preprint posting, for all PLOS articles from 2019 to present. These indicators – conceptualized by PLOS and developed with DataSeer, using an artificial intelligence-driven approach – are increasingly important to PLOS achieving its mission. We plan to share the results openly to support Open Science initiatives by the wider community.”

Democratizing Open Knowledge | Library Innovation Lab

Democratizing Open Knowledge is a three-year program at the Library Innovation Lab to explore the goals articulated in Harvard Library’s Advancing Open Knowledge strategy from a decentralized and generative perspective. If you like what you see here and want to collaborate, get in touch!

In “Advancing Open Knowledge,” Harvard Library outlines three strategic goals for libraries:

Diversify and Expand Access to Knowledge

The information globe is still dominated by the wealthiest nations and by inequitable systems of producing and sharing knowledge that are not representative of all voices.

Enhance Discovery and Engagement

We are witnessing a rise in disinformation, coupled with distrust of sources established as trustworthy. Information discovery mechanisms are also far from ideal.

Preserve for the Future

Preservation of information, particularly digital information, is an unsolved problem: information can be here today and gone tomorrow….”

Easing Into Open Science: A Guide for Graduate Students and Their Advisors | Collabra: Psychology | University of California Press

Abstract:  This article provides a roadmap to assist graduate students and their advisors to engage in open science practices. We suggest eight open science practices that novice graduate students could begin adopting today. The topics we cover include journal clubs, project workflow, preprints, reproducible code, data sharing, transparent writing, preregistration, and registered reports. To address concerns about not knowing how to engage in open science practices, we provide a difficulty rating of each behavior (easy, medium, difficult), present them in order of suggested adoption, and follow the format of what, why, how, and worries. We give graduate students ideas on how to approach conversations with their advisors/collaborators, ideas on how to integrate open science practices within the graduate school framework, and specific resources on how to engage with each behavior. We emphasize that engaging in open science behaviors need not be an all or nothing approach, but rather graduate students can engage with any number of the behaviors outlined.


Open Source Programme Offices (OSPOs) in the UN System – A Spotlight on WHO. 15-16 September 2022 | 77th United Nations General Assembly (UNGA)

“…Our first side-event will cover the highly anticipated launch of the Open Source Programme Office (OSPO) at the World Health Organization, which is the first OSPO in the entire UN system. Panelists from WHO, GitHub, and the United Nations Envoy on Technology will discuss the technical aspects and the vision of the WHO OSPO, why an OSPO is a major step forward for how WHO engages with open source technologies, and how an OSPO in the UN system can contribute to more equitable technology and inclusive economic growth….”

hosted by GitHub Social Impact, Tech for Social Good 


Design and development of an open-source framework for citizen-centric environmental monitoring and data analysis | Scientific Reports

Abstract:  Cities around the world are struggling with environmental pollution. The conventional monitoring approaches are not effective for undertaking large-scale environmental monitoring due to logistical and cost-related issues. The availability of low-cost and low-power Internet of Things (IoT) devices has proved to be an effective alternative to monitoring the environment. Such systems have opened up environment monitoring opportunities to citizens while simultaneously confronting them with challenges related to sensor accuracy and the accumulation of large data sets. Analyzing and interpreting sensor data itself is a formidable task that requires extensive computational resources and expertise. To address this challenge, a social, open-source, and citizen-centric IoT (Soc-IoT) framework is presented, which combines a real-time environmental sensing device with an intuitive data analysis and visualization application. Soc-IoT has two main components: (1) CoSense Unit—a resource-efficient, portable and modular device designed and evaluated for indoor and outdoor environmental monitoring, and (2) exploreR—an intuitive cross-platform data analysis and visualization application that offers a comprehensive set of tools for systematic analysis of sensor data without the need for coding. Developed as a proof-of-concept framework to monitor the environment at scale, Soc-IoT aims to promote environmental resilience and open innovation by lowering technological barriers.


Equitable Open-Source for web3

“The tools that build the internet have steeped too long. For the past two decades, big tech has made trillions off the generosity of visionary developers and web pioneers… never thanking, never mentioning, and certainly never paying. At tea, we’re brewing something to change that by enabling developers (you) to continue doing what you love, while earning what you deserve….

We’re calling on all open-source devs to authenticate their GitHub with tea.


Developers who have contributed to OSS will be entitled to a variety of rewards, including minted NFT badges to honor your work so far. This is your chance to be an early member of our community: take a sip while it’s hot!…”

A survey of researchers’ code sharing and code reuse practices, and assessment of interactive notebook prototypes [PeerJ]

Abstract:  This research aimed to understand the needs and habits of researchers in relation to code sharing and reuse; gather feedback on prototype code notebooks created by NeuroLibre; and help determine strategies that publishers could use to increase code sharing. We surveyed 188 researchers in computational biology. Respondents were asked about how often and why they look at code, which methods of accessing code they find useful and why, what aspects of code sharing are important to them, and how satisfied they are with their ability to complete these tasks. Respondents were asked to look at a prototype code notebook and give feedback on its features. Respondents were also asked how much time they spent preparing code and if they would be willing to increase this to use a code sharing tool, such as a notebook. As readers of research articles, the most common reason (70%) for looking at code was to gain a better understanding of the article. The most commonly encountered method for code sharing–linking articles to a code repository–was also the most useful method of accessing code from the reader’s perspective. As authors, the respondents were largely satisfied with their ability to carry out tasks related to code sharing. The most important of these tasks were ensuring that the code was running in the correct environment, and sharing code with good documentation. The average researcher, according to our results, is unwilling to incur additional costs (in time, effort or expenditure) that are currently needed to use code sharing tools alongside a publication. We infer this means we need different models for funding and producing interactive or executable research outputs if they are to reach a large number of researchers. For the purpose of increasing the amount of code shared by authors, PLOS Computational Biology is, as a result, focusing on policy rather than tools.


Needs for mobile-responsive institutional open access digital repositories | Emerald Insight

Abstract:  Purpose

The purpose of this study is to promote mobile-responsive and agile institutional open-access digital repositories. This paper provided an x-ray of the tilted research approach to open access (OA). Most underlying causes that inhibit OA, such as lack of mobile-friendly user interfaces, infrastructure development and digital divides, are not sufficiently addressed. This paper also indicated that academic libraries over-relied on open-source software and institutional repositories, but most institutional repositories are merely “dumping sites” due to how information is classified and indexed.


Design/methodology/approach

This paper adopted meta-analysis by mining data sets from databases and provided thematic clustering of its content analysis through network visualisation to juxtapose the existing research gaps and lack of mobile-first insights needed to provide open-access information to the library’s users to consume information via mobile platforms. The retrieved dataset was discussed in tandem with the literature and the author’s insights into systems librarianship knowledge.


Findings

The library and information science (LIS) field has not addressed how academics could escape the pay-for-play cost, an exclusion tactic to disenfranchise emerging scholars and those without sufficient financial resources, forcing a choice between visibility and citation or publishing their outputs in journals without the possibility of citations, which is very important to their academic advancement. The LIS field must shift its paradigm from merely talking about OA to producing graduates with the requisite skills to design, develop and host platforms that could enhance indexing and citations and import references. The current design of the institutional repository could be enhanced to promote easy navigation through mobile devices, thereby taking into account the internet bandwidth and digital divide issues that still hinder the accessibility of online resources.

Research limitations/implications

This paper covered research within the LIS fields, and other outputs from other disciplines on OA were not included.

Practical implications

This paper showed the gaps that existed within the LIS campaign on OA, the research focuses of the LIS scholars/research librarians and the needed practical solution for the academic libraries to move beyond OA campaign and reconfigure institutional repository, not as dumping sites, but as infrastructure to host peer-reviewed journals.

Towards Robust, Reproducible, and Clinically Actionable EEG Biomarkers: Large Open Access EEG Database for Discovery and Out-of-sample Validation – Hanneke van Dijk, Mark Koppenberg, Martijn Arns, 2022

“To aid researchers in development and validation of EEG biomarkers, and development of new (AI) methodologies, we hereby also announce our open access EEG dataset: the Two Decades Brainclinics Research Archive for Insights in Neuroscience (TDBRAIN)….

The whole raw EEG dataset as well as python code to preprocess the raw data is available at www.brainclinics.com/resources and can freely be downloaded using ORCID credentials….”

Say Hello to Anno : Hypothesis | 18 Aug 2022

“It’s been 11 years since we launched Hypothesis. It’s gone by so fast. During this time, we’ve accomplished many things: We defined a vision for open web annotation, we built an open source framework to implement it, we helped form and lead the working group that shipped the W3C standard, and we launched a service that’s now used by over a million people around the world who have made nearly 40 million annotations. In higher education, more than 1,200 colleges and universities use Hypothesis. And we’ve grown from a handful of people into a team of more than 35 passionate web builders. We’re not stopping here.

We’ve always had our sights set on the bigger idea: that this still-nascent effort can blossom into a true network of interoperable services — a rich ecosystem of collaboration, conversation and community over all knowledge. We believe that when incentives are aligned toward quality and away from monetizing attention, we can produce something of profound social importance. A utility layer for humanity. Since launch, the Hypothesis Project has been incorporated as a nonprofit. And while our nonprofit was an excellent home for our mission, it also limited us to grants and donations. Though we were beginning to provide services that we could charge for, we still needed capital to expand. Frustratingly, while our needs were growing, several of the key funding sources we’d relied on were no longer available to us as they shuttered programs or changed strategies. In 2019, we and others formed Invest In Open Infrastructure (IOI), an “initiative to dramatically increase the amount of funding available to open scholarly infrastructure.” We recruited Kaitlin Thaney to that effort, and she has been doing a terrific job laying the foundation for this. But all this would take time we didn’t have.

In response, and to better position us to achieve our long-held mission, we’ve formed Anno, a public benefit corporation (aka “Annotation Unlimited, PBC”) that shares the Hypothesis mission as well as its team. We’ve done this so that we can take investment in a mission aligned way and scale the Hypothesis service to meet the opportunity in front of us. Anno is funded by a $14M seed round that includes a $2.5M investment from ITHAKA, the nonprofit provider of JSTOR, a digital library that serves more than 13,000 education institutions around the world, providing access to more than 12 million journal articles, books, images and primary sources in 75 disciplines. Also participating in the round are At.inc, Triage Ventures, Esther Dyson, Mark Pincus and others. ITHAKA’s president, Kevin Guthrie, has joined Anno’s board as an observer….”