“Europeana Subtitled gathered seven major national broadcasters and audiovisual archives from seven European countries to provide high-quality audiovisual materials to Europeana. The project combined AI technology and audiovisual cultural heritage to produce high-quality closed captions and English subtitles for local video content, and created a platform that allows organisations to run crowdsourcing campaigns to revise captions using state-of-the-art editing tools.
Europeana Subtitled also supported cultural heritage professionals with the use of automatic speech recognition (ASR) and machine translation (MT) technologies in the cultural sector through an online training suite consisting of video tutorials, documentation and guidelines, and worked with teachers and museum educators to create learning resources with audiovisual content.
Finally, the project engaged audiences through crowdsourcing events and editorial activities on the Europeana website, in particular, through the ‘Broadcasting Europe’ page and ‘Mass-media and propaganda’ online exhibition….
The Subtitled content is publicly available and videos can be enjoyed directly on the Europeana website, while you can also access freely reusable content with more than 3,000 records in the Public Domain….”
@recap.email is a system that gathers content from PACER and adds it to the RECAP Archive. If you receive notification emails from PACER, it only takes a minute to set up this system and contribute content to the public commons….”
“Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.
Open Food Facts is a non-profit association of volunteers.
15000+ contributors like you have added 1 000 000+ products from 150 countries using our Android, iPhone or Windows Phone app or their camera to scan barcodes and upload pictures of products and their labels….
Data about food is of public interest and has to be open. The complete database is published as open data and can be reused by anyone and for any use. Check out the cool reuses or make your own!…”
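The Open Food Facts database can be queried per product through its public read API (one JSON document per barcode). A minimal sketch of consuming such a record offline; the payload below is illustrative, and the exact field names (`product_name`, `allergens_tags`, `nutriments`) are assumptions modelled on the public API rather than an authoritative schema:

```python
import json

# Illustrative stand-in for a response from
# https://world.openfoodfacts.org/api/v0/product/<barcode>.json
# (field names are assumed, not authoritative).
sample_response = json.loads("""
{
  "status": 1,
  "product": {
    "product_name": "Example Oat Drink",
    "ingredients_text": "water, oats, sunflower oil, salt",
    "allergens_tags": ["en:gluten"],
    "nutriments": {"energy-kcal_100g": 46, "salt_100g": 0.1}
  }
}
""")

def summarise(response):
    """Return a flat summary dict for a product payload, or None if not found."""
    if response.get("status") != 1:
        return None
    p = response["product"]
    return {
        "name": p.get("product_name", "unknown"),
        "allergens": p.get("allergens_tags", []),
        "kcal_per_100g": p.get("nutriments", {}).get("energy-kcal_100g"),
    }

print(summarise(sample_response))
```

Because the full database is open data, the same summarisation logic applies equally to the bulk exports, not just the per-product endpoint.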
Abstract: Photovoltaic (PV) energy generation plays a crucial role in the energy transition. Small-scale, rooftop PV installations are deployed at an unprecedented pace, and their safe integration into the grid requires up-to-date, high-quality information. Overhead imagery is increasingly being used to improve the knowledge of rooftop PV installations with machine learning models capable of automatically mapping these installations. However, these models cannot be reliably transferred from one region or imagery source to another without incurring a decrease in accuracy. To address this issue, known as distribution shift, and to foster the development of PV array mapping pipelines, we propose a dataset containing aerial images, segmentation masks, and installation metadata (i.e., technical characteristics). We provide installation metadata for more than 28,000 installations. We supply ground-truth segmentation masks for 13,000 installations, including 7,000 with annotations for two different image providers. Finally, we provide installation metadata that matches the annotations for more than 8,000 installations. Dataset applications include end-to-end PV registry construction, robust PV installation mapping, and analysis of crowdsourced datasets.
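A typical downstream use of such ground-truth segmentation masks is estimating installed panel area per image. A minimal sketch; the mask contents and the ground sampling distance are invented for illustration and are not values from the dataset:

```python
import numpy as np

# Hypothetical 6x6 binary segmentation mask (1 = PV panel pixel), standing in
# for one of the dataset's ground-truth masks.
mask = np.zeros((6, 6), dtype=np.uint8)
mask[1:4, 2:5] = 1          # a 3x3 block of panel pixels

gsd_m = 0.2                 # assumed ground sampling distance: 0.2 m per pixel

panel_pixels = int(mask.sum())              # number of panel pixels
panel_area_m2 = panel_pixels * gsd_m ** 2   # roughly 0.36 m^2 of panel

print(panel_pixels, panel_area_m2)
```

Aggregating this per-installation area over a region is one route toward the "end-to-end PV registry construction" application the abstract mentions.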
“Peer Review is an experiment in scholarly publishing currently in Beta. It is a platform that enables crowdsourced peer review and public dissemination of scientific and academic papers. For now, the platform can only handle pre-prints. It is and will remain open source and diamond open access. It is currently being maintained by a single developer as a side project.
Peer Review uses a reputation system to ensure that review and refereeing is done by qualified peers. Reputation is primarily gained from publishing, but can also be gained from giving constructive reviews. Review is separated into pre-publish “review” and post-publish “refereeing”. Review is entirely focused on giving authors constructive, supportive feedback. Refereeing is intended to help maintain the integrity of the overall literature by identifying spam, malpractice, and misinformation. To learn more, please read how it works.”
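The reputation rule described above (mostly from publishing, partly from constructive reviews) can be sketched in a few lines. The point values here are invented; the announcement only states the relative weighting, not any numbers:

```python
# Toy sketch of Peer Review's stated reputation rule: reputation is primarily
# gained from publishing, and secondarily from giving constructive reviews.
# Both point values are assumptions for illustration only.

PUBLISH_POINTS = 100   # assumed
REVIEW_POINTS = 10     # assumed

def reputation(publications, constructive_reviews):
    """Total reputation under the assumed weighting."""
    return publications * PUBLISH_POINTS + constructive_reviews * REVIEW_POINTS

print(reputation(2, 5))  # 250
```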
The real value of open data for the research community lies not merely in accessing them, but in processing them as conveniently as possible in order to reduce time-to-result and increase productivity. The RAISE project will provide the infrastructure for a distributed, crowdsourced data-processing system, moving from open data to open-access data for processing.
“The purpose of this document is to organise ideas about open communication platforms for big team science and open research coordination. It can serve as a primer for those looking to set up a platform, or for ideas when developing new platforms. Please add yourself to the Document Contributors list if you make a contribution (feel free to edit anything). …”
“The Linux Foundation, a global nonprofit organization enabling innovation through open source, today announced the formation of the Overture Maps Foundation, a new collaborative effort to develop interoperable open map data as a shared asset that can strengthen mapping services worldwide. The initiative was founded by Amazon Web Services (AWS), Meta, Microsoft, and TomTom and is open to all communities with a common interest in building open map data.
Overture’s mission is to enable current and next-generation map products by creating reliable, easy-to-use, and interoperable open map data. This interoperable map is the basis for extensibility, enabling companies to contribute their own data. Members will combine resources to build map data that is complete, accurate, and refreshed as the physical world changes. Map data will be open and extensible by all under an open data license. This will drive innovation by enabling a network of communities that create services on top of Overture data….”
“Google Maps is getting some competition. The Linux Foundation has announced Overture Maps, a “new collaborative effort to develop interoperable open map data as a shared asset that can strengthen mapping services worldwide.” It’s an open source mapping effort that includes a list of heavy hitters: Amazon Web Services (AWS), Meta, Microsoft, and TomTom, with the foundation adding that the project is “open to all communities with a common interest in building open map data.”…
If you’re saying, “Wait! Isn’t there already an open source map community out there?” — there is, and it’s called OpenStreetMap, the Wikipedia of maps that anyone can edit. The Overture press release says, “The project will seek to integrate with existing open map data from projects such as OpenStreetMap and city planning departments, along with new map data contributed by members and built using computer vision and AI/ML techniques to create a living digital record of the physical world.” …”
“It seems like every couple of months, I get asked for advice on picking a Collection Management System (sometimes called a digital repository, or something else) for use in an archive, special collections library, museum, or another small “GLAMorous” institution. The acronym is CMS, which is not to be confused with Content Management System (which is for your blog). This can be for collection management, digital asset management, collection description, digital preservation, public access and request support, or combinations of all of the above. And these things have to fit into an existing workflow/system, or maybe replace an old system and require a data migration component. And on top of that, there are so many options out there! This can be overwhelming!
What factors do you use in making a decision? I tried to put together some crucial components to consider, while keeping it as simple as possible (if 19 columns can be considered simple). I also want to be able to answer questions with a strong yes/no, to avoid getting bogged down in “well, kinda…” For example, I had a “Price” category and a “Handles complex media?” category but I took them away because it was too subjective of an issue to be able to give an easy answer. A lot of these are still going to be “well, kinda” and in that case, we should make a generalization. (Ah, this is where the “simple” part comes in!)
In the end, though, it is really going to depend on the unique needs of your institution, so the answer is always going to be “well, kinda?” But I hope this spreadsheet can be used as a starting point for those preparing to make a decision, or those who need to jog their memory with “Can this thing do that?”…”
“Another major change in scientific publishing could come from the same blockchain-based infrastructure that’s enabling the rise of the rest of decentralized science. Washington University faculty member and VitaDAO core contributor Tim Peterson proposed his own peer review alternative, called The Longevity Decentralized Review (TLDR), and is assembling a team of editors to begin reviewing papers on longevity and aging….
TLDR works a lot like Reddit: First researchers post their work publicly, either directly or to numerous so-called “pre-print” servers like bioRxiv or medRxiv. These have been around for several years but became much more influential during the COVID-19 pandemic because of the speed with which they could bring research to other scientists. Reviewers get paid by the TLDR site, which is funded through charitable donations and from anyone who would like their manuscript peer-reviewed. VitaDAO is one of the TLDR backers, offering $VITA tokens for peer review of longevity-related projects of interest to VitaDAO. It’s anybody’s guess whether this will result in meaningful income to reviewers, but it’ll be more than the zero dollars and zero cents they earn now….”
“I became a Wikipedian because of a belief that knowledge — and access to knowledge — matters. Wikipedia, more than anything else I could point to, offered a way to bring together and make sense of the sheer, overwhelming accumulation of human knowledge. Library stacks full of more books and journals than anyone could read in a hundred lifetimes! Surely this kind of intellectual connective tissue makes a difference! Until recently that was a matter of faith to me; no longer.
Three well-designed experiments from the last few years show some specific ways that Wikipedia has large, measurable effects in the real world — and hint at what I’ve long believed. When you improve Wikipedia, you can be confident that it’s reaching people, affecting what they think, what they write, and how they behave. The juice is worth the squeeze….”
Sciety is pleased to announce the first non-English group to bring open review and curation to the platform: ASAPbio–SciELO Preprints crowd review. Based in Brazil, the group reviews preprints relating to infectious disease research that are posted on the SciELO Preprints server in Brazilian Portuguese.
“Geospatial datacubes (large, complex, interrelated multidimensional arrays with rich metadata) arise in analysis-ready geospatial imagery, level 3/4 satellite products, and especially in ocean/weather/climate simulations and [re]analyses, where they can reach petabytes in size. The scientific Python community has developed a powerful stack for flexible, high-performance analytics of datacubes in the cloud. Xarray provides a core data model and API for analysis of such multidimensional array data. Combined with Zarr or TileDB for efficient storage in object stores (e.g. S3) and Dask for scaling out compute, these tools allow organizations to deploy analytics and machine learning solutions for both exploratory research and production in any cloud platform. Within the geosciences, the Pangeo open science community has advanced this architecture as the “Pangeo platform” (http://pangeo.io/).
However, there is a major barrier preventing the community from easily transitioning to this cloud-native way of working: the difficulty of bringing existing data into the cloud in analysis-ready, cloud-optimized (ARCO) format. Typical workflows for moving data to the cloud currently consist of either bulk transfers of files into object storage (with a major performance penalty on subsequent analytics) or bespoke, case-by-case conversions to cloud-optimized formats such as TileDB or Zarr. The high cost of this toil is preventing the scientific community from realizing the full benefits of cloud computing. More generally, the outputs of this data-preparation work are rarely shared in an open, collaborative way.
To address these challenges, we are building Pangeo Forge (https://pangeo-forge.org/), the first open-source cloud-native ETL (extract/transform/load) platform focused on multidimensional scientific data. Pangeo Forge consists of two main elements. An open-source Python package, pangeo_forge_recipes, makes it simple for users to define “recipes” for extracting many individual files, combining them along arbitrary dimensions, and depositing ARCO datasets into object storage. These recipes can be “compiled” to run on many different distributed execution engines, including Dask, Prefect, and Apache Beam. The second element of Pangeo Forge is an orchestration backend which integrates tightly with GitHub as a continuous-integration-style service….”
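The recipe pattern described above — extract many source files, combine them along a dimension, deposit one consolidated store — can be sketched in plain Python. All names here (`open_source`, `run_recipe`) are invented for illustration and are not the `pangeo_forge_recipes` API:

```python
# Toy sketch of the "recipe" idea: many per-day source files are opened,
# combined along a time dimension, and written once as a consolidated cube.
# In Pangeo Forge the combine step produces a Zarr store in object storage;
# here a dict of lists stands in for that output.

def open_source(day):
    """Stand-in for reading one source file; returns (time, hourly values)."""
    return day, [day * 10 + h for h in range(3)]   # fake hourly values

def run_recipe(days):
    """Combine per-day chunks along the 'time' dimension into one cube."""
    cube = {"time": [], "values": []}
    for day in sorted(days):                       # combine in time order
        t, vals = open_source(day)
        cube["time"].append(t)
        cube["values"].append(vals)
    return cube

cube = run_recipe([2, 1, 3])
print(cube["time"])  # [1, 2, 3]
```

The real system adds what the toy omits: chunked parallel execution on engines like Dask or Beam, and writing the combined result to cloud object storage.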
“Is there an open science story that made you say, “I need to get involved?” Do you remember a story about the impact of open data, open-source software, open publications on science? How has open science enabled scientific breakthroughs? Tell us about it!
NASA’s Transform to Open Science (TOPS) mission is seeking compelling stories about open science in practice. To help transform NASA scientific processes to open science, we need to provide compelling and relatable examples that show how open science creates more impactful, efficient, inclusive science. By collecting these stories, we can start showing scientists how open science can help them.
We are looking for big, awe-inspiring stories about open data, open-source software, open results and open access, and the use of openly available tools for scientific practice. These stories could be about projects that utilize large amounts of data, about utilizing code on an open-source repository, about citizen scientists working together to identify constellations or clouds….”