The Collection Management System Collection · BLOG Progress Process

“It seems like every couple of months, I get asked for advice on picking a Collection Management System (or maybe referred to as a digital repository, or something else) for use in an archive, special collection library, museum, or another small “GLAMorous” institution. The acronym is CMS, which is not to be confused with Content Management System (which is for your blog). This can be for collection management, digital asset management, collection description, digital preservation, public access and request support, or combinations of all of the above. And these things have to fit into an existing workflow/system, or maybe replace an old system and require a data migration component. And on top of that, there are so many options out there! This can be overwhelming!

What factors do you use in making a decision? I tried to put together some crucial components to consider, while keeping it as simple as possible (if 19 columns can be considered simple). I also want to be able to answer questions with a strong yes/no, to avoid getting bogged down in “well, kinda…” For example, I had a “Price” category and a “Handles complex media?” category, but I took them away because each was too subjective an issue to give an easy answer. A lot of these are still going to be “well, kinda” and in that case, we should make a generalization. (Ah, this is where the “simple” part comes in!)

In the end, though, it is really going to depend on the unique needs of your institution, so the answer is always going to be “well, kinda?” But I hope this spreadsheet can be used as a starting point for those preparing to make a decision, or those who need to jog their memory with “Can this thing do that?”…”

Does the Peer Review Process Need Blockchain? – NEO.LIFE

“Another major change in scientific publishing could come from the same blockchain-based infrastructure that’s enabling the rise of the rest of decentralized science. Washington University faculty member and VitaDAO core contributor Tim Peterson proposed his own peer review alternative, called The Longevity Decentralized Review (TLDR), and is assembling a team of editors to begin reviewing papers on longevity and aging….

TLDR works a lot like Reddit: First researchers post their work publicly, either directly or to one of numerous so-called “pre-print” servers like bioRxiv or medRxiv. These have been around for several years but became much more influential during the COVID-19 pandemic because of the speed with which they could bring research to other scientists. Reviewers get paid by the TLDR site, which is funded through charitable donations and from anyone who would like their manuscript peer-reviewed. VitaDAO is one of the TLDR backers, offering $VITA tokens for peer review of longevity-related projects of interest to VitaDAO. It’s anybody’s guess whether this will result in meaningful income to reviewers, but it’ll be more than the zero dollars and zero cents they earn now….”

Randomized controlled experiments hint at Wikipedia’s huge real-world impact – Wiki Education

“I became a Wikipedian because of a belief that knowledge — and access to knowledge — matters. Wikipedia, more than anything else I could point to, offered a way to bring together and make sense of the sheer, overwhelming accumulation of human knowledge. Library stacks full of more books and journals than anyone could read in a hundred lifetimes! Surely this kind of intellectual connective tissue makes a difference! Until recently that was a matter of faith to me; no longer.

Three well-designed experiments from the last few years show some specific ways that Wikipedia has large, measurable effects in the real world — and hint at what I’ve long believed. When you improve Wikipedia, you can be confident that it’s reaching people, affecting what they think, what they write, and how they behave. The juice is worth the squeeze….”

Sciety welcomes ASAPbio–SciELO Preprints crowd review for the evaluation of Brazilian-Portuguese preprints | For the press | eLife

Sciety is pleased to announce the first non-English group to bring open review and curation to the platform: ASAPbio–SciELO Preprints crowd review. Based in Brazil, the group reviews preprints relating to infectious disease research that are posted on the SciELO Preprints server in Brazilian Portuguese.

Pangeo Forge: Crowdsourcing Open Data in the Cloud :: FOSS4G 2022 general tracks :: pretalx

“Geospatial datacubes–large, complex, interrelated multidimensional arrays with rich metadata–arise in analysis-ready geospatial imagery, level 3/4 satellite products, and especially in ocean / weather / climate simulations and [re]analyses, where they can reach Petabytes in size. The scientific Python community has developed a powerful stack for flexible, high-performance analytics of datacubes in the cloud. Xarray provides a core data model and API for analysis of such multidimensional array data. Combined with Zarr or TileDB for efficient storage in object stores (e.g. S3) and Dask for scaling out compute, these tools allow organizations to deploy analytics and machine learning solutions for both exploratory research and production in any cloud platform. Within the geosciences, the Pangeo open science community has advanced this architecture as the “Pangeo platform” (http://pangeo.io/).

However, there is a major barrier preventing the community from easily transitioning to this cloud-native way of working: the difficulty of bringing existing data into the cloud in analysis-ready, cloud-optimized (ARCO) format. Typical workflows for moving data to the cloud currently consist of either bulk transfers of files into object storage (with a major performance penalty on subsequent analytics) or bespoke, case-by-case conversions to cloud optimized formats such as TileDB or Zarr. The high cost of this toil is preventing the scientific community from realizing the full benefits of cloud computing. More generally, the outputs of the toil of preparing scientific data for efficient analysis are rarely shared in an open, collaborative way.

To address these challenges, we are building Pangeo Forge ( https://pangeo-forge.org/), the first open-source cloud-native ETL (extract / transform / load) platform focused on multidimensional scientific data. Pangeo Forge consists of two main elements. An open-source python package–pangeo_forge_recipes–makes it simple for users to define “recipes” for extracting many individual files, combining them along arbitrary dimensions, and depositing ARCO datasets into object storage. These recipes can be “compiled” to run on many different distributed execution engines, including Dask, Prefect, and Apache Beam. The second element of Pangeo Forge is an orchestration backend which integrates tightly with GitHub as a continuous-integration-style service….”

Open Science Stories

“Is there an open science story that made you say, “I need to get involved?” Do you remember a story about the impact of open data, open-source software, or open publications on science? How has open science enabled scientific breakthroughs? Tell us about it!

NASA’s Transform to Open Science (TOPS) mission is seeking compelling stories about open science in practice. To help transform NASA scientific processes to open science, we need to provide compelling and relatable examples that show how open science creates more impactful, efficient, inclusive science. By collecting these stories, we can start showing scientists how open science can help them.

We are looking for big, awe-inspiring stories about open data, open-source software, open results and open access, and the use of openly available tools for scientific practice. These stories could be about projects that utilize large amounts of data, about utilizing code on an open-source repository, about citizen scientists working together to identify constellations or clouds….”

[2208.08426] “We Need a Woman in Music”: Exploring Wikipedia’s Values on Article Priority

Abstract:  Wikipedia — like most peer production communities — suffers from a basic problem: the amount of work that needs to be done (articles to be created and improved) exceeds the available resources (editor effort). Recommender systems have been deployed to address this problem, but they have tended to recommend work tasks that match individuals’ personal interests, ignoring more global community values. In English Wikipedia, discussion about Vital articles constitutes a proxy for community values about the types of articles that are most important, and should therefore be prioritized for improvement. We first analyzed these discussions, finding that an article’s priority is considered a function of 1) its inherent importance and 2) its effects on Wikipedia’s global composition. One important example of the second consideration is balance, including along the dimensions of gender and geography. We then conducted a quantitative analysis evaluating how four different article prioritization methods — two from prior research — would affect Wikipedia’s overall balance on these two dimensions; we found significant differences among the methods. We discuss the implications of our results, including particularly how they can guide the design of recommender systems that take into account community values, not just individuals’ interests.


delightful open science

“This Open Science list is open, just like Open Science itself. What is delightful is rather subjective; because of the background of the initiators, the list started out quite nerdy and focused on infrastructure and scholarly communication. Please help and add more information by opening an “issue” or making a pull request (both options in the menu above), especially on topics around reproducibility, meta-science and outreach, where this list is weaker….”

A Possible Fix For Scientific (and Academic) Publishing | Peer Review – News and Blog

“This is a proposal for a software platform that may help the academic community solve these problems, and more….

Peer Review [the proposed platform] allows scholars, scientists, academics, and researchers to self organize their own peer review and refereeing, without needing journal editors to manually mediate it. The platform allows review and refereeing to be crowdsourced, using a reputation system tied to academic fields to determine who should be able to offer review and to referee.

The platform splits pre-publish peer review from post-publish refereeing. Pre-publish review then becomes completely about helping authors polish their work and decide if their articles are ready to publish. Refereeing happens post-publish, and in a way that is easily understandable to the lay reader, helping the general public sort solid studies from shaky ones.

Peer Review is being developed open source. The hope is to form a non-profit to develop it which would be governed by the community of academics who use the platform in collaboration with the team of software professionals who build it (a multi-stakeholder cooperative)….”

Supporting public preprint review through collaborative reviews – an update on ASAPbio’s crowd preprint review – ASAPbio

“Through our crowd preprint review activities we seek to draw on the collective input of a group of commenters who each can comment on the preprint according to their level of expertise and interest. We are midway through our activities for 2022 and we wanted to share an update on our progress.

What have we accomplished so far?

We had a great response from the community with over 120 crowd reviewers signed up so far, with strong representation of early career researchers. We have three groups which complete reviews of preprints in each of the disciplines below:

Cell biology – a crowd of 70 members reviews preprints posted on bioRxiv 
Biochemistry – a crowd of 35 researchers reviews preprints from bioRxiv 
Infectious diseases preprints in Portuguese – a crowd of 30 researchers provides reviews in Portuguese for preprints posted on SciELO Preprints

For each of the groups, a group of ASAPbio Fellows and partners from SciELO Preprints are involved in selecting preprints to review and summarizing the comments received. They also provide regular feedback on aspects of the process that can be adjusted or improved. 

We circulate a new preprint to each group every week and invite comments via a Google document. We have seen a great level of engagement from reviewers, and are particularly pleased to see the interactions among reviewers in the collaborative documents, where they provide comments and feedback to each other, not only about the preprints but also about queries that may arise during their review….”

We’ve passed 100,000,000 verifiable observations on iNaturalist! · iNaturalist

“If you made 1,000 observations a day, every day, it would take you 274 years to generate 100 million observations. This milestone shows what people can do by working together. The iNaturalist dataset is something we’ve all made together, but it’s larger than any one of us. We hope everyone is as proud of this accomplishment as we are. Together, the iNaturalist community has created a unique window into life on Earth and hundreds of thousands of species with whom we share the planet. Thank you!

We know that even more potential for iNaturalist lies ahead. To fulfill our mission of connecting people to nature and advancing science and conservation, we’re working on a strategy to reach 100 million naturalists by 2030. This requires investing in technology improvements, so we’re now searching for two new software engineers to join the iNat team. Please spread the word to help us find great candidates….”
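The arithmetic behind that opening claim is easy to check:

```python
# 100 million observations at 1,000 observations per day:
days = 100_000_000 / 1_000   # 100,000 days
years = days / 365.25        # ~273.8 years, i.e. roughly 274
```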

The LOTUS initiative for open knowledge management in natural products research | eLife

Abstract:  Contemporary bioinformatic and chemoinformatic capabilities hold promise to reshape knowledge management, analysis and interpretation of data in natural products research. Currently, reliance on a disparate set of non-standardized, insular, and specialized databases presents a series of challenges for data access, both within the discipline and for integration and interoperability between related fields. The fundamental elements of exchange are referenced structure-organism pairs that establish relationships between distinct molecular structures and the living organisms from which they were identified. Consolidating and sharing such information via an open platform has strong transformative potential for natural products research and beyond. This is the ultimate goal of the newly established LOTUS initiative, which has now completed the first steps toward the harmonization, curation, validation and open dissemination of 750,000+ referenced structure-organism pairs. LOTUS data is hosted on Wikidata and regularly mirrored on https://lotus.naturalproducts.net. Data sharing within the Wikidata framework broadens data access and interoperability, opening new possibilities for community curation and evolving publication models. Furthermore, embedding LOTUS data into the vast Wikidata knowledge graph will facilitate new biological and chemical insights. The LOTUS initiative represents an important advancement in the design and deployment of a comprehensive and collaborative natural products knowledge base.