Visualizing the research ecosystem via Wikidata and Scholia | Zenodo

“Research takes place in a sociotechnical ecosystem that connects researchers with the objects of study and the natural and cultural worlds around them.

Wikidata is a community-curated open knowledge base in which concepts covered in any Wikipedia — and beyond — can be described and annotated collaboratively.

This session is devoted to demoing Scholia, an open-source tool to visualize the global research ecosystem based on information in Wikidata about research fields, researchers, institutions, funders, databases, locations, publications, methodologies and related concepts….”

Visualizing Book Usage Statistics with Metabase · punctum books

“There is an inherent contradiction between publishing open access books and gathering usage statistics. Open access books are meant to be copied, shared, and spread without any limit, and the absence of any Digital Rights Management (DRM) technology in our PDFs makes it indeed impossible to do so. Nevertheless, we can gather an approximate impression of book usage among certain communities, such as hardcopy readers and those connected to academic infrastructures, by gathering data from various platforms and correlating them. These data are useful for both our authors and supporting libraries to gain insight into the usage of punctum publications.

As there exists no ready-made open-source solution that we know of to accomplish this, for many years we struggled to import these data from various sources into ever-growing spreadsheets, with ever more complicated formulas to extract meaningful data and visualize them. This year, we decided to split up the database and correlation/visualization aspects, by moving the data into a MySQL database managed via phpMyAdmin, while using Metabase for the correlation and visualization part. This allows us to expose our usage data publicly, while also keeping them secure….”
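
The punctum setup itself is not shown in the post; the following is a minimal sketch of the general pattern it describes, loading per-platform usage exports into a MySQL table that a tool like Metabase can then query. The table layout, column names, and CSV format are hypothetical, not punctum's actual schema.

```python
import csv
import mysql.connector  # pip install mysql-connector-python

# Hypothetical schema: one row per (platform, title, month) with a download count.
conn = mysql.connector.connect(
    host="localhost", user="usage", password="secret", database="book_usage"
)
cur = conn.cursor()
cur.execute(
    """CREATE TABLE IF NOT EXISTS downloads (
           platform    VARCHAR(64),
           title       VARCHAR(255),
           month       DATE,
           n_downloads INT
       )"""
)

def load_export(platform, path):
    # Each platform export is assumed to be a CSV with 'title', 'month' (YYYY-MM),
    # and 'count' columns; real exports will differ and need per-platform mapping.
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            cur.execute(
                "INSERT INTO downloads (platform, title, month, n_downloads) "
                "VALUES (%s, %s, %s, %s)",
                (platform, row["title"], row["month"] + "-01", int(row["count"])),
            )

load_export("platform_a", "platform_a_2020.csv")
conn.commit()
```

Metabase can then be pointed at the same MySQL database and build its charts from simple GROUP BY queries over a table like this, which is what separates the storage and visualization concerns the post describes.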

Constellate

“Learn how to text mine or improve your skills using our self-guided lessons for all experience levels. Each lesson includes video instruction and your own Jupyter notebook — think of it like an executable textbook — ready to run in our Analytics Lab….

Teach text analytics to all skill levels using our library of open education resources, including lessons plans and our suite of Jupyter notebooks. Eliminate setup time by hosting your class in our Analytics Lab….

Create a ready-to-analyze dataset with point-and-click ease from over 30 million documents, including primary and secondary texts relevant to every discipline and perfect for learning text analytics or conducting original research….

Find patterns in your dataset with ready-made visualizations, or conduct more sophisticated text mining in our Analytics Lab using Jupyter notebooks configured for a range of text analytics methods….”
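
Constellate's own lesson notebooks are not reproduced here; as a minimal sketch of the kind of analysis such notebooks walk through, the following counts the most frequent terms in a tiny corpus. The documents, tokenizer, and stopword list are placeholders for illustration only.

```python
from collections import Counter
import re

documents = [
    "Open access books are meant to be copied, shared, and spread.",
    "Text mining finds patterns across large collections of documents.",
]

STOPWORDS = {"the", "and", "to", "of", "are", "be", "is", "a", "in"}

def tokenize(text):
    # Lowercase and keep alphabetic tokens only; real lessons use richer tokenizers.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

counts = Counter()
for doc in documents:
    counts.update(tokenize(doc))

print(counts.most_common(5))
```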

How data sharing is accelerating railway safety research

“André’s dataset was shortlisted for the Mendeley Data FAIRest Datasets Award, which recognizes researchers who make their data available for the research community in a way that exemplifies the FAIR Data Principles – Findable, Accessible, Interoperable, Reusable. The dataset was applauded for a number of reasons, not least the provision of clear steps to reproduce the data. What’s more, the data was clearly catalogued and stored in sub folders, with additional links to Blender and GitHub, making the dataset easily available and reproducible for all….”

Library Leaders Forum 2020: Community : Internet Archive

“Video recording from the Library Leaders Forum: Community session. October 13, 2020.

A community of practice has emerged around Controlled Digital Lending, and its utility for libraries and educators has been amply demonstrated during library and school closures due to COVID-19. There are now hundreds of libraries that are participating in Controlled Digital Lending programs and using the library practice to reach their patrons while service is disrupted. In this session you’ll learn from librarians, educators, and technologists who are developing next generation library tools that incorporate and build upon Controlled Digital Lending….”


Exploration Engines – the koodos collective

“Serendipitous use of the internet is slowly going extinct as we replace link-hopping with the algorithmic-feed. Ranked results and recommendations have become the dominant mode of exploring information online. In this experiment, we break away from this paradigm, and present Wikigraph – our project for Interhackt. While a “search engine” returns a ranked list of results, Wikigraph returns the most relevant sub-graph of pages. Such an application we term an “exploration engine.”…”
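
The Wikigraph code itself is not shown in the post; as a rough sketch of the idea, the following returns the sub-graph of pages within two link-hops of a query page, using networkx over a toy stand-in for the page-link graph.

```python
import networkx as nx

# Toy stand-in for a page-link graph; Wikigraph works over Wikipedia's link structure.
pages = nx.Graph()
pages.add_edges_from([
    ("Graph theory", "Leonhard Euler"),
    ("Graph theory", "Network science"),
    ("Network science", "Social network"),
    ("Leonhard Euler", "Calculus"),
    ("Calculus", "Isaac Newton"),
])

def explore(graph, query, radius=2):
    """Return the sub-graph of pages within `radius` hops of the query page."""
    nearby = nx.single_source_shortest_path_length(graph, query, cutoff=radius)
    return graph.subgraph(nearby)

sub = explore(pages, "Graph theory")
print(sorted(sub.nodes()))  # a neighbourhood to render, rather than a ranked list
```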

Visualizing Altmetric data with VOSviewer – Altmetric

“Visualizations can make data come alive, uncover new insights and capture the imagination in a way that a spreadsheet never can.

Join Mike Taylor, Data Insights & Customer Analytics at Altmetric, and Fabio Gouveia, Public Health Technologist at Oswaldo Cruz Foundation in Brazil, for a demonstration of the exciting ways in which you can create compelling stories to explain the broader impact of academic work using the free-to-download VOSviewer from CWTS Leiden and data from Altmetric.

This actionable webinar will include an introduction to creating network diagrams with VOSviewer with your own data, extracting data from Altmetric tools and adapting it to be imported….”
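
The webinar materials themselves are not included here; as a loose sketch of the “extract and adapt” step, the following builds a co-mention network from a hypothetical Altmetric export and writes it out as simple map and network files of the kind VOSviewer can import. The export columns and the exact file layout are assumptions and should be checked against the Altmetric and VOSviewer documentation.

```python
import csv
from collections import Counter
from itertools import combinations

# Hypothetical export: one row per research output, with a ';'-separated list of
# the sources (news outlets, hashtags, journals, ...) that mentioned it.
cooccurrence = Counter()
with open("altmetric_export.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        sources = sorted(set(row["sources"].split(";")))
        for a, b in combinations(sources, 2):
            cooccurrence[(a, b)] += 1

labels = sorted({s for pair in cooccurrence for s in pair})
ids = {label: i + 1 for i, label in enumerate(labels)}

# Map file (item ids and labels) and network file (weighted edges between item ids).
with open("map.txt", "w") as fh:
    fh.write("id\tlabel\n")
    for label, i in ids.items():
        fh.write(f"{i}\t{label}\n")

with open("network.txt", "w") as fh:
    for (a, b), weight in cooccurrence.items():
        fh.write(f"{ids[a]}\t{ids[b]}\t{weight}\n")
```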

citizenscience, Twitter, 11/5/2020 4:27:37 AM, 239488

“The graph represents a network of 3,914 Twitter users whose tweets in the requested range contained “citizenscience”, or who were replied to or mentioned in those tweets. The network was obtained from the NodeXL Graph Server on Thursday, 05 November 2020 at 04:07 UTC.

The requested start date was Thursday, 05 November 2020 at 01:01 UTC and the maximum number of days (going backward) was 14.

The maximum number of tweets collected was 7,500.

The tweets in the network were tweeted over the 13-day, 18-hour, 29-minute period from Thursday, 22 October 2020 at 01:42 UTC to Wednesday, 04 November 2020 at 20:11 UTC.

Additional tweets that were mentioned in this data set were also collected from prior time periods. These tweets may expand the complete time period of the data.

There is an edge for each “replies-to” relationship in a tweet, an edge for each “mentions” relationship in a tweet, and a self-loop edge for each tweet that is not a “replies-to” or “mentions”.

The graph is directed.

The graph’s vertices were grouped by cluster using the Clauset-Newman-Moore cluster algorithm.

The graph was laid out using the Harel-Koren Fast Multiscale layout algorithm….”
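
NodeXL itself is a spreadsheet add-in, but the construction described above can be sketched in a few lines of Python with networkx: one directed edge per “replies-to” or “mentions” relationship, a self-loop for any other tweet, and Clauset-Newman-Moore clustering via the greedy modularity routine. The tweets below are placeholders, not data from this graph.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Placeholder tweets: (author, replied-to user or None, list of mentioned users).
tweets = [
    ("alice", "bob", ["carol"]),
    ("bob", None, ["alice", "dave"]),
    ("carol", "alice", []),
    ("dave", None, []),        # neither a reply nor a mention -> self-loop
]

g = nx.DiGraph()
for author, reply_to, mentions in tweets:
    if reply_to is None and not mentions:
        g.add_edge(author, author)        # self-loop edge
    if reply_to is not None:
        g.add_edge(author, reply_to)      # "replies-to" edge
    for user in mentions:
        g.add_edge(author, user)          # "mentions" edge

# Clauset-Newman-Moore (greedy modularity) clustering on the undirected projection.
ug = nx.Graph(g)
ug.remove_edges_from(list(nx.selfloop_edges(ug)))
ug.remove_nodes_from(list(nx.isolates(ug)))
clusters = greedy_modularity_communities(ug)
print([sorted(c) for c in clusters])
```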

The Linked Commons 2.0: What’s New?

This is part of a series of posts introducing the projects built by open source contributors mentored by Creative Commons during Google Summer of Code (GSoC) 2020 and Outreachy. Subham Sahu was one of those contributors and we are grateful for his work on this project.


The CC Catalog data visualization—the Linked Commons 2.0—is a web application which aims to showcase and establish a relationship between the millions of data points of CC-licensed content using graphs. In this blog, I’ll discuss the motivation for this visualization and explore the latest features of the newest edition of the Linked Commons.

Motivation

The number of websites using CC-licensed content is enormous, and snowballing. The CC Catalog collects and stores these millions of data points, and each node (a unit in a data structure) contains information about the URL of the websites and the licenses used. It’s possible to do rigorous data analysis in order to understand fully how these are interconnected and to identify trends, but this would be exclusive to those with a technical background. However, by visualizing the data, it becomes easier to identify broad patterns and trends.

For example, by identifying other websites that are linking to your content, you can run a targeted outreach program or collaborate with them. In this way, out of the billions of webpages on the web, you can focus efficiently on the pages where you are most likely to see growth.

Latest Features

Let’s look at some of the new features in the Linked Commons 2.0.

  • Filtering based on the node name

The Linked Commons 2.0 allows users to search for their favorite node and then explore all of that node’s neighbors across the thousands present in the database. The links connecting the neighbors to the root node, as well as the neighbors themselves, are color-coded according to how they are connected to the root node. This makes it easy for users to classify the neighbors into two categories.

  • A sleek and revamped design

The Linked Commons 2.0 has a sleek design, with a clean and refreshing look along with both a light and dark theme.

The Linked Commons new design

  • Tools for smooth interaction with the canvas

The Linked Commons 2.0 ships with a few tools that allow the user to zoom in, zoom out, and reset the zoom with just one tap. They are especially useful for users on touch devices or trackpads.

The Linked Commons toolbox

  • Autocomplete feature

The current database of the Linked Commons 2.0 contains around 240 thousand nodes and 4.14 million links. Unfortunately, some of the node names are uncommon and lengthy. To spare users the exhausting work of typing complete node names, this version ships with an autocomplete feature: on every keystroke, it suggests node names matching what the user might be looking for (a rough sketch of this kind of prefix matching follows the feature list below).

The Linked Commons autocomplete
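
The Linked Commons backend is not reproduced here; the following is a minimal sketch of prefix-based autocompletion over a sorted list of node names, using Python’s bisect module. The node names are made up, and the real service indexes around 240 thousand of them.

```python
import bisect

# Sorted node names; the real index holds roughly 240k domain names.
NODES = sorted(["archive.org", "creativecommons.org", "europeana.eu", "wikipedia.org"])

def autocomplete(prefix, limit=10):
    """Return up to `limit` node names starting with `prefix`."""
    start = bisect.bisect_left(NODES, prefix)
    matches = []
    for name in NODES[start:]:
        if not name.startswith(prefix):
            break
        matches.append(name)
        if len(matches) == limit:
            break
    return matches

print(autocomplete("wiki"))  # -> ['wikipedia.org']
```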

What’s next for the Linked Commons?

In the current version, some nodes are very densely connected. For example, the node “Wikipedia” has around 89k neighboring nodes and 102k links, which is too many for web browsers to render smoothly. Therefore, we need a way to reduce this to a more reasonable number.
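
One simple way to cap what the browser has to render is to keep only a node’s strongest links. The sketch below illustrates that general idea; the weight attribute and the cutoff of 500 are assumptions, not the project’s actual approach.

```python
import networkx as nx

def truncated_neighborhood(graph, node, max_neighbors=500):
    """Keep only the `max_neighbors` neighbors with the highest link weight."""
    ranked = sorted(
        graph.edges(node, data="weight", default=1),  # (node, neighbor, weight) tuples
        key=lambda edge: edge[2],
        reverse=True,
    )
    keep = {node} | {neighbor for _, neighbor, _ in ranked[:max_neighbors]}
    return graph.subgraph(keep)
```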

During preprocessing, we dropped a large number of nodes, including more than 3 million that didn’t have CC license information. In general, the current version shows only those nodes which are well linked with other domains and whose license information is available. However, to provide a more complete picture of the CC Catalog, the Linked Commons needs additional filtering methods and other tools. These potentially include:

  • filtering based on Top-Level domain
  • filtering based on the number of web links associated with a node 

Contributing

We plan to continue working on the Linked Commons. You can follow the project development by visiting our GitHub repo. We encourage you to contribute to the Linked Commons by reporting bugs, suggesting features, or helping us write code. The new Linked Commons makes it easy for anyone to set up the development environment.

The project consists of a dedicated server which powers the filtering by node name and query autocompletion. The frontend is built using ReactJS, for smooth rendering performance. So, it doesn’t matter whether you’re a frontend developer, a backend developer, or a designer: there is some part of the Linked Commons that you can work on and improve. We look forward to seeing you on board with sparkling ideas!

We are extremely proud and grateful for the work done by Subham Sahu throughout his 2020 Google Summer of Code internship. We look forward to his continued contributions to the Linked Commons as a project core committer in the CC Open Source Community! 

Please consider supporting Creative Commons’ open source work on GitHub Sponsors.


COVID-19 and the boundaries of open science and innovation: Lessons of traceability from genomic data sharing and biosecurity: EMBO reports

“While conventional policies and systems for data sharing and scholarly publishing are being challenged and new Open Science policies are being developed, traceability should be a key function for guaranteeing socially responsible and robust policies. Full access to the available data and the ability to trace it back to its origins assure data quality and processing legitimacy. Moreover, traceability would be important for other agencies and organisations – funding agencies, database managers, institutional review boards and so on – for undertaking systematic reviews, data curation or process oversights. Thus, the term “openness” means much more than just open access to published data but must include all aspects of data generation, analysis and dissemination along with other organisations and agencies than just research groups and publishers. The COVID-19 crisis has highlighted the challenges and shortfalls of the current notions of openness and it should serve as an impetus to further advance towards real Open Science.”


Digital Humanities Research Platform

“The Academia Sinica Digital Humanities Research Platform develops digital tools to meet the demands of humanities research, assisting scholars in upgrading the quality of their research. We hope to integrate researchers, research data, and research tools to broaden the scope of research and cut down research time. The Platform provides a comprehensive research environment with cloud computing services, offering all the data and tools scholars require. Researchers can upload texts and authority files, or use others’ open texts and authority files available on the platform. Authority terms possess both manual and automatic text tagging functions, and can be hierarchically categorized. Once text tagging is complete, you can calculate authority term and N-gram statistics, or conduct term co-occurrence analysis, and then present results through data visualization methods such as statistical charts, word clouds, social analysis graphs, and maps. Furthermore, the platform offers Boolean search, word proximity search, and statistical filtering, enabling researchers to easily carry out textual analysis.”
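
The platform’s implementation is not described in detail; as a rough illustration of the underlying computations, N-gram statistics and windowed term co-occurrence over a tokenized text can be computed like this. The sample text, window size, and whitespace tokenization are placeholders only.

```python
from collections import Counter
from itertools import combinations

text = "open access open science open access"
tokens = text.split()  # naive tokenization; real texts need proper segmentation

# Bigram (N = 2) statistics.
bigrams = Counter(zip(tokens, tokens[1:]))
print(bigrams.most_common(3))

# Term co-occurrence within a sliding window of 3 tokens.
window = 3
cooccurrence = Counter()
for i in range(len(tokens) - window + 1):
    for a, b in combinations(sorted(set(tokens[i:i + window])), 2):
        cooccurrence[(a, b)] += 1
print(cooccurrence.most_common(3))
```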