From little acorns . . . A retrospective on OpenCitations | OpenCitations blog

“Now that OpenCitations is hosting over one billion freely available scholarly bibliographic citations, this is perhaps an opportune moment to look back to the start of this initiative. A little over eleven years ago, on 24 April 2010, I spoke at the Open Knowledge Foundation Conference, OKCon2010, in London, on the topic

OpenCitations: Publishing Bibliographic Citations as Linked Open Data

I reported that, earlier that same week, I had applied to Jisc for a one-year grant to fund the OpenCitations Project (opencitations.net). Jisc (at that time ‘The JISC’, the Joint Information Systems Committee) was tasked by the UK government with, among other things, supporting research and development in information technology for the benefit of the academic community.

The purpose of that original OpenCitations R&D project was to develop a prototype in which we:

harvested citations from the open access biomedical literature in PubMed Central;
described and linked them using CiTO, the Citation Typing Ontology [1];
encoded and organized them in an RDF triplestore; and
published them as Linked Open Data in the OpenCitations Corpus (OCC)….”
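As a rough illustration of the second step above, a citation typed with CiTO can be expressed as a plain RDF statement. Below is a minimal Python sketch using rdflib; the DOIs are placeholders, and the actual OpenCitations Corpus data model is considerably richer than this.

    # Minimal sketch: a CiTO-typed citation as RDF (placeholder DOIs).
    from rdflib import Graph, Namespace, URIRef

    CITO = Namespace("http://purl.org/spar/cito/")

    g = Graph()
    citing = URIRef("https://doi.org/10.1000/citing-article")  # placeholder DOI
    cited = URIRef("https://doi.org/10.1000/cited-article")    # placeholder DOI

    # cito:cites is the generic citation property; CiTO also defines typed
    # subproperties such as cito:extends, cito:supports and cito:disputes.
    g.add((citing, CITO.cites, cited))
    g.add((citing, CITO.extends, cited))

    print(g.serialize(format="turtle"))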

Reimagining Wikidata from the margins | document/manifesto | June-October 2021

“Wikidata’s ecosystem has been rapidly growing over the last nine years …[but] we’re still missing communities from the Global South, as well as other marginalized communities from the North – both in data and in contributors….

Reimagining Wikidata from the margins is a process that precedes WikidataCon 2021 and that potentially will continue after the conference in October. This is an invitation to communities and individuals from underrepresented groups (plus good allies!) so we can better understand and envision possible ways for our meaningful and empowered agency in the broad Wikidata ecosystem!

Objectives

Awaken the debate within a diverse range of communities from the Global South and other marginalized communities
Foster bonds between communities in similar contexts
Identify specific and general problems/challenges/needs/expectations of those communities regarding Wikidata
Collaboratively elaborate a strategy for decentering Wikidata from its current focus on North America and Europe
Consolidate a document/manifesto
Integrate these decentering perspectives and Global South voices into the conference program…

How will it work?…After the rounds of conversations, we will gather a group of volunteers who participated in them to draft a document, summarizing the discussions and looking for possible ways to further integrate data and volunteers from marginalized communities on Wikidata. The document will be released during WikidataCon 2021, in the hope that it will be an initial step towards effectively decentering Wikidata and lifting up underrepresented voices….

Timeline:

June/July: First round of discussions with local communities
August: Round of thematic meetings with people from different locations
September: A volunteer committee drawn from the previous discussions gathers to write a document/manifesto
October: Final review of the document and launch at WikidataCon 2021…”

WikidataCon 2021 | Distributed conference | 29-30-31 October 2021

“Save the date! After the first two editions in 2017 and 2019, the WikidataCon is taking place again in October 2021.

The WikidataCon is an event focused on the Wikidata community in a broad sense: editors and tool builders, but also third-party re-users, partner organizations that are using or contributing to the data, and the ecosystem of organizations working with Wikibase. The conference content will include parts dedicated to people who want to learn more about Wikidata, workshops and discussions for the community to share skills and exchange practices, and space for side events for specific projects (WikiCite, Wikibase, GLAM, etc.).

Important: as the global COVID pandemic is still hitting the world, and the forecast for 2021 doesn’t indicate much improvement, the situation doesn’t allow us to plan a traditional onsite international conference. In 2021, we will not gather all participants in Berlin, and we will avoid any international travel. Instead, we are experimenting with a hybrid format for the conference: most of the content and interactions will take place online, and small, local gatherings will be possible, if the situation allows it….”

IFLA signs the WikiLibrary Manifesto

“IFLA has endorsed the WikiLibrary Manifesto, aimed at connecting libraries and Wikimedia projects such as Wikibase in order to promote the dissemination of knowledge in open formats, especially in linked open data networks….”

Semantic micro-contributions with decentralized nanopublication services [PeerJ]

Abstract:  While the publication of Linked Data has become increasingly common, the process tends to be a relatively complicated and heavy-weight one. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that there is a central bottleneck in the form of the organization or individual responsible for the releases. Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish. To address these problems, we present here an approach to use nanopublications and a decentralized network of services to allow users to directly publish small Linked Data statements through a simple and user-friendly interface, called Nanobench, powered by semantic templates that are themselves published as nanopublications. The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments. We show here that these two kinds of services are complementary and together allow us to query nanopublications in a reliable and efficient manner. We also show that Nanobench makes it indeed very easy for users to publish Linked Data statements, even for those who have no prior experience in Linked Data publishing.
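The abstract assumes familiarity with the nanopublication model: each nanopublication bundles an assertion graph, a provenance graph and a publication-info graph, tied together by a head graph. Below is a minimal Python sketch using rdflib; the URIs and the assertion itself are invented for illustration, and the cryptographic signing and template machinery that Nanobench provides are omitted.

    # Minimal sketch of a nanopublication's four named graphs (illustrative URIs).
    from rdflib import Dataset, Literal, Namespace
    from rdflib.namespace import RDF, XSD

    NP = Namespace("http://www.nanopub.org/nschema#")
    PROV = Namespace("http://www.w3.org/ns/prov#")
    EX = Namespace("http://example.org/np1/")  # illustrative namespace

    ds = Dataset()
    head = ds.graph(EX["head"])
    assertion = ds.graph(EX["assertion"])
    provenance = ds.graph(EX["provenance"])
    pubinfo = ds.graph(EX["pubinfo"])

    nanopub = EX["nanopub"]

    # Head graph: declares the nanopublication and links its three parts.
    head.add((nanopub, RDF.type, NP.Nanopublication))
    head.add((nanopub, NP.hasAssertion, EX["assertion"]))
    head.add((nanopub, NP.hasProvenance, EX["provenance"]))
    head.add((nanopub, NP.hasPublicationInfo, EX["pubinfo"]))

    # Assertion graph: the small Linked Data statement being published.
    assertion.add((EX["malaria"], EX["isTreatedBy"], EX["artemisinin"]))

    # Provenance graph: where the assertion came from.
    provenance.add((EX["assertion"], PROV.wasAttributedTo, EX["some-researcher"]))

    # Publication-info graph: metadata about the nanopublication itself.
    pubinfo.add((nanopub, PROV.generatedAtTime,
                 Literal("2021-03-01T00:00:00Z", datatype=XSD.dateTime)))

    print(ds.serialize(format="trig"))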

Why researchers created a database of half a million journal editors | Nature Index

“In an attempt to capture that information, Pacher and his colleagues created Open Editors, a database containing information such as names, affiliations and editorial roles of just under half a million editors working for more than 6,000 journals run by 17 scholarly publishers.

They outline their initiative in a SocArXiv preprint paper published on 11 March.

Although Open Editors already includes editor data from publishing heavyweights such as Elsevier and Cambridge University Press, Pacher says, other major players such as Springer Nature, John Wiley & Sons, and Taylor and Francis are so far missing.

Pacher has made the data and code freely available to encourage other academics to help build the database….”

Open Editors

“Open Editors collects publicly available information about the editors and editorial boards of scholarly journals through a technique called webscraping, whereby a script accesses the publishers’ websites to extract the relevant information. The code (written in R) is available on GitHub….”
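The excerpt describes the scraping step only briefly. As a rough illustration of the general technique (the actual Open Editors scripts are written in R, as noted above), here is a minimal Python sketch; the URL and CSS selectors are hypothetical, since each publisher's page structure differs.

    # Illustrative webscraping sketch; selectors and URL are hypothetical.
    import requests
    from bs4 import BeautifulSoup

    def scrape_editorial_board(url: str) -> list[dict]:
        """Fetch a journal's editorial-board page and extract editor entries."""
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        editors = []
        # Assumption: each editor is listed in an element with class "editor-entry".
        for entry in soup.select(".editor-entry"):
            editors.append({
                "name": entry.select_one(".name").get_text(strip=True),
                "affiliation": entry.select_one(".affiliation").get_text(strip=True),
                "role": entry.select_one(".role").get_text(strip=True),
            })
        return editors

    # Example call with a hypothetical URL:
    # board = scrape_editorial_board("https://publisher.example/journal/editorial-board")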

Open Editors: A Dataset of Scholarly Journals’ Editorial Board Positions

Abstract:  Editormetrics analyse the role of editors of academic journals and their impact on the scientific publication system. However, such analyses would best rely on open, structured and machine-readable data on editors and editorial boards, whose availability remains rare. To address this shortcoming, the project Open Editors collects data about academic journal editors on a large scale and structures them into a single dataset. It does so by scraping the websites of 6,090 journals from 17 publishers, thereby structuring publicly available information (names, affiliations, editorial roles, etc.) about 478,563 researchers. The project will repeat this webscraping procedure annually to enable insights into the changes of editorial boards over time. All code and data are made available on GitHub, while the result is browsable at a dedicated website (https://openeditors.ooir.org). This dataset carries wide-ranging implications for meta-scientific investigations into the landscape of scholarly publications, including bibliometric analyses, and allows for critical inquiries into the representation of diversity and inclusivity. It also contributes to the goal of expanding linked open data within science to evaluate and reflect on the scholarly publication process.

The Linked Commons 2.0: What’s New?

This is part of a series of posts introducing the projects built by open source contributors mentored by Creative Commons during Google Summer of Code (GSoC) 2020 and Outreachy. Subham Sahu was one of those contributors and we are grateful for his work on this project.


The CC Catalog data visualization, the Linked Commons 2.0, is a web application that aims to showcase the relationships among the millions of data points of CC-licensed content using graphs. In this post, I’ll discuss the motivation for this visualization and explore the latest features of the newest edition of the Linked Commons.

Motivation

The number of websites using CC-licensed content is enormous and snowballing. The CC Catalog collects and stores these millions of data points, and each node (a unit in the data structure) contains information about a website’s URL and the licenses it uses. It’s possible to do rigorous data analysis to understand fully how these websites are interconnected and to identify trends, but that would be accessible only to those with a technical background. By visualizing the data, however, it becomes easier for anyone to identify broad patterns and trends.

For example, by identifying other websites that link to your content, you can run a targeted outreach program or collaborate with them. In this way, out of the billions of webpages on the web, you can focus efficiently on the ones where you are most likely to see growth.
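To make the idea of nodes and links concrete, here is the rough shape of the records such a graph works with; the field names are assumptions for illustration, not the CC Catalog's actual schema.

    # Assumed shape of the graph data behind the visualization (illustrative only).
    nodes = [
        {"id": "flickr.com", "licenses": ["CC BY 2.0", "CC BY-SA 2.0"]},
        {"id": "wikipedia.org", "licenses": ["CC BY-SA 3.0"]},
    ]
    links = [
        # Each link records that one domain references CC-licensed content on another.
        {"source": "wikipedia.org", "target": "flickr.com", "weight": 12},
    ]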

Latest Features

Let’s look at some of the new features in the Linked Commons 2.0.

  • Filtering based on the node name

The Linked Commons 2.0 allows users to search for their favorite node and then explore all of that node’s neighbors across the thousands present in the database. We have color-coded the links connecting the neighbors to the root node, as well as the neighbors that are directly connected to the root node, differently from the rest. This makes it easy for users to classify the neighbors into two categories.

  • A sleek and revamped design

The Linked Commons 2.0 has a sleek design, with a clean and refreshing look along with both a light and dark theme.

The Linked Commons new design

  • Tools for smooth interaction with the canvas

The Linked Commons 2.0 ships with a few tools that allow the user to zoom in, zoom out, and reset the zoom with just one tap. These are especially useful to users on touch devices or those using a trackpad.

The Linked Commons toolbox

  • Autocomplete feature

The current database of the Linked Commons 2.0 contains around 240 thousand nodes and 4.14 million links. Unfortunately, some of the node names are uncommon and lengthy. To spare users the tedious work of typing complete node names, this version ships with an autocomplete feature: with every keystroke, node names appear that correspond to what the user might be looking for.

The Linked Commons autocomplete
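A minimal sketch of how such per-keystroke suggestions could work is shown below: a simple prefix match over the node names. The real implementation runs on the project's server and may differ.

    # Illustrative autocomplete: return node names matching the typed prefix.
    def suggest(prefix: str, node_names: list[str], limit: int = 10) -> list[str]:
        """Return up to `limit` node names that start with the typed prefix."""
        prefix = prefix.lower()
        return [name for name in node_names if name.lower().startswith(prefix)][:limit]

    # Example: suggest("wiki", ["wikipedia.org", "wikimedia.org", "flickr.com"])
    # -> ["wikipedia.org", "wikimedia.org"]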

What’s next for the Linked Commons?

In the current version, some nodes are very densely connected. For example, the node “Wikipedia” has around 89k neighboring nodes and 102k links. This is too much for web browsers to render, so we need a way to reduce it to a more reasonable number.

During preprocessing, we dropped a large number of nodes, removing more than 3 million that didn’t have CC license information. In general, the current version shows only those nodes that are well linked with other domains and whose license information is available. However, to provide a more complete picture of the CC Catalog, the Linked Commons needs additional filtering methods and other tools. These potentially include the following (a rough sketch follows the list):

  • filtering based on Top-Level domain
  • filtering based on the number of web links associated with a node 
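Here is a rough sketch of what such filters could look like, over node and link records like the ones sketched earlier; the field names and thresholds are assumptions, not the project's actual code.

    # Illustrative filters; field names and thresholds are assumptions.
    def filter_by_tld(nodes: list[dict], tld: str) -> list[dict]:
        """Keep only nodes whose domain ends with the given top-level domain."""
        return [n for n in nodes if n["id"].endswith("." + tld)]

    def filter_by_degree(nodes: list[dict], links: list[dict], max_links: int) -> list[dict]:
        """Drop nodes with more associated web links than a browser can render smoothly."""
        degree: dict[str, int] = {}
        for link in links:
            degree[link["source"]] = degree.get(link["source"], 0) + 1
            degree[link["target"]] = degree.get(link["target"], 0) + 1
        return [n for n in nodes if degree.get(n["id"], 0) <= max_links]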

Contributing

We plan to continue working on the Linked Commons. You can follow the project’s development by visiting our GitHub repo. We encourage you to contribute to the Linked Commons by reporting bugs, suggesting features, or helping us write code. The new Linked Commons makes it easy for anyone to set up the development environment.

The project consists of a dedicated server which powers the filtering by node name and query autocompletion. The frontend is built using ReactJS, for smooth rendering performance. So, it doesn’t matter whether you’re a frontend developer, a backend developer, or a designer: there is some part of the Linked Commons that you can work on and improve. We look forward to seeing you on board with sparkling ideas!

We are extremely proud and grateful for the work done by Subham Sahu throughout his 2020 Google Summer of Code internship. We look forward to his continued contributions to the Linked Commons as a project core committer in the CC Open Source Community! 

Please consider supporting Creative Commons’ open source work on GitHub Sponsors.

Wikidata, Wikibase and the library linked data ecosystem: an OCLC Research Library Partnership discussion – Hanging Together

“In late July the OCLC Research Library Partnership convened a discussion that reflected on the current state of linked data. The discussion format was (for us) experimental: we invited participants to prepare by viewing a pre-recorded presentation, Re-envisioning the fabric of the bibliographic universe – From promise to reality. The presentation covers the experiences of national and research libraries as well as OCLC’s own journey in linked data exploration. OCLC Researchers Annette Dortmund and Karen Smith-Yoshimura looked at relevant milestones in the journey from entity-based description research, to prototypes, and on to actual practices, based on work that has been undertaken with library partners right up to the present day….”