Abstract: Sparked by issues of quality and lack of proper documentation for datasets, the machine learning community has begun developing standardised processes for establishing datasheets for machine learning datasets, with the intent to provide context and information on provenance, purposes, composition, the collection process, recommended uses or societal biases reflected in training datasets. This approach fits well with practices and procedures established in GLAM institutions, such as establishing collections’ descriptions. However, digital cultural heritage datasets are marked by specific characteristics. They are often the product of multiple layers of selection; they may have been created for different purposes than establishing a statistical sample according to a specific research question; they change over time and are heterogeneous. Punctuated by a series of recommendations to create datasheets for digital cultural heritage, the paper addresses the scope and characteristics of digital cultural heritage datasets; possible metrics and measures; lessons from concepts similar to datasheets and/or established workflows in the cultural heritage sector. This paper includes a proposal for a datasheet template that has been adapted for use in cultural heritage institutions, and which proposes to incorporate information on the motivation and selection criteria, digitisation pipeline, data provenance, the use of linked open data, and version information.
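The template themes listed in the abstract lend themselves to a machine-readable record. The sketch below is a hypothetical illustration of such a structure in Python; the field names follow the themes named above (motivation, selection criteria, digitisation pipeline, provenance, linked open data, versioning) but are assumptions for illustration, not the template proposed in the paper.

```python
# Hypothetical, machine-readable sketch of a datasheet record for a digital
# cultural heritage dataset. Field names echo the themes in the abstract but
# are not the paper's actual template.
from dataclasses import dataclass, field

@dataclass
class HeritageDatasheet:
    title: str
    motivation: str                 # why the dataset was created
    selection_criteria: str         # layers of selection applied to the source collection
    digitisation_pipeline: str      # scanning, OCR, post-processing steps
    provenance: list[str] = field(default_factory=list)        # originating collections/institutions
    linked_open_data: list[str] = field(default_factory=list)  # URIs of vocabularies or external links
    version: str = "1.0"
    changelog: list[str] = field(default_factory=list)         # how the dataset changed over time

sheet = HeritageDatasheet(
    title="Example digitised newspaper corpus",
    motivation="Mass digitisation programme, not a statistical sample",
    selection_criteria="Titles chosen by curators; items under copyright excluded",
    digitisation_pipeline="600 dpi scans, OCR with manual spot checks",
    provenance=["National library print collection"],
    linked_open_data=["http://www.wikidata.org/entity/Q1"],
    version="2.1",
    changelog=["2.1: added issues from 1901-1910"],
)
print(sheet.title, sheet.version)
```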
“…Despite the value of open bibliographic resources, they can involve inconsistencies that should be solved for better accuracy. As an example, OpenCitations mistakenly includes 1370 self-citations and 1498 symmetric citations as of April 30, 2022. They can also involve several biases that provide a distorted mirror of research efforts across the world (Martín-Martín, Thelwall, Orduna-Malea, & Delgado López-Cózar, 2021). That is why these databases need to be enhanced from the perspective of data modeling, data collection, and data reuse. This goes in line with the current perspective of the European Union on reforming research assessment (CoARA, 2022). In this topical collection, we are honored to feature novel research works in the context of allowing the automatic generation of real-time research assessment reports based on open bibliographic resources. We are happy to host research efforts emphasizing the importance of open research data as a basis for transparent and responsible research assessment, assessing the data quality of open resources to be used in real-time research evaluation, and providing implementations of how online databases can be combined to feed dashboards for real-time scholarly assessment….”
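The two inconsistencies named in the excerpt are straightforward to check for in a citation dump. The following minimal sketch assumes a plain list of (citing, cited) identifier pairs; it is an illustration, not OpenCitations' own tooling.

```python
# Minimal sketch of the two consistency checks mentioned above, run over a
# list of (citing, cited) identifier pairs. Illustrative data only.
citations = [
    ("10.1/a", "10.1/b"),
    ("10.1/b", "10.1/a"),   # symmetric pair: a cites b and b cites a
    ("10.1/c", "10.1/c"),   # self-citation: a work recorded as citing itself
]

pairs = set(citations)

self_citations = [(src, dst) for src, dst in pairs if src == dst]
symmetric_citations = [
    (src, dst) for src, dst in pairs
    if src != dst and (dst, src) in pairs
]

print(len(self_citations), "self-citations")
print(len(symmetric_citations) // 2, "symmetric citation pairs")
```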
Abstract: Among metadata-related standards, data content standards like metadata guidelines and instructions for creating metadata still remain in legacy forms. This study investigates a way to transform data content standards to linked data (LD) through conversion from other formats, while referring to the proposed layered framework. Under the basic policies for making LD on which this study is based, several principal matters were examined: (a) defining units to be assigned with Universal Resource Identifiers (URIs), (b) defining relationships among the instructions with URIs and (c) expressing instructed content in instructions properly with certain Resource Description Framework (RDF) properties. With the proper choice(s) for each matter, some actual standards were converted to LD: Resource Description and Access (RDA) and Dublin Core User Guide. The results showed that the adopted way of transforming data content standards to LD is valid and proper, and the resultant LD would be expected to be utilised in various manners.
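As a rough illustration of what such a conversion might produce, the sketch below assigns a URI to a single instruction unit and expresses its label, its relationship to a parent unit, and its instructed content with RDF properties, using rdflib. The namespace, URI pattern, and property choices are assumptions for demonstration, not the mapping adopted in the study.

```python
# Illustrative sketch of turning one cataloguing instruction into RDF, in the
# spirit of the conversion described above. Namespace, URI pattern and
# properties are hypothetical.
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, RDFS, DCTERMS

EX = Namespace("http://example.org/standard/")   # hypothetical namespace

g = Graph()
instruction = URIRef(EX["rda/2.3.2"])            # a unit of the standard given its own URI

g.add((instruction, RDF.type, EX.Instruction))
g.add((instruction, RDFS.label, Literal("Recording the title proper", lang="en")))
g.add((instruction, DCTERMS.isPartOf, URIRef(EX["rda/2.3"])))   # relationship between units
g.add((instruction, EX.instructs,
       Literal("Record the title proper as it appears on the source of information.", lang="en")))

print(g.serialize(format="turtle"))
```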
Pausing our LOD services
“The ARIADNEplus project is the extension of the previous ARIADNE Integrating Activity, which successfully integrated archaeological data infrastructures in Europe, indexing in its registry about 2.000.000 datasets (ARIADNE portal). ARIADNEplus will build on the ARIADNE results, extending and supporting the research community that the previous project created and further developing the relationships with key stakeholders such as the most important European archaeological associations, researchers, heritage professionals, national heritage agencies and so on. The new enlarged partnership of ARIADNEplus covers all of Europe. It now includes leaders in different archaeological domains like palaeoanthropology, bioarchaeology and environmental archaeology as well as other sectors of archaeological sciences, including all periods of human presence from the appearance of hominids to present times. Transnational Activities together with the planned training will further reinforce the presence of ARIADNEplus as a key actor.
The ARIADNEplus data infrastructure will be embedded in a cloud that will offer the availability of Virtual Research Environments where data-based archaeological research may be carried out. The project will furthermore develop a Linked Data approach to data discovery, making available to users innovative services, such as visualization, annotation, text mining and geo-temporal data management. Innovative pilots will be developed to test and demonstrate the innovation potential of the ARIADNEplus approach.
ARIADNEplus is funded by the European Commission under the H2020 Programme, contract no. H2020-INFRAIA-2018-1-823914….”
“…SWIB focuses on Linked Open Data (LOD) in libraries and related organizations. It is well established as an event where IT staff, developers, librarians, and researchers from all over the world meet and mingle and learn from each other. The topics of talks and workshops at SWIB revolve around opening data, linking data and creating tools and software for LOD production scenarios. These areas of focus are supplemented by presentations of research projects in applied sciences, industry applications, and LOD activities in other areas. As usual, SWIB22 will be organized by ZBW – Leibniz Information Centre for Economics and the North Rhine-Westphalian Library Service Centre (hbz). The conference language is English….”
Wiki Education is hosting webinars all of October to celebrate Wikidata’s 10th birthday. Below is a summary of our first event. Watch Tuesday’s webinar in full on our YouTube channel. Sign up for our next three events here.
Never before has the world had a tool like Wikidata. The semantic database behind Wikipedia and virtual assistants like Siri and Alexa is only ten years old this month, and yet with almost 1 billion unique items, it’s the biggest open database ever. Wiki Education’s “Wikidata Will” Kent gathered key players in the Wikidataverse to reflect on the last ten years and set our sights on the next ten. Kelly Doyle, the Open Knowledge Coordinator for the Smithsonian Institution; Andrew Lih, Wikimedian at Large with Smithsonian Institution and Wikimedia strategist with the Metropolitan Museum of Art; and Lane Rasberry, Wikimedian in Residence at University of Virginia’s Data Science Institute discussed the “little database that could” (not so little anymore!).
“Metadata as Knowledge,” is a special issue of KULA: Knowledge Creation, Dissemination, and Preservation Studies that takes up the critical relationship between metadata and knowledge. The issue includes articles and project reports that address metadata, hidden knowledge, and labour; standards versus expression; knowledge sharing and reuse of metadata; forays into open and shared knowledge; linked data, metadata translation, and discovery; and machine learning and knowledge graphs. Although rarely an object of notice or scrutiny by its users, metadata governs the circulation of information and has the power to name, broadcast, normalize, oppress, and exclude. As the contributions to this issue demonstrate, metadata is knowledge, and metadata creators, systems, and practices must contend with how metadata means.
(Source: Editors’ introduction – Allison-Cassin, Stacy, and Dean Seeman. 2022. Metadata as Knowledge. KULA: Knowledge Creation, Dissemination, and Preservation Studies 6(3). https://doi.org/10.18357/kula.244 )
Abstract: Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, thus inspiring various use cases to exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users’ needs is still missing in Wikidata, while current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with a narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with high quality. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well supported by existing Wikidata mechanisms. We make our code and data available to support future work.
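As a concrete illustration of the gap-detection step, the following sketch queries the public Wikidata SPARQL endpoint for paintings that lack a creator statement, the kind of missing information an external LOD source could supply. The query is illustrative and is not the authors' code.

```python
# Minimal sketch of "gap detection": find paintings in Wikidata with no
# creator (P170) statement. Illustrative query, not the paper's workflow.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
endpoint.setQuery("""
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q3305213 .                        # instance of: painting
  FILTER NOT EXISTS { ?item wdt:P170 ?creator . }    # gap: no creator statement
}
LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["item"]["value"])
```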
“This is the presentation of our short paper abstract accepted at the DHBenelux 2022 conference. This record contains the presentation with animations in PowerPoint format as well as a more static version in PDF format….”
“Linked data applications will empower users in new ways, delivering multiple knowledge journeys that will better represent different cultures, disciplines, and pedagogies. It is a watershed technological advancement for libraries. Not just a change in the back office business of cataloging and classification, it will improve what libraries are collectively able to accomplish in a dynamic, pluralistic, information environment.
Moving from a records-based approach towards more granular, flexible systems that rely on entities, persistent identifiers, and descriptions will, of course, change roles within cataloging, technical services, reference, and other library departments. But this isn’t just a move to be more efficient or to support better findability of library resources.
Join OCLC experts Rachel Frick and Nathan Putnam, along with Torsten Reimer from the British Library, as they discuss the implications of linked data and entity descriptions across a range of important issues, including:
New roles and responsibilities for metadata librarians
Better representation of diverse communities and perspectives
The ability to partner with and support other knowledge and memory institutions
Improved discovery opportunities…”
“Imagine the great library of life, the library that Charles Darwin said was necessary for the “cultivation of natural science” (1847). And imagine that this library is not just hundreds of thousands of books printed from 1500 to the present, but also the data contained in those books that represents all that we know about life on our planet. That library is the Biodiversity Heritage Library (BHL). The Internet Archive has provided an invaluable platform for the BHL to liberate taxonomic names, species descriptions, habitat descriptions and much more. Connecting and harnessing the disparate data from over five centuries is now BHL’s grand challenge. The unstructured textual data generated at the point of digitization holds immense untapped potential. Tim Berners-Lee provided the world with a semantic roadmap to address this global deluge of dark data and Wikidata is now executing on his vision. As we speak, BHL’s data is undergoing rapid transformation from legacy formats into linked open data, fulfilling the promise to evaporate data silos and foster bioliteracy for all humankind….”
The purpose of this paper is to investigate linked data through the use of open-source technology. It demonstrates the transformation of library bibliographic data into linked data, which allows for easy searching across numerous collections of information.
The work is carried out on an Ubuntu system running a LAMP stack, with open-source tools used throughout. LODRefine is used to convert Koha’s bibliographic data into linked data that is now available on the Web. The framework has been conceptualised and formulated according to linked data principles and search algorithms.
Linked data services have been made publicly available to library users using a variety of forms of data. Information can be sought quickly and easily through an interface built on multiple search structures, which also meets the needs of users who rely on the linked data search mechanism to find information. Through modern scripts and algorithms, library users can now easily search the linked-data-enabled services.
This paper demonstrates how quickly and easily linked data can be developed and generated from bibliographic details using a spreadsheet, with the entire procedure carried out by specialists within the library setting. A further advantage of the SPARQL system is that it allows visitors to group distinct concepts and aspects using independent URIs and URLs rather than relying on the SPARQL endpoint alone.
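As a rough sketch of the spreadsheet-to-linked-data step, the example below converts a hypothetical CSV export of Koha bibliographic records into RDF with rdflib rather than LODRefine; the column names, base URI, and file name are assumptions for illustration.

```python
# Sketch of converting a spreadsheet (CSV) export of bibliographic records
# into linked data with rdflib. Column names, base URI and file name are
# hypothetical; this is not the LODRefine workflow described in the paper.
import csv
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, DCTERMS

BASE = Namespace("http://example.org/bib/")   # hypothetical base URI for records

g = Graph()
with open("koha_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):             # expects columns: biblionumber, title, author
        record = URIRef(BASE[row["biblionumber"]])
        g.add((record, RDF.type, DCTERMS.BibliographicResource))
        g.add((record, DCTERMS.title, Literal(row["title"])))
        g.add((record, DCTERMS.creator, Literal(row["author"])))

g.serialize("koha_linked_data.ttl", format="turtle")
```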
Abstract: This paper describes LD4DH, the Linked Data for Digital Humanities: Publishing, Querying, and Linking on the Semantic Web workshop at the Digital Humanities Oxford Summer School. It includes a description of the general structure of the workshop, how it has changed over the course of the last seven years, between 2015 and 2021, and evaluates the differences between in-person delivery in 2018–2019 and the online mode in 2020–2021. Discussion is centred on the description of the data as well as the illustration of the processes, methods, and software used throughout the workshop. The paper concludes with a summary of participant evaluation, and reflects on the opportunities and challenges of teaching Linked Open Data to a mixed cohort of predominantly Humanities researchers and professionals from the cultural heritage sector.
Abstract: Information related to the COVID-19 pandemic ranges from biological to bibliographic and from geographical to genetic. Wikidata is a vast interdisciplinary, multilingual, open collaborative knowledge base of more than 88 million entities connected by well over a billion relationships and is consequently a web-scale platform for broader computer-supported cooperative work and linked open data. Here, we introduce four aspects of Wikidata that make it an ideal knowledge base for information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The structure of the raw data is highly complex, so converting it to meaningful insight requires extraction and visualization, the global crowdsourcing of which adds both additional challenges and opportunities. The created knowledge graph for COVID-19 in Wikidata can be visualized, explored and analyzed in near real time by specialists, automated tools and the public, for decision support as well as educational and scholarly research purposes via SPARQL, a semantic query language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format.
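A small example of the kind of SPARQL retrieval the abstract describes: the sketch below asks the Wikidata endpoint for the symptoms recorded for COVID-19 (wd:Q84263196). It is illustrative only; the paper's own queries and dashboards are more elaborate.

```python
# Illustrative SPARQL query against Wikidata: symptoms and signs (P780)
# recorded for COVID-19 (Q84263196).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery("""
SELECT ?symptomLabel WHERE {
  wd:Q84263196 wdt:P780 ?symptom .                          # COVID-19 -> symptoms and signs
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["symptomLabel"]["value"])
```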