Abstract: Sparked by issues of quality and lack of proper documentation for datasets, the machine learning community has begun developing standardised processes for establishing datasheets for machine learning datasets, with the intent to provide context and information on provenance, purposes, composition, the collection process, recommended uses or societal biases reflected in training datasets. This approach fits well with practices and procedures established in GLAM institutions, such as establishing collections’ descriptions. However, digital cultural heritage datasets are marked by specific characteristics. They are often the product of multiple layers of selection; they may have been created for different purposes than establishing a statistical sample according to a specific research question; they change over time and are heterogeneous. Punctuated by a series of recommendations to create datasheets for digital cultural heritage, the paper addresses the scope and characteristics of digital cultural heritage datasets; possible metrics and measures; lessons from concepts similar to datasheets and/or established workflows in the cultural heritage sector. This paper includes a proposal for a datasheet template that has been adapted for use in cultural heritage institutions, and which proposes to incorporate information on the motivation and selection criteria, digitisation pipeline, data provenance, the use of linked open data, and version information.
CRDDS joins DARIAH as a cooperating partner | University Libraries | University of Colorado Boulder
“CU Boulder researchers now have access to an international organization that supports the digital humanities through funding opportunities, access to high-quality learning materials, an open marketplace with tools and resources and more.
The organization is the Digital Research Infrastructure for the Arts and Humanities (DARIAH) and CU Boulder’s Center for Research Data and Digital Scholarship (CRDDS) has joined as a cooperating partner. CRDDS is a collaboration between Research Computing and University Libraries, offering a full range of data services and support to the university and community….”
[2302.04084] Reception Reader: Exploring Text Reuse in Early Modern British Publications
Abstract: The Reception Reader is a web tool for studying text reuse in the Early English Books Online (EEBO-TCP) and Eighteenth Century Collections Online (ECCO) data. Users can: 1) explore a visual overview of the reception of a work, or its incoming connections, across time based on shared text segments, 2) interactively survey the details of connected documents, and 3) examine the context of reused text for “close reading”. We show examples of how the tool streamlines research and exploration tasks, and discuss the utility and limitations of the user interface along with its current data sources.
Surprise machines | John Benjamins
“Although “the humanities so far has focused on literary texts, historical text records, and spatial data,” as stated by Lev Manovich in Cultural Analytics (Manovich, 2020, p. 10), the recent advancements in artificial intelligence are driving more attention to other media. For example, disciplines such as digital humanities now embrace more diverse types of corpora (Champion, 2016). Yet this shift of attention is also visible in museums, which recently took a step forward by establishing the field of experimental museology (Kenderdine et al., 2021).
This article illustrates the visualization of an extensive image collection through digital means. Following a growing interest in the digital mapping of images – proved by the various scientific articles published on the subject (Bludau et al., 2021; Crockett, 2019; Seguin, 2018), Ph.D. theses (Kräutli, 2016; Vane, 2019), software (American Museum of Natural History, 2020/2022; Diagne et al., 2018; Pietsch, 2018/2022), and presentations (Benedetti, 2022; Klinke, 2021) – this text describes an interdisciplinary experiment at the intersection of information design, experimental museology, and cultural analytics.
Surprise Machines is a data visualization that maps more than 200,000 digital images of the Harvard Art Museums (HAM) and a digital installation for museum visitors to understand the collection’s vastness. Part of a temporary exhibition organized by metaLAB (at) Harvard and entitled Curatorial A(i)gents, Surprise Machines is enriched by a choreographic interface that allows visitors to interact with the visualization through a camera capturing body gestures. The project is unique for its interdisciplinarity, looking at the prestigious collection of Harvard University through cutting-edge techniques of AI….”
A systematic review of Wikidata in Digital Humanities projects | Digital Scholarship in the Humanities | Oxford Academic
Abstract: Wikidata has been widely used in Digital Humanities (DH) projects. However, a focused discussion regarding the current status, potential, and challenges of its application in the field is still lacking. A systematic review was conducted to identify and evaluate how DH projects perceive and utilize Wikidata, as well as its potential and challenges as demonstrated through use. This research concludes that: (1) Wikidata is understood in the DH projects as a content provider, a platform, and a technology stack; (2) it is commonly implemented for annotation and enrichment, metadata curation, knowledge modelling, and Named Entity Recognition (NER); (3) most projects tend to consume data from Wikidata, whereas there is more potential to utilize it as a platform and a technology stack to publish data on Wikidata or to create an ecosystem of data exchange; and (4) projects face two types of challenges: technical issues in the implementations and concerns with Wikidata’s data quality. In the discussion, this article contributes to addressing three issues related to coping with the challenges in the specific context of the DH field based on the research findings: the relevance and authority of other available domain sources; domain communities and their practices; and workflow design that coordinates technical and labour resources from projects and Wikidata.
Internet Archive Welcomes Digital Humanists and Cultural Heritage Professionals to “Humanities and the Web: Introduction to Web Archive Data Analysis” – Internet Archive Blogs
“On November 14, 2022, the Internet Archive hosted Humanities and the Web: Introduction to Web Archive Data Analysis, a one-day introductory workshop for humanities scholars and cultural heritage professionals. The group included disciplinary scholars and information professionals with research interests ranging from Chinese feminist movements, to Indigenous language revitalization, to the effects of digital platforms on discourses of sexuality and more. The workshop was held at the Central Branch of the Los Angeles Public Library and coincided with the National Humanities Conference.
The goals of the workshop were to introduce web archives as primary sources and to provide a sampling of tools and methodologies that could support computational analysis of web archive collections. Internet Archive staff shared web archive research use cases and provided participants with hands-on experience building web archives and analyzing web archive collections as data….”
Linking different scientific digital libraries in Digital Humanities: the IMAGO case study | SpringerLink
Abstract: In recent years, several scientific digital libraries (DLs) in the digital humanities (DH) field have been developed following the Open Science principles. These DLs aim at sharing the research outcomes, in several cases as FAIR data, and at creating linked information spaces. In several cases, Semantic Web technologies and Linked Data have been used to reach these aims. This paper presents how the current scientific DLs in the DH field can support the creation of linked information spaces and navigational services that allow users to navigate them, using Semantic Web technologies to formally represent, search and browse knowledge. To support the argument, we present our experience in developing a scientific DL supporting scholars in creating, evolving and consulting a knowledge base related to Medieval and Renaissance geographical works within the three-year (2020–2023) Italian National research project IMAGO (Index Medii Aevi Geographiae Operum). In the presented case study, a linked information space was created to allow users to discover and navigate knowledge across multiple repositories, thanks to the extensive use of ontologies. In particular, the linked information spaces created within the IMAGO project make use of five different datasets, i.e. Wikidata, the MIRABILE digital archive, the Nuovo Soggettario thesaurus, the Mapping Manuscript Migration knowledge base and the Pleiades gazetteer. The linking among different datasets makes it possible to considerably enrich the knowledge collected in the IMAGO KB.
Going a Step Further Than Open Access and Open Source: COVE and the Promise of Open Assembly | Victorians Institute Journal | Scholarly Publishing Collective
Abstract: This article asks if the principles of open source and open access are sufficient to safeguard our intellectual labor and to guard against the predatory logic of a world dominated by capitalist systems of production and dissemination. Both open source and open access face a similar problem, as it happens: neglect and obsolescence, as well as the most pernicious Achilles’ heel of the vast majority of digital humanities initiatives: long-term sustainability. COVE offers an alternative to both long-term sustainability and the collective sharing of content.
Two principles currently govern the work of the digital humanities: open access, the notion that content should be freely accessible to all without paywalls or other restrictions; and open source, software whose underlying source code is made freely available for reuse and modification. COVE, which stands for Collaborative Organization for Virtual Education at covecollective.org, subscribes to both principles: we support an open-access publication platform, COVE Editions, where we publish material such as scholarly editions after they are put through peer review, revision, and copyediting; we also support the open-source movement by using and modifying open-source tools like TimelineJS, Open Layers, Drupal, and Annotation Studio, and sharing our code through a GitHub repository.
However, COVE seeks to go a little further by asking if the principles of open source and open access are sufficient to safeguard our intellectual labor and to guard against the predatory logic of a world dominated by capitalist systems of production and dissemination. Both open source and open access face a similar problem, as it happens: neglect and obsolescence, as well as the most pernicious Achilles’ heel of the vast majority of digital humanities initiatives, long-term sustainability. COVE offers an alternative to both long-term sustainability and the collective sharing of content.
Call for Proposals – Global Digital Humanities Symposium 2023 | Deadline: December 1, 2022
“Deadline to apply: December 1, 2022. Digital Humanities at Michigan State University is proud to continue the Global DH Symposium for an 8th year. This will be the Symposium’s first year as a hybrid conference with a multi-day synchronous virtual event and a one-day, in-person event at MSU. The virtual symposium welcomes presentations in English, Spanish, and Chinese and will offer live interpretation between languages. …”
Topics include:
“…Indigeneity – anywhere in the world – and the digital
Surveillance, censorship, and/or data privacy in a global context
Productive failure; failure as a part of DH praxis
Cultural heritage in a range of contexts, particularly non-Western
Open data, open access, and data preservation as resistance
Global digital pedagogies and emerging technologies
Equity and inclusion in digital access
Borders, migration, and/or diasporas and their connections to the digital
Multilingualism and the digital…”
Centre for Digital Humanities | Open Science grant awarded to Digital Humanities Lab
“The scientific developers of the Utrecht Digital Humanities Lab (DHLab) have been awarded a grant from the Open Science Fund. The main objective of the awarded project is to make the past and future research software of DHLab as FAIR (Findable, Accessible, Interoperable and Reusable) as possible.
The Open Science Fund is an opportunity for Utrecht University and University Medical Centre Utrecht employees to access small grants with which they can apply Open Science principles to their research….”
UNBIND: Reimagining the academic monograph – CRASSH
“The monograph, or the scholarly book, is today the dominant form of knowledge production in the humanities. But can there exist a more imaginative, creative, or performative alternative? Can we unbind the monograph and transform it into something that resists the marketisation and privatisation of public knowledge? Something that engages robustly with open platforms and public infrastructures?
Cambridge Digital Humanities invites monograph-writers, publishing scholars, publishers, editors, and open access activists for a day-long conversation on the future of the monograph form….”
Leslie Chan: Is Open Scholarship Possible without Open Infrastructure? – Digital Humanities Summer Institute | 7 June 2022
“Abstract: Recently, several collaborators and I submitted a chapter proposal in response to a call for submission to a volume on critical infrastructure studies and digital humanities. The editors did not accept our proposal. They cited the high number of submissions and the “word limit” specified by the university press contracted for the volume as the reason. In this talk, I would like to reflect on how networked possibilities (the multimodal forms of scholarly artifacts and modes of engagements) are still being dictated by the properties of print and its associated academic capital. In the meantime, much of the critical infrastructures necessary for networked open scholarship are increasingly being designed and controlled by a small handful of multinational corporate publishers turned data analytics cartel. The creation of an end-to-end knowledge production and evaluation platform and its inscribed logic of data extraction has enormous implications for our aspirations for open scholarship, particularly for early career scholars. We may still be focused on infrastructures as the object of study, but we should be more concerned with how infrastructures govern our labour and scholarly practices and, above all, our autonomy. The talk will provide suggestions on how best to design community governance over infrastructure, instead of being governed by infrastructures not by our design.”
HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust
Abstract: We present a new dataset built on prior work consisting of 1,671,370 randomly sampled pages of English-language prose roughly divided between modes of fictional and non-fictional writing and published between the years 1800 and 2000. In addition to focusing on the “page” as the basic bibliographic unit, our work employs a single predictive model for the historical period under consideration in contrast to prior work. Besides publication metadata, we also provide an enriched feature set of 107 features including part-of-speech tags, sentiment scores, word supersenses and more. Our data is designed to give researchers in the digital humanities large yet portable random samples of historical writing across two foundational modes of English prose writing. We present initial insights into transformations of linguistic patterns across this historical period using our enriched features as possible pointers to future work. The data can be accessed at https://doi.org/10.7910/DVN/HAKKUA.
Interview with Editor-in-Chief: Professor Qinglong Peng – News – New Techno-Humanities – Journal – Elsevier
“Open access publishing has gained huge momentum in recent years. Researchers in the humanities now have more opportunities to publish open access, not to mention colleagues in science and medicine. Quite often authors have to pay a large sum in order to publish open access, and I know this may pose serious challenges to some of our authors, as funding in humanities studies is still not so common. I am very happy to see that Shanghai Jiao Tong University will fully sponsor the publication of this journal and thus authors do not need to pay for publication. I trust this sponsorship will provide more opportunities for researchers from under-represented regions and disciplines. Meanwhile, open access will surely improve the visibility of our contributors’ works, naturally expanding their influence in the long run….”
NEH grant to support training for high-impact public digital humanities collaborations | The University of Kansas
“The Institute for Digital Research in the Humanities at the University of Kansas has been awarded $190,000 from the National Endowment for the Humanities to offer training in public digital humanities and academic-community collaborations. An intensive weeklong summer institute — to be offered in June 2022 at the Hall Center for the Humanities — will provide foundational knowledge, skills and resources to successfully advance 12 public humanities projects, increasing their longevity, visibility and impact. This will be followed by a year of further online training, support and discussion, with a final symposium and showcase in June 2023….”