Love Data Week 2023 at Harvard Library | Harvard Library Research Data Services

“Policy change, environmental change, social change… we can move mountains with the right data guiding our decisions. This year, we are focused on helping new and seasoned data users find data training and other resources that can help move the needle on the issues they care about. Data: Agent of Change.

If you haven’t participated before, International Love Data Week is the celebration of data. Love Data Week is dedicated to spreading awareness of the importance of research data management, sharing, preservation, and reuse. Research data are the foundation of the scholarly record and crucial for changing the world around us.

Join the Harvard Library community for a week of events focused on how we can share and use data to bring about changes that matter.”

Harvard Library Responds to the NIH Data Management and Sharing Policy | STAFF PORTAL

“Beginning with the first funding deadlines in January, all NIH grant proposals will be required to include a formal, two-page Data Management and Sharing Plan (DMSP), which must include the following elements….

Crucially, beyond the new DMSP requirement, the data management strategies stated in the plan will be audited and monitored externally, and compliance with stated plans may affect the funding status of grants.

Fortunately, here at Harvard, affiliates have access to a variety of computing infrastructure and systems to effectively manage and steward a wide range of research outputs associated with modern, data-driven, computational research.

Harvard’s libraries, Harvard University Information Technology (HUIT), Research Computing, and Sponsored Programs offices have all been adding services and building capacity to support researchers in complying with this new policy next year.

In the resources section below, we’ve included links to an executive summary of the policy and a collection of FAQs that we created specifically for Harvard users. We’ve also included resources from the NIH designed to support researchers writing and implementing a DMSP for the 2023 funding cycles.

Along with the requirement to make research data publicly available, the NIH’s new policy strongly encourages the use of established data repositories. When selecting an appropriate repository, researchers should plan to use subject- or domain-specific repositories for their data types where possible. When no disciplinary repository exists, researchers should use generalist repositories that accept all data types. We’ve included information on Harvard Dataverse and other generalist repositories in the resources section below….”

Borealis: A New Name for Scholars Portal Dataverse | Library

“On June 23, the Scholars Portal Dataverse is becoming Borealis, the Canadian Dataverse Repository / le dépôt Dataverse canadien. The new name reflects Borealis’s identity as a national service connecting Canadian researchers and comes after consultation with research data management librarians and specialists across the country. 

This is the service that hosts the library-managed University of Guelph Research Data Repositories. Although the name and appearance of the repository are changing, the core service remains the same. The U of G repositories will continue to be managed and supported by the library and will continue to be known as the University of Guelph Research Data Repositories.

Borealis is a bilingual, multidisciplinary, secure, Canadian research data repository, supported by academic libraries and research institutions across Canada. Borealis supports open discovery, management, sharing, and preservation of Canadian research data.”

Dataverse Community Meeting 2022

“The annual Dataverse Community Meeting is an opportunity to build, grow, and enrich the global community. Like the open-source Dataverse product itself, the activities of the Dataverse Community Meetings are community-driven. Over three days of presentations, workshops, and working group meetings we aim to promote and learn about behavioral and technical solutions and standards for curating, sharing, and preserving data that can be discovered and reused across disciplines to reproduce and advance research.

The Dataverse Community Meeting is hosted by Harvard’s Institute for Quantitative Social Science. Learn more about The Dataverse Project at our dataverse.org site. …”

Depositing Data: A Usability Study of the Texas Data Repository

Abstract: Objective: The purpose of this study is to examine the usability of the Texas Data Repository (TDR) for data depositors who are unfamiliar with its interface, and to use the results to improve user experience.

Methods: This mixed-methods study collected qualitative and quantitative data through a pre-survey, a task-oriented usability test with a think-aloud protocol, and an exit questionnaire. Analyses of the quantitative data (i.e., descriptive statistics) and the qualitative data (e.g., content analysis of the think-aloud protocols) were employed to examine the TDR’s usability for first-time data depositors at Texas A&M University.

Results: While the study revealed that users were generally satisfied with their experience, the data suggest that a majority of the participants had difficulty understanding the difference between a dataverse collection and a dataset, and often found adding or editing metadata overwhelming. The platform’s tiered model for metadata description is core to its function, but many participants did not have an accurate mental model of the platform, which left them scrolling up and down the page or jumping back and forth between different tabs and pages to perform a single task. Based on these results, the authors offer recommendations for improving the deposit experience.

Conclusions: While this paper relies heavily on the context of the Dataverse repository platform on which the TDR is built, the authors posit that any self-deposit model, regardless of platform, could benefit from these recommendations. We noticed that completing various metadata fields in the TDR required participants to shift their mindset from that of a data creator to that of a data curator. Moreover, the methods used to investigate the usability of the repository can be used to develop additional studies in a variety of repository and service model contexts.

Repository service for SSH (Dataverse) | SSHOPENCLOUD

The repository service for SSH is built upon the community-driven open source Dataverse software. 

Its modular design facilitates integration with other data services, such as DataCite, rOpenSci, and CLARIN’s Language Resource Switchboard, and supports the development of additional functionality and services.

Two types of services are being developed: 

1) a central (ERIC-level) cloud service, adapted to the needs of the relevant European SSH community, that gives small institutes a research data repository for their designated community.
2) an ‘Archive in a box’ software installation package, adapted to the needs of the European SSH community and accompanied by documentation, that institutes can download and run in their own environments.

Global presence of open-source research data management platform for libraries: the Dataverse project | Emerald Insight

Abstract: Purpose

This paper aims to provide statistical information on the worldwide spread of the open-source research data management application, the Dataverse Project, to librarians, data managers and information managers who are considering using the application at their own institution.

Design/methodology/approach

To produce a list of dataverse repositories, the official Dataverse website was evaluated, and JSON data were downloaded and parsed. Data standardisation was performed to assess the state of installations in various nations and continents across the world.
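The steps described here (download the installation list as JSON, parse it, standardise the values, then tally by country and continent) can be sketched in Python. This is a minimal illustration, not the authors’ code: the inline records and the field names `name`, `country`, and `continent` are hypothetical stand-ins, since the abstract does not describe the structure of the JSON published on the Dataverse website.

```python
import json
from collections import Counter

# Hypothetical records; the real installation JSON on the Dataverse website may
# use different field names, so "name", "country" and "continent" are assumptions.
SAMPLE_JSON = """
[
  {"name": "Repository A", "country": "USA", "continent": "North America"},
  {"name": "Repository B", "country": "usa ", "continent": "north america"},
  {"name": "Repository C", "country": "Germany", "continent": "Europe"},
  {"name": "Repository D", "country": "Netherlands", "continent": "europe"}
]
"""


def standardise(value: str) -> str:
    """Collapse whitespace and upper-case so 'usa ' and 'USA' match."""
    return " ".join(value.split()).upper()


def tally_installations(raw_json: str) -> tuple[Counter, Counter]:
    """Parse the JSON list and count installations per country and continent."""
    records = json.loads(raw_json)
    by_country = Counter(standardise(r["country"]) for r in records)
    by_continent = Counter(standardise(r["continent"]) for r in records)
    return by_country, by_continent


if __name__ == "__main__":
    countries, continents = tally_installations(SAMPLE_JSON)
    print(countries.most_common())  # → [('USA', 2), ('GERMANY', 1), ('NETHERLANDS', 1)]
    print(continents.most_common())
```

Standardising before counting is the step that matters most here: without it, "usa " and "USA" would be tallied as two different countries.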

Findings

Globally, Dataverse installations have been increasing overall; in 2020 alone they rose by 23.21%. In a country-by-country comparison, the USA (13) has the most Dataverse installations, while Europe (25) has the most installations of any continent.

Originality/value

This research will be useful to librarians, data managers and information managers, among others, who want to learn more about Dataverse repositories throughout the world before deploying at their local level.

Developing an updated plugin for Dataverse integration with OPS/OJS on Vimeo

“In this activity, we present the current status of development of a plugin to integrate Dataverse with Open Preprint Systems (OPS) and Open Journal Systems (OJS) in their most recent versions (3.3.x series).

Presentation held on 11/19/21 at Open Publishing Fest 2021:
openpublishingfest.org/calendar.html#event-90/ …”

“Optional Data Curation Feature Use by Harvard Dataverse Repository Users” by Ceilyn Boyd

Abstract: Objective: Investigate how different groups of depositors vary in their use of optional data curation features that provide support for FAIR research data in the Harvard Dataverse repository.

Methods: A numerical score based upon the presence or absence of characteristics associated with the use of optional features was assigned to each of the 29,295 datasets deposited in Harvard Dataverse between 2007 and 2019. Statistical analyses were performed to investigate patterns of optional feature use amongst different groups of depositors and their relationship to other dataset characteristics.
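The presence/absence scoring idea can be illustrated with a small sketch. The abstract does not say which characteristics were scored or how they were weighted, so the feature names below are hypothetical and each present feature simply adds one point; the study’s actual scheme may differ.

```python
from dataclasses import dataclass, field
from statistics import fmean

# Hypothetical optional features; not the study's actual list of characteristics.
OPTIONAL_FEATURES = (
    "keywords",
    "file_descriptions",
    "custom_terms_of_use",
    "related_publication",
    "file_tags",
)


@dataclass
class DepositedDataset:
    depositor_group: str  # assumed labels, e.g. "individual" or "group"
    metadata: dict[str, object] = field(default_factory=dict)


def curation_score(ds: DepositedDataset) -> int:
    """One point for each optional feature that is present and non-empty."""
    return sum(1 for name in OPTIONAL_FEATURES if ds.metadata.get(name))


def mean_score_by_group(datasets: list[DepositedDataset]) -> dict[str, float]:
    """Average score per depositor group, for comparing patterns of feature use."""
    scores: dict[str, list[int]] = {}
    for ds in datasets:
        scores.setdefault(ds.depositor_group, []).append(curation_score(ds))
    return {group: fmean(values) for group, values in scores.items()}


if __name__ == "__main__":
    sample = [
        DepositedDataset("individual", {"keywords": ["survey"]}),
        DepositedDataset("individual"),
        DepositedDataset(
            "group",
            {"keywords": ["genomics"], "file_descriptions": "README", "file_tags": ["Data"]},
        ),
    ]
    print(mean_score_by_group(sample))  # → {'individual': 0.5, 'group': 3.0}
```

An unweighted count keeps the score easy to interpret; a weighted variant would only need a per-feature weight in place of the constant 1.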

Results: Members of groups make greater use of Harvard Dataverse’s optional features than individual researchers. Datasets that undergo a data curation review before submission to Harvard Dataverse, are associated with a publication, or contain restricted files also make greater use of optional features.

Conclusions: Individual researchers might benefit from increased outreach and improved documentation about the benefits and use of optional features to improve their datasets’ level of curation beyond the FAIR-informed support that the Harvard Dataverse repository provides by default. Platform designers, developers, and managers may also use the numerical scoring approach to explore how different user groups use optional application features.

Opening Your Scholarship: Why should I DASH and Dataverse?

“Learn practices and platforms to achieve your open access goals!

Highlights on Harvard DASH and Dataverse.

Panelists:

– Sonia Barbosa, Manager of Data Curation, Harvard Dataverse, Manager of the Murray Research Archive

– Julie Goldman, Research Data Services Librarian

– Colin Lukens, Senior Repository Manager, Harvard Library Office for Scholarly Communication

– Katie Mika, Data Services Librarian …”

Dataverse and OpenDP: Tools for Privacy-Protective Analysis in the Cloud | Mercè Crosas

“When big data intersects with highly sensitive data, both opportunities for society and risks abound. Traditional approaches for sharing sensitive data are known to be ineffective in protecting privacy. Differential Privacy, deriving from roots in cryptography, is a strong mathematical criterion for privacy preservation that also allows for rich statistical analysis of sensitive data. Differentially private algorithms are constructed by carefully introducing “random noise” into statistical analyses so as to obscure the effect of each individual data subject. OpenDP is an open-source project for the differential privacy community to develop general-purpose, vetted, usable, and scalable tools for differential privacy, which users can simply, robustly, and confidently deploy.
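To make “carefully introducing random noise” concrete, here is the textbook Laplace mechanism for a counting query, written in plain Python. It illustrates the general idea only: it does not use OpenDP or reflect its API, and the function names and sample data are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(values: list[bool], epsilon: float, rng: random.Random) -> float:
    """Release a count of True values with epsilon-differential privacy.

    Adding or removing one person's record changes the count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is sufficient.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sum(values) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2023)
    # Hypothetical sensitive attribute, one entry per data subject.
    has_condition = [True, False, True, True, False, False, True]
    print(dp_count(has_condition, epsilon=0.5, rng=rng))
```

Smaller values of epsilon add more noise and give stronger privacy. Naive floating-point samplers like this one are known to leak information in subtle ways, which is one reason vetted libraries such as OpenDP matter.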

Dataverse is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others’ work more easily. Researchers, journals, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility. A Dataverse repository is the software installation, which then hosts multiple virtual archives called Dataverses. Each Dataverse contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data).
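The containment hierarchy just described (an installation hosting Dataverses, each holding datasets with metadata and files) can be modelled as nested data classes. This is a hypothetical sketch of the concepts, not Dataverse’s internal schema or its API; nested collections, which Dataverse also supports, are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    name: str
    kind: str  # illustrative labels, e.g. "data", "documentation", "code"


@dataclass
class Dataset:
    title: str
    metadata: dict[str, str] = field(default_factory=dict)  # descriptive metadata
    files: list[DataFile] = field(default_factory=list)


@dataclass
class Collection:
    """A 'Dataverse': one virtual archive hosted by an installation."""
    name: str
    datasets: list[Dataset] = field(default_factory=list)


@dataclass
class Installation:
    """The Dataverse repository, i.e. the software installation itself."""
    name: str
    collections: list[Collection] = field(default_factory=list)

    def count_files(self) -> int:
        """Total number of files across every dataset in every collection."""
        return sum(len(ds.files) for c in self.collections for ds in c.datasets)


if __name__ == "__main__":
    repo = Installation("Example Repository", [
        Collection("Example Lab", [
            Dataset(
                "Survey 2020",
                {"author": "Example Author"},
                [
                    DataFile("responses.csv", "data"),
                    DataFile("codebook.pdf", "documentation"),
                    DataFile("analysis.R", "code"),
                ],
            ),
        ]),
    ])
    print(repo.count_files())  # → 3
```

Keeping metadata on the dataset rather than on individual files mirrors the text’s description, where each dataset carries the descriptive metadata and bundles its data, documentation, and code.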

This session examines ongoing efforts to realize a combined use case for these projects that will offer academic researchers privacy-preserving access to sensitive data. This would allow both novel secondary reuse and replication access to data that are otherwise commonly locked away in archives. The session will also explore the potential impact of this work outside the academic world.”