DHQ: Digital Humanities Quarterly: Reference Rot in the Digital Humanities Literature: An Analysis of Citations Containing Website Links in DHQ

Abstract:  The ubiquity of the web has dramatically transformed scholarly communication. The shift toward digital publishing has brought great advantages, including an increased speed of knowledge dissemination and a greater uptake in open scholarship. There is also an increasing range of scholarly material being communicated and referenced. References have expanded beyond books and articles to include a broad array of assets consulted or created during the research process, such as datasets, social media content like tweets and blogs, and digital exhibitions. There are, however, numerous challenges posed by the transition to a constantly evolving digital scholarly infrastructure. This paper examines one of those challenges: link rot. Link rot is likely most familiar in the form of “404 Not Found” error messages, but there are other less prominent obstacles to accessing web content. Our study examines instances of link rot in Digital Humanities Quarterly articles and its impact on the ability to access the online content referenced in these articles after their publication.
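The abstract distinguishes the familiar "404 Not Found" case from subtler access obstacles (redirects, paywalls, server failures). A minimal sketch of how such an audit might classify links by HTTP status code is below; the category names and the `check_link` helper are illustrative assumptions, not the DHQ study's actual methodology.

```python
# Hypothetical sketch of a link-rot audit step: classify each cited URL
# by its HTTP outcome. Category labels here are illustrative, not DHQ's.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def classify_status(code: int) -> str:
    """Map an HTTP status code to a coarse link-health category."""
    if 200 <= code < 300:
        return "ok"
    if code in (301, 302, 307, 308):
        return "redirect"      # content may have moved; possible soft rot
    if code in (404, 410):
        return "not found"     # the classic symptom of link rot
    if code in (401, 403):
        return "restricted"    # access barrier short of a missing page
    return "other error"

def check_link(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and classify the result; network failures count as rot too."""
    try:
        req = Request(url, headers={"User-Agent": "link-rot-check/0.1"})
        with urlopen(req, timeout=timeout) as resp:
            return classify_status(resp.status)
    except HTTPError as e:
        return classify_status(e.code)
    except URLError:
        return "unreachable"   # DNS failure, refused connection, etc.
```

Note that `urlopen` follows redirects automatically, so a stricter audit that wants to count redirects as potential rot would need to disable redirect handling and inspect the first response directly.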


EDP Sciences – EDP Sciences decides against Open Access transition under S2O for Radioprotection in 2023

“EDP Sciences and the Société Française de Radioprotection (SFRP) announced today that they have decided against transitioning Radioprotection to open access under Subscribe to Open (S2O) in 2023. Despite concerted efforts to promote the initiative and reach the required subscription threshold, the financial viability of the transition was not achieved at this time. This decision underscores the reality that open access for S2O journals is not guaranteed unless subscriptions are renewed….”

Google shared AI knowledge with the world until ChatGPT caught up – The Washington Post

“In February, Jeff Dean, Google’s longtime head of artificial intelligence, announced a stunning policy shift to his staff: They had to hold off sharing their work with the outside world.

For years Dean had run his department like a university, encouraging researchers to publish academic papers prolifically; they pushed out nearly 500 studies since 2019, according to Google Research’s website.


But the launch of OpenAI’s groundbreaking ChatGPT three months earlier had changed things. The San Francisco start-up kept up with Google by reading the team’s scientific papers, Dean said at the quarterly meeting for the company’s research division. Indeed, transformers — a foundational part of the latest AI tech and the T in ChatGPT — originated in a Google study.

Things had to change. Google would take advantage of its own AI discoveries, sharing papers only after the lab work had been turned into products, Dean said, according to two people with knowledge of the meeting, who spoke on the condition of anonymity to share private information….”

Restricting Reddit Data Access Threatens Online Safety & Public-Interest Research

“Last week, soon after Reddit announced plans to restrict free access to the Reddit API, the company cut off access to Pushshift, a data resource widely used by communities, journalists, and thousands of academics worldwide (see Pushshift’s official response).

We are writing to express concern about this sudden disruption to critical resources, and the uncertainty about the future it has created. We are asking for clarification and a meeting about the best ways to restore essential functionality for the communities that power your platform and the researchers who rely on your platform for essential public-interest work. To support that dialogue, we are coordinating a survey of the impact.

By preventing communities from accessing the very data they generate, Reddit has severely disrupted the safety and functionality of your platform. As you know, Reddit relies on volunteers to create moderation technologies and to do moderation labor that costs your competitors hundreds of millions of dollars per year. Tens of thousands of volunteers protect children’s safety, manage sensitive mental health support, and mediate some of the world’s largest conversation spaces for constructive civic discourse.

To succeed at their role, these unpaid leaders and workers need to access historical and contemporary community data to moderate a conversation space with over 1.5 billion active users. For many years, Reddit has relied on volunteer labor and computing infrastructure from Pushshift to provide communities with essential data services. You have now cut that off without warning to communities and haven’t offered alternatives, which will degrade safety protections across Reddit….”

PsyArXiv Preprints | Data is not available upon request

Abstract:  Many journals now require data sharing and require articles to include a Data Availability Statement. However, several studies over the past two decades have shown that promissory notes about data sharing are rarely abided by, and that data is generally not available upon request. This has negative consequences for many essential aspects of scientific knowledge production, including independent verification of results, efficient secondary use of data, and knowledge synthesis. Here, I assessed the prevalence of data sharing upon request in articles employing the Implicit Relational Assessment Procedure published within the last 5 years. Of 52 articles, 42% contained a Data Availability Statement, most of which stated that data was available upon request. This rose from 0% in 2018 to 100% in 2022. Only 25% of articles’ authors actually shared data upon request. Among articles stating that data was available upon request, only 17% shared data upon request. The presence of Data Availability Statements was not associated with higher rates of data sharing (p = .80). Results replicate those found elsewhere: data is generally not available upon request, and promissory Data Availability Statements are typically not adhered to. Issues, causes, and implications are considered.


Canada extends copyright protection another 20 years to meet new trade obligation – The Globe and Mail

“There will be no new books, songs or plays added to the public domain in Canada until 2043 after the government squeezed in a change to copyright laws just before the end of 2022.

Until Dec. 30, copyright protection applied to literary, dramatic, musical or artistic works for the life of their author plus another 50 years.

But as of that date, an artistic work won’t join the public domain for the life of the author plus another 70 years.

The change brings Canada into compliance with a commitment it made under the new North American free trade deal to match its copyright protections with those in place in the United States since 1998. That deal gave Canada until Dec. 31, 2022, to fall in line and it beat the deadline by one day….”

Open Plant Pathology: How Much Do We (Plant Pathologists) Value Openness and Transparency?

“We (Emerson Del Ponte and Adam Sparks) started this initiative (Open Plant Pathology) in early January 2018 with the idea that we would create a community in which plant pathologists could come together and share resources and ideas and encourage a freer exchange of information, code and data. One of the reasons for this was that a few years before that, we’d started working on an analysis of randomly selected plant pathology papers: initially we looked at 300 published from 2012 until 2018, but the study later grew to encompass 450 papers published from 2012 until 2021, with Kaique Alves, Zachary Foster and Nik Grünwald, and was published in Phytopathology® in January (Sparks et al. 2023b). What we were finding as we looked at papers across 21 journals that were dedicated to plant pathology research or published specialised sections or articles in the field of plant pathology was not surprising, but still disappointing. As a discipline, we simply do not make much, if any, effort to help ensure that others can easily reproduce our work after it is published (Sparks et al. 2023b).

We found that most articles were not reproducible according to our scoring system and failed to take advantage of open science and reproducibility methods that would benefit both the authors and the readers. To wit, the vast majority of articles we looked at made no attempt to share code or data, scoring “0” in our system (Figure 1)….”

The Internet Archive’s troubles are bad news for book lovers | The Spectator

“Increasingly, the future of the Internet Archive looks under threat. What the four publishers are demanding and seem set legally to enforce is, according to Kahle, the destruction of around ‘4 million digitised files… This would be a book burning on the scale of the Library of Alexandria… If digital learners have no access to millions of books, aren’t they effectively disappeared?’ …

‘In electronic form they can change all books in all libraries all at once and irreversibly without permission,’ says Kahle. ‘This is dangerous. It is not hypothetical, it is happening.’  With the future of genuine libraries looking increasingly shaky (nearly 800 have closed down in Britain alone in the past decade) and digital borrowing correspondingly on the rise, this licensing scheme has chilling implications for readers in search of an undoctored text. Also for a reading future in which their data is not open to being harvested and every turn of a page not captured by the big corporations.”

A portal to China is closing, at least temporarily, and researchers are nervous | South China Morning Post

“CNKI, a portal for Chinese academic papers, will restrict foreign access to some databases starting April 1, citing security concerns

It is unclear when access might be resumed, leading some scholars to fear the suspension might become permanent….

China’s top internet portal for academic papers will suspend foreign access to some databases starting next week, sparking concerns among scholars that they will lose not only an important resource for understanding China but also a useful guardrail to reduce misunderstanding between China and the West.

This week, research institutions around the world – including the University of California, San Diego, Kyoto University and the Berlin State Library – notified affiliates that they would indefinitely lose access to up to four databases provided by the China National Knowledge Infrastructure (CNKI) platform starting on April 1….


For academics studying China, CNKI is an invaluable resource, particularly with the current uncertainty surrounding visits to China for field research….

Over 95 per cent of Chinese academic papers that are formally published are available on CNKI, according to the State Administration for Market Regulation, China’s antitrust watchdog, when it conducted a separate review of the platform’s practices….”

data not found

“data not found is a dataset of datasets that were sought but not found on data portals around the world.

It invites consideration of what is and is not counted as public data, what kinds of information public bodies do and do not collect and make available, and what kinds of questions it is and is not possible to answer with public sector data.

The project traces and archives encounters between citizens and public servants on data portals, surfacing different understandings and assumptions about what data portals are for, what can be done on them, and what kinds of data one might expect institutions to gather and open up.

Rather than assuming a kind of “data universalism”, these unsuccessful attempts to obtain data through portals highlight some of the different ways in which data comes to matter to different people in different situations. What is considered missing data, absent data, a “data gap” or a “data void” is contingent, relational and situational.

The as yet unsuccessful requests have been selected from data portals around the world to illustrate a diversity of interests, concerns, curiosities and queries for data that does not exist. They are not intended to be representative of all requests on all portals as:

- Not all data portals make requests and responses to these requests available.
- Not all unsuccessful data requests have been included (e.g. those containing personal details, duplicate requests, requests which are redirected). …”

Science Publishing Innovation: Why Do So Many Good Ideas Fail? – Science Editor

“Over a decade ago, BioMed Central (BMC) recognized the importance of postpublication discussion. Prepublication review can improve papers and catch errors, but only time and subsequent work of other scientists can truly show which results in a publication are robust and valid. Unlike a print journal (or print as a medium, in general), the Internet permits the readers to comment on published papers over time. So in 2002 BMC developed and enabled commenting on every one of its articles across its suite of journals. Not only does this allow for postpublication review, but it enables readers to easily ask authors and other readers a question, with public responses enriching the original manuscript, clarifying, and helping to improve the comprehension of the work.

This is a terrific idea, but it didn’t really catch on….

Remarkably, despite the creation of arXiv for physicists in 1991 and despite the enthusiastic embrace of preprints by the physics community, it has been assumed this is impossible for biology. The common argument is that biologists are different from physicists and the arXiv success is not informative. What many did find telling is the death of the 2007 preprint initiative from the Nature Publishing Group (NPG). NPG tried preprints with Nature Precedings, but adoption was low and in 2012 NPG pulled the plug on the experiment.3 This triggered some skepticism about the prospects of the bioRxiv preprint effort from Cold Spring Harbor Lab (CSHL) Press.4 Critics told the director of CSHL Press, John Inglis, that a preprint server for biologists simply couldn’t work.5

Once again, we must ask the cause of the Nature Precedings failure. Did NPG kill it because biologists wouldn’t behave in the same way as physicists? We know that isn’t the case. Preprints in biology are all the rage today….

In the winter of 2012, Alexei Stoliartchouk and I came up with the idea for protocols.io—a central place where scientists can share and discover science methods. We wanted to create a site where corrections and the constant tweaking of science methods could be shared, even after publication in a journal….

Few people know about bioprotocols.com, but many know about OpenWetWare (OWW) and Nature Protocol Exchange—both open-access community resources for sharing protocols. Both have been mentioned to me countless times as evidence that protocols.io wouldn’t work. As with preprints, the problems that OWW and Protocol Exchange faced seemed to be proof that biologists would not share details of their methods on such a platform. As with bioRxiv, we are in the early days of protocols.io, but judging from the growth in the figure below, it’s hard to argue that biologists don’t need this or that they won’t take the time to publicly share their methods….”

The Internet Archive has lost its first fight to scan and lend e-books like a library – The Verge

“A federal judge has ruled against the Internet Archive in Hachette v. Internet Archive, a lawsuit brought against it by four book publishers, deciding that the website does not have the right to scan books and lend them out like a library.

Judge John G. Koeltl decided that the Internet Archive had done nothing more than create “derivative works,” and so would have needed authorization from the books’ copyright holders — the publishers — before lending them out through its National Emergency Library program….

The Internet Archive says it will appeal. “Today’s lower court decision in Hachette v. Internet Archive is a blow to all libraries and the communities we serve,” Chris Freeland, the director of Open Libraries at the Internet Archive, writes in a blog post. “This decision impacts libraries across the US who rely on controlled digital lending to connect their patrons with books online. It hurts authors by saying that unfair licensing models are the only way their books can be read online. And it holds back access to information in the digital age, harming all readers, everywhere.”

The two sides went to court on Monday, with HarperCollins, John Wiley & Sons, and Penguin Random House joining Hachette as plaintiffs….”

OpenAI co-founder on company’s past approach to openly sharing research: ‘We were wrong’ – The Verge

“Yesterday, OpenAI announced GPT-4, its long-awaited next-generation AI language model. The system’s capabilities are still being assessed, but as researchers and experts pore over its accompanying materials, many have expressed disappointment at one particular feature: that despite the name of its parent company, GPT-4 is not an open AI model.

OpenAI has shared plenty of benchmark and test results for GPT-4, as well as some intriguing demos, but has offered essentially no information on the data used to train the system, its energy costs, or the specific hardware or methods used to create it….

Speaking to The Verge in an interview, Ilya Sutskever, OpenAI’s chief scientist and co-founder, expanded on this point. Sutskever said OpenAI’s reasons for not sharing more information about GPT-4 — fear of competition and fears over safety — were “self evident”:…

OpenAI was founded as a nonprofit but later became a “capped profit” in order to secure billions in investment, primarily from Microsoft, with whom it now has exclusive business licenses….

When asked why OpenAI changed its approach to sharing its research, Sutskever replied simply, “We were wrong. Flat out, we were wrong. If you believe, as we do, that at some point, AI — AGI — is going to be extremely, unbelievably potent, then it just does not make sense to open-source. It is a bad idea… I fully expect that in a few years it’s going to be completely obvious to everyone that open-sourcing AI is just not wise.” …”

Docker is deleting Open Source organisations – what you need to know

by Alex Ellis

Coming up with a title that explains the full story here was difficult, so I’m going to try to explain quickly.

Yesterday, Docker sent an email to any Docker Hub user who had created an “organisation”, telling them their account will be deleted, including all images, if they do not upgrade to a paid team plan. The email contained a link to a tersely written PDF (since silently edited) that was missing many important details, causing significant anxiety and additional work for open source maintainers.

As far as we know, this only affects organisation accounts that are often used by open source communities. There was no change to personal accounts. Free personal accounts have a 6-month retention period.

Why is this a problem?

- Paid team plans cost 420 USD per year (paid monthly)
- Many open source projects, including ones I maintain, have published images to the Docker Hub for years
- Docker’s Open Source program is hostile and out of touch


Researchers Forget to Report How to Share Data From Studies Published in Spanish Medical Journals – ScienceDirect

“Some time ago, Archivos de Bronconeumología reported on a radical turnabout by the ICMJE: after announcing in 2016 that they would require clinical trial researchers to share individual-level anonymized participant data with third parties, in 2017 they decided that such transfer would be voluntary.4 The news had a precedent in the Recommendations published a few years earlier, to the effect that some journal editors “ask authors to say whether the study data are available to third parties to view and/or use/reanalyze, while still others encourage or require authors to share their data with others for review or reanalysis”.1 It would be interesting to know which Spanish journals have included this requirement in their ‘instructions for authors’ and whether they comply with it.

To answer this question, we reviewed the portals of 24 Spanish journals with an impact factor greater than 1, on the understanding that they have greater influence than those with an impact factor ≤1 and those with no impact factor. Of these 24, 14 are included in the list of ICMJE Recommendations (Supplementary material A). Of these, only 5 (Archivos de Bronconeumología, Atención Primaria, Enfermedades Infecciosas y Microbiología Clínica, Gaceta Sanitaria, and Medicina Intensiva) include a specific section, that we shall call “link to data repository”, that recommends, supports and encourages authors to share raw data from their studies with other researchers, and gives instructions on how to go about it. A sixth journal, the Revista de Neurología, recommends this procedure only for clinical trials (Supplementary material B). To determine the frequency with which authors report how data can be accessed compared to other requirements requested by the same journals, 2 control requirements were selected: reporting on conflicts of interest and study funding, that were included in the Recommendations much earlier. It is also of interest to determine whether supplementary material may be included online, as this is sometimes a way of including raw study data….

Sharing data from quantitative studies is much easier than from qualitative studies. Researchers performing qualitative studies frequently cite the lack of authorization of the participants, the sensitive nature of the data, and loss of confidentiality as reasons for not sharing data.6 However, qualitative studies are the exception among Spanish medical publications. By 2011, most researchers were already sharing their data, although this was challenging for more than a third of them; in the case of clinical trials, it has recently been reported that access to data is difficult7 despite authors’ commitment to share.8 Ideally, Spanish medical journals should require authors to share them in all the articles they publish, and if data sharing is impossible, to explain why.”