[Code of Practice for data mining in the UK]

“While many other countries have clarified their intellectual property laws to support AI and innovation, the UK has yet to introduce a text and data mining exception to explicitly support knowledge transfer and commercial AI. Given this, the Code of Practice provides a particularly important opportunity to provide clarity and ensure that the UK remains an attractive place to undertake and invest in machine learning….”

“In order that the UK remains competitive in scientific and technology markets, the government should ensure that a Code of Practice:
· Clarifies that access to broad and varied data sets that are publicly available online remain available for analysis, including text and data mining, without the need for licensing.
· Recognises that even without an explicit commercial text and data mining exception, exceptions and limits on copyright law exist that would permit text and data mining for commercial purposes….”

RLUK signs text and data mining (TDM) in the UK letter – Research Libraries UK

“RLUK has signed a multi-organisation letter urging the UK Government to ensure the UK is a favourable place to develop and use safe AI, by clarifying that public and legally accessed data is available for AI training and analysis in its Code of Practice.

The availability of public and legally accessed data is key to lowering barriers to entry, both technical and financial. It incentivises innovation and helps to create an environment in which the UK can be competitive in the AI market.

The letter describes a number of principles that should underpin the Code of Practice and lists specific features that should be included to ensure TDM and development of AI is not unnecessarily restricted. The letter can be read in full on the IP Federation’s website.”

AI writes summaries of preprints in bioRxiv trial

“The bioRxiv pilot is part of a broader trend of using LLMs to help researchers — and the public — keep afloat in the tsunami of scientific literature. The physics-focused preprint server arXiv uses AI to generate audio summaries of some papers, and publishers and funders are starting to roll out features that allow users to ‘talk to a paper’ through a chatbot….

Before rolling out the service, Sever and his colleagues evaluated several dozen summaries produced by the tool. Most were pretty good, he says, and some were even better than the abstracts scientists had written. Others included clear falsehoods. “We know there are going to be errors in these things,” Sever says….

If the pilot becomes a fully fledged service, bioRxiv might look at routinely involving authors in proofreading and approving the content, Sever says. For now, to minimize the consequences of errors, the pilot is not being rolled out to medRxiv, a sister medical-research preprint server run by Cold Spring Harbor Laboratory Press, the London-based publisher BMJ and Yale University in New Haven, Connecticut. MedRxiv studies typically have clinical relevance, and errors could guide patient behaviour. By limiting the pilot to bioRxiv, says Sever, “the consequences of being wrong are more that somebody might feel misled or misunderstand a fairly arcane study in cell biology”. …”

OASIS Mobilizes Open Source Community to Combat the Spread of Disinformation and Online Harms from Foreign State Actors – OASIS Open

“OASIS Open, the international open source and standards consortium, launched the DAD-CDM project, an open source initiative to develop data exchange standards for normalizing and sharing disinformation and influence campaigns. DAD-CDM will serve as a valuable resource, particularly in the identification and alerting of AI-empowered attacks….”

Broadening audience, increasing understanding

“Many biomedical research papers are readily understood only by those who know as much about the topic as their authors do. There are understandable reasons for this. Science is increasingly specialized, which means that aficionados of specific fields develop terminologies, nomenclatures, and even technologies that can make the work feel impenetrable. So even working scientists reading outside their own area of expertise can struggle to understand what was actually done and why it is important.

As a preprint server that costs nothing to read, bioRxiv has a massive, worldwide audience that views and downloads millions of articles each month. We don’t track readers or ask them to register but we have anecdotal evidence for who they are. Enormous numbers of professional scientists, clearly, but many other kinds of readers too, including undergraduate and medical students, teachers at every level, journalists, patients and their advocates, and members of the public who are intellectually curious about our world and biology. These more general readers must also grapple with articles not written with them in mind….”

Generative AI, Synthetic Contents, Open Educational Resources (OER), and Open Educational Practices (OEP): A New Front in the Openness Landscape – Open Praxis

Abstract:  This paper critically examines the transformation of the educational landscape through the integration of generative AI with Open Educational Resources (OER) and Open Educational Practices (OEP). The emergence of AI in content creation has ignited debate regarding its potential to comprehend and generate human language, creating content that is often indistinguishable from that produced by humans. This shift from organic (human-created) to synthetic (AI-created) content presents a new frontier in the educational sphere, particularly in the context of OER and OEP. The paper explores the generative AI’s capabilities in OER and OEP, such as automatic content generation, resource curation, updating existing resources, co-creation and facilitating collaborative learning. Nevertheless, it underscores the importance of addressing challenges like the quality and reliability of AI-generated content, data privacy, and equitable access to AI technologies. The critical discussion extends to a contentious issue, ownership in OER/OEP. While AI-generated works lack human authorship and copyright protection, the question of legal liability and recognition of authorship remains a significant concern. In response, the concept of prompt engineering and co-creation with AI is presented as a potential solution, viewing AI not as authors, but powerful tools augmenting authors’ abilities. By examining generative AI’s integration with OER and OEP, this paper encourages further research and discussion to harness AI’s power while addressing potential concerns, thereby contributing to the dialogue on responsible and effective use of generative AI in education.


Datasheets for Digital Cultural Heritage Datasets – Journal of Open Humanities Data

Abstract:  Sparked by issues of quality and lack of proper documentation for datasets, the machine learning community has begun developing standardised processes for establishing datasheets for machine learning datasets, with the intent to provide context and information on provenance, purposes, composition, the collection process, recommended uses or societal biases reflected in training datasets. This approach fits well with practices and procedures established in GLAM institutions, such as establishing collections’ descriptions. However, digital cultural heritage datasets are marked by specific characteristics. They are often the product of multiple layers of selection; they may have been created for different purposes than establishing a statistical sample according to a specific research question; they change over time and are heterogeneous. Punctuated by a series of recommendations to create datasheets for digital cultural heritage, the paper addresses the scope and characteristics of digital cultural heritage datasets; possible metrics and measures; lessons from concepts similar to datasheets and/or established workflows in the cultural heritage sector. This paper includes a proposal for a datasheet template that has been adapted for use in cultural heritage institutions, and which proposes to incorporate information on the motivation and selection criteria, digitisation pipeline, data provenance, the use of linked open data, and version information.

Can AI help with the heavy lifting of research communications? | Impact of Social Sciences

“From my own experience, a key skill involved in research communication is translation. Primarily, the translation of complex academic writing into ‘simpler’ media and formats accessible to different audiences, eg. lay summaries, press releases and blog posts, or for the public affairs oriented, policy briefings….”

Springer Nature expands its AI capability with acquisition of Slimmer AI’s Science division | Springer Nature Group | Springer Nature

With technology having played an important role in improving the way trusted science is published for decades, global academic publisher Springer Nature has signed a definitive agreement which will see the Netherlands’ Slimmer AI’s Science division (S-AI) join the company.

Springer Nature and Slimmer AI have been working together since 2015. Using Slimmer AI’s advanced software that leverages emerging technology, the partnership has created AI tools to speed up and improve the publishing process by:

Identifying appropriate editors to guide a manuscript through the submission process, 
Increasing reviewer acceptance rates by recommending the best people to peer review a manuscript, and
Safeguarding the integrity of the scientific record by automating the identification of potential problems with papers, such as plagiarism.


What is open access publishing? | Perplexity AI

“Open access publishing is a set of principles and practices that make research outputs available online, free of access charges or other barriers [1]. It refers to the free, immediate, online availability of research outputs such as journal articles or books, combined with the rights to use these outputs fully in the digital environment [4]. Open access can be applied to all forms of published research output, including peer-reviewed and non-peer-reviewed academic journal articles, conference papers, theses, book chapters, monographs, research reports, and images [1]. There are different ways of publishing open access, including full open access journals, hybrid open access, and diamond open access [3]. Full open access journals publish articles in open access format, and the publication costs are covered by authors or their institutions [3]. Hybrid open access offers authors the option of making their articles open access for a fee, but these journals are still fundamentally subscription journals with an open access option for individual articles [2]. Diamond open access describes journals that are completely free to publish and to read, and the cost of maintaining and publishing the journal is usually borne by the organization that sponsors the journal [2]….”

Meta’s AI research head wants open source licensing to change – The Verge

“In July, Meta released its large language model Llama 2 relatively openly and for free, a stark contrast to its biggest competitors. But in the world of open-source software, some still see the company’s openness with an asterisk….”

UC Berkeley Library to Copyright Office: Protect fair uses in AI training for research and education – UC Berkeley Library Update

“We are pleased to share the UC Berkeley Library’s response to the U.S. Copyright Office’s Notice of Inquiry regarding artificial intelligence and copyright. Our response addresses the essential fair use right relied upon by UC Berkeley scholars in undertaking groundbreaking research, and the need to preserve access to the underlying copyright-protected content so that scholars using AI systems can conduct research inquiries….”

Improving Wikipedia verifiability with AI | Nature Machine Intelligence

Abstract:  Verifiability is a core content policy of Wikipedia: claims need to be backed by citations. Maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort. We show that the process of improving references can be tackled with the help of artificial intelligence (AI) powered by an information retrieval system and a language model. This neural-network-based system, which we call SIDE, can identify Wikipedia citations that are unlikely to support their claims, and subsequently recommend better ones from the web. We train this model on existing Wikipedia references, therefore learning from the contributions and combined wisdom of thousands of Wikipedia editors. Using crowdsourcing, we observe that for the top 10% most likely citations to be tagged as unverifiable by our system, humans prefer our system’s suggested alternatives compared with the originally cited reference 70% of the time. To validate the applicability of our system, we built a demo to engage with the English-speaking Wikipedia community and find that SIDE’s first citation recommendation is preferred twice as often as the existing Wikipedia citation for the same top 10% most likely unverifiable claims according to SIDE. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.


Taking and Giving Back? Open Access, Generative AI, and the Transformation of Scholarly Communication / IUB Libraries Calendar

“Generative AI systems trained on decades of open access, digitized scholarly publications, and other human-written texts can now produce non-copyrightable(?), (mostly) high-quality, and (sometimes) trustworthy text, images, and media at scale. In the context of scholarly communication, these AI systems can be trained to perform useful tasks such as quickly summarizing research findings, generating visual diagrams of scientific content, and simplifying technical jargon.

Scholarly communication will undergo a major transformation with the emergence of these model capabilities. On the plus side, AI has the potential to help tailor language, format, tone, and examples to make research more accessible, understandable, engaging, and useful for different audiences. However, its use also raises questions about credit and attribution, informational provenance, the responsibilities of authorship, control over science communication, and more. This talk will discuss how open access scholarly publishing has helped power the rise of the current generation of AI systems (especially large language models), some ways that AI is primed to change/has already changed scholarly publishing, and how the OA community might work with these models to improve scholarly communication, for example, by introducing different and more flexible forms of science communication artifacts, incorporating human feedback in the generative process, or mitigating the production of false/misleading information.”

Kudos expands article promotion for publishers with new services powered by AI

“Kudos, the platform for showcasing research, has announced that it is leveraging Artificial Intelligence to generate plain language summaries and further boost article performance for scholarly publishers and societies. Kudos now offers AI-generation of plain language summaries for research publications….”