Guest Post — Accessibility Powered by AI: How Artificial Intelligence Can Help Universalize Access to Digital Content

Digital transformation can revolutionize the world, turning it into an inclusive place for people with and without disabilities, with accessibility powered by artificial intelligence.

The post Guest Post — Accessibility Powered by AI: How Artificial Intelligence Can Help Universalize Access to Digital Content appeared first on The Scholarly Kitchen.

Smorgasbord: Trends from Spring 2023 Meetings and Conferences (Part Two)

Read what Chefs Angela Cochran and Alice Meadows (respectively) have to say about the recent ISMPP conference and RDA 20th Plenary Meeting in today’s Smorgasbord.

The post Smorgasbord: Trends from Spring 2023 Meetings and Conferences (Part Two) appeared first on The Scholarly Kitchen.

Swimming in the AI Data Lake: Why Disclosure and Versions of Record Are More Important than Ever

Data quality and record keeping are going to grow in importance as a result of AI applications.

The post Swimming in the AI Data Lake: Why Disclosure and Versions of Record Are More Important than Ever appeared first on The Scholarly Kitchen.

Guest Post: AI and Scholarly Publishing — A (Slightly) Hopeful View

The impact of the changes artificial intelligence will cause rests on how creative humans can be at harnessing novel technologies to the greatest benefit. The challenge, then, for publishers, is to ensure they are the creative adopters leading the charge, as opposed to being trampled by better customer experiences created by other technological disruptors.

The post Guest Post: AI and Scholarly Publishing — A (Slightly) Hopeful View appeared first on The Scholarly Kitchen.

Textpocalypse: A Literary Scholar Eyes the “Grey Goo” of AI

What will the “grey goo” of AI-generated text do to us? A scholar of writing and technology talks with us about AI and Large Language Models.

The post Textpocalypse: A Literary Scholar Eyes the “Grey Goo” of AI appeared first on The Scholarly Kitchen.

Guest Post – GPT-3 Wrote an Entire Paper on Itself. Should Publishers be Concerned?

Saikiran Chandha discusses the impact of GPT-3 and related models on research, the potential question marks, and the steps that scholarly publishers can take to protect their interests.

The post Guest Post – GPT-3 Wrote an Entire Paper on Itself. Should Publishers be Concerned? appeared first on The Scholarly Kitchen.

SXSW Interactive: Slow Down To Speed Up

Back to SXSW this year! Hear about the conference, the speakers, and the themes. Tell us what resonates with you the most!

The post SXSW Interactive: Slow Down To Speed Up appeared first on The Scholarly Kitchen.

Guest Post — ChatGPT: Applications in Scholarly Publishing

Craig Griffin looks at potential applications we might see for tools like ChatGPT in scholarly publishing. Also included — a research results haiku.

The post Guest Post — ChatGPT: Applications in Scholarly Publishing appeared first on The Scholarly Kitchen.

ChatGPT & Co.: When the Search Slot turns into an AI Chatbox

by André Vatter

The claim that AI language models are here to stay should be irrefutable by now. Although just introduced to the public in November 2022, ChatGPT has made rapid progress in this short time. How rapid? Let’s compare: After its founding, it took Twitter two years and Facebook ten months to build up a base of one million users. ChatGPT managed to reach this milestone in only five days. Two months after its launch, almost forty percent (German) of all Germans said they had heard of the chat robot or had already tried it.

A brutal race

But as impressive as the private adoption rate is, the upheaval that ChatGPT has caused in the corporate world is even more exciting. Microsoft’s announcement of its plan to integrate the generative language model into its own search engine Bing caused sheer panic at Google, the undisputed global industry leader. Google has been tinkering with an AI-supported web search for some time, but it has not yet been able to demonstrate that it is really ready for the market. There is a name, “Bard“, but CEO Sundar Pichai is silent about concrete integrations. Microsoft, on the other hand, was able to announce just a few days ago that its own search engine, which users had hardly noticed for decades, suddenly had to cope with a rush of visitors:

“We have crossed 100M Daily Active Users of Bing. This is a surprisingly notable figure, and yet we are fully aware we remain a small, low, single digit share player. That said, it feels good to be at the dance!”

Redmond, Washington, is in an AI frenzy. In the future, there will hardly be a business area at Microsoft – whether B2B or B2C – in which ChatGPT does not play a role.

The disruption is also leaving its mark on the smaller competitors. Brave Search, the web search engine created by the US browser manufacturer Brave Software Inc., recently got a new AI feature. The “Summarizer” not only summarises facts directly at the top of the search results page, but also provides relevant information about the content of each result found. There are also changes at the privacy-focussed search engine DuckDuckGo, which has just launched “DuckAssist“. Depending on the question, the new AI feature taps Wikipedia for relevant information and offers concrete answers directly on the search results page. But this is just the beginning: “This is the first in a series of generative AI-assisted features we hope to roll out in the coming months.”
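
To make the pattern concrete: what DuckAssist does can be pictured as a small pipeline that fetches an extract from Wikipedia and places it above the classic result list. The sketch below is a toy illustration of that pattern, not DuckDuckGo’s implementation; it only uses Wikipedia’s public REST summary endpoint, and the topic lookup is deliberately naive.

```python
# Toy illustration of the "instant answer" pattern described above:
# fetch a short Wikipedia extract and show it above the ranked results.
from typing import Optional

import requests


def instant_answer(topic: str, lang: str = "en") -> Optional[str]:
    """Fetch a one-paragraph extract for a topic from Wikipedia's REST API."""
    url = f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{topic}"
    resp = requests.get(url, timeout=10)
    if resp.status_code != 200:
        return None
    return resp.json().get("extract")


if __name__ == "__main__":
    answer = instant_answer("Search_engine")
    if answer:
        print("Answer box:", answer)
    print("--- classic ranked results would follow here ---")
```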

All these integrations of AI language models into search engines are not about extending the respective existing business models. They are about a complete upheaval in the way we search the web today, and in how we interpret and understand the results.

How finding replaces searching

Whereas the previous promise of search was an effortful “I’ll show you where you might find the answer”, with AI it suddenly becomes: “Here’s the answer.” Since their invention, search engines have only ever shown us possible places where answers to our questions might be found. In fact, it has never been the goal of advertising-based business models to provide users with a quick answer. After all, the aim is to keep them in one’s ecosystem as long as possible in order to maximise the likelihood of ad clicks. This is also the reason why Google at some point began to present generally available information – such as times, weather, stock market prices, sports results or flight information – directly on the search results pages (SERPs), for example in the so-called OneBox. The ultimate ambition is that no one leaves the Googleverse!

Intelligent chatbots, like ChatGPT, get around this problem. On the one hand, they transform the type of search by replacing keywords with questions. Soon, many users are likely to say goodbye to so-called “search terms” or even Boolean operators. Instead, they’ll learn to tweak their prompts more and more to make their communication with the machine more precise. And on the other hand, intelligent chatbots reduce the importance of the original sources; often there is no longer any reason to leave the conversation. Those who search with the help of AI want and get an answer. They do not want a card catalogue with shelf numbers.

Despite all answers, questions remain open

We can already see that this comfort does not come without critical implications. Take, for example, the transparency of sources that we may no longer get to see. Where do they come from? How were they selected? Are they trustworthy? Can I access them directly? Especially in the scientific sector, reliable answers to these questions are indispensable. Other problems revolve around copyright. After all, AI does not create new information; it relies on the work of journalists who publish on the internet. How will they be remunerated if no one reads their texts any more and everyone relies on machine summaries instead?

Data protection concerns will not be long in coming. In communicating with the machine, a close relationship develops over time; the more it knows about us and can understand our perspective, the more accurately it can respond. In addition, the models need to be trained. Personalisation, however, inevitably means a critical wealth of data in the hands of third parties in return. In the hands of companies that will have to build entirely new business models around a question-answering game – quite a few of which, if not all, will be ad-supported.

AI provides answers – though not yet to every question. Search will change radically within a short time. Academic libraries with online services will also have to orient themselves accordingly and adapt. Perhaps the “catalogue” as a static directory or list will take a step back. Let’s imagine for a moment the scenario of an AI that has access to a gigantic corpus of Open Access texts. Researchers access several sources simultaneously, have them sorted, summarised in terms of content and classified: Have these findings been confirmed or refuted? The picture that emerges is of a new mechanism for making scientific knowledge accessible and comprehensible – provided, of course, that the underlying content is openly accessible. From this perspective, too, this is once again a clear plea for Open Science.
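
As a thought experiment in code: the scenario sketched above is essentially “retrieve, then condense”. The following minimal sketch uses TF-IDF retrieval over an invented three-document “Open Access corpus” to stand in for the search step; a real system would add a language model for the summarisation and classification steps. All texts and the query are placeholders.

```python
# Minimal retrieve-then-answer sketch over a tiny invented corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Open access publishing increases citation rates in economics.",
    "Machine learning methods can automate subject indexing of papers.",
    "Chatbots answer routine reference questions outside opening hours.",
]

query = "How can machine learning support subject indexing?"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)
query_vec = vectorizer.transform([query])

# Rank documents by similarity to the question and keep the best hits;
# these are the sources a generated answer would cite.
scores = cosine_similarity(query_vec, doc_matrix).ravel()
ranked = sorted(zip(scores, corpus), reverse=True)

print("Question:", query)
for score, text in ranked[:2]:
    print(f"  [{score:.2f}] {text}")
```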

So, how do academic libraries implement these technologies in the future? How do they create source transparency, how do they build trust and which disciplines of media literacy move to the foreground when new, machine-friendly communication is part of the research toolbox? Many questions, many uncertainties – but at the same time a great potential for the future supply of scientific information. A potential that libraries should use to actively shape the unstoppable change.

This text has been translated from German.


The post ChatGPT & Co.: When the Search Slot turns into an AI Chatbox first appeared on ZBW MediaTalk.

Thinking About ChatGPT and the Future — Where Are We On AI’s Development Curve?

A compilation of links and a video to incisive analyses of ChatGPT and what it means for the future.

The post Thinking About ChatGPT and the Future — Where Are We On AI’s Development Curve? appeared first on The Scholarly Kitchen.

Guest Post – AI and Scholarly Publishing: A View from Three Experts

A recap of a recent SSP webinar on artificial intelligence (AI) and scholarly publishing. How can this set of technologies help or harm scholarly publishing, and what are some current trends? What are the risks of AI, and what should we look out for?

The post Guest Post – AI and Scholarly Publishing: A View from Three Experts appeared first on The Scholarly Kitchen.

Thoughts on AI’s Impact on Scholarly Communications? An Interview with ChatGPT

An interview with ChatGPT on issues related to scholarly communication.

The post Thoughts on AI’s Impact on Scholarly Communications? An Interview with ChatGPT appeared first on The Scholarly Kitchen.

AI in Academic Libraries, Part 3: Prerequisites and Conditions for Successful Use

Interview with Frank Seeliger (TH Wildau) and Anna Kasprzik (ZBW)

We recently had a long talk with experts Anna Kasprzik (ZBW – Leibniz Information Centre for Economics) and Frank Seeliger (Technical University of Applied Sciences Wildau – TH Wildau) about the use of artificial intelligence in academic libraries. The occasion: Both of them were involved in two wide-ranging articles: “On the promising use of AI in libraries: Discussion stage of a white paper in progress – part 1” (German) and “part 2” (German).

In their working context, both of them have an intense connection and great interest in the use of AI in the context of infrastructure institutions and libraries. Dr Frank Seeliger is the director of the university library at the TH Wildau and has been jointly responsible for the part-time programme Master of Science in Library Computer Sciences (M.Sc.) at the Wildau Institute of Technology. Anna Kasprzik is the coordinator of the automation of subject indexing (AutoSE) at the ZBW.

This slightly shortened, three-part series has emerged from our spoken interview. These two articles are also part of the series:

What are the basic prerequisites for the successful and sustainable use of AI at academic libraries and information institutions?

Anna Kasprzik: I have a very clear opinion here and have already written several articles about it. For years, I have been fighting for the necessary resources and I would say that we have manoeuvred ourselves into a really good starting position by now, even if we are not out of the woods yet. The main issue for me is commitment – right up to the level of decision makers. I’ve developed an allergy to the “project” format. Decision makers often say things like, “Oh yes, we should also do something with AI. Let’s do a project, then a working service will develop from it and that’s it.” But it’s not that easy. Things that are developed as projects tend to disappear without a trace in most cases.

We also had a forerunner project at the ZBW. We deliberately raised it to the status of a long-term commitment together with the management. We realised that automation with machine learning methods is a long-term endeavour. This commitment was essential. It was an important change of strategy. We have a team of three people here and I coordinate the whole thing. There’s a doctoral position for a scientific employee who is carrying out applied research, i.e. research that is very much focused on practice. When we received this long-term commitment status, we started a pilot phase. In this pilot phase, we recruited an additional software architect. We therefore have three positions for this, which correspond to three roles and I regard all three of them as very important.

The ZBW has also purchased a lot of hardware because machine learning experiments require serious computing power. We have then started to develop the corresponding software infrastructure. This system is already productive, but will be continually developed based on the results of our in-house applied research. What I’m trying to say is this: the commitment is important and the resources must reflect this commitment.

Frank Seeliger: This is naturally the answer of a Leibniz institution that is well endowed with research professors. However, apart from some national state libraries and larger libraries, this is usually difficult to achieve. Most libraries have neither a corresponding research mandate nor the personnel resources to finance such projects on a long-term basis. Nevertheless, there are also technologies that smaller institutions need to invest in, such as cloud-based services or infrastructure as a service. But they need to commit to this beyond the project phases. Our Agenda 2025/30 anchors this as a long-term commitment within the context of the automation that is coming anyway – a development boosted by the coronavirus pandemic in particular, when people saw how well things can function even when they take place online. What matters is that people regard this as a task and seek out information about it accordingly. The mandate is to explore the technology deliberately. Only in this way can people at working or management level see not only the degree of investment required, but also what successes they can expect.

But it’s not only libraries that have recently, i.e. in the last ten years, begun to explore the topic of AI. The situation is comparable with small and medium-sized businesses or other public institutions that deal with the Online Access Act and similar issues. They too are exploring these kinds of algorithms, and there is solidarity to be found there – libraries are not alone here. This is very important, because many of the measures, particularly those at the level of the German federal states, were not necessarily designed with libraries in mind when AI tasks or funding were distributed.

That’s why we also intended our publication (German) as a political paper – political in the sense of informing politicians and decision-makers that, alongside financial possibilities, we also need the framework to be able to apply for them; to then test things, decide whether we want to use indexing or language tools permanently in the library world, and network with other organisations.

The task for smaller libraries that cannot maintain research groups is definitely to explore the technology and to develop their position for the next five to ten years. This requires counterpoints to what meta-search engines and sources such as Wikipedia commonly cover. Especially as libraries, in their way of thinking and their sustainability, have a completely different lifespan than companies: libraries are designed to last as long as the state or the university exists. Our lifecycles are measured differently, and we need to position ourselves accordingly.

Not all libraries and infrastructure institutions have the capacity to develop a comprehensive AI department with corresponding personnel. So does it make sense to bundle competences and use synergy effects?

Anna Kasprzik: Yes and no. We are in touch with other institutions such as the German National Library. Our scientific employee and developer is working on the further development of the Finnish toolkit Annif with colleagues from the National Library of Finland, for example. This toolkit is also interesting for many other institutions for primary use. I think it’s very good to exchange ideas, also regarding our experiences with toolkits such as this one.

However, I discover time and again that there are limits to this when I advise other institutions; for example, just last week I advised some representatives from Swiss libraries. You can’t do everything for the other institutions. If they want to use these instruments, institutions have to train them on their own data. You can’t just train the models and then plant them one-to-one into other institutions. For sure, we can exchange ideas, give support and try to develop central hubs where at least structures or computing power resources are provided. However, nothing will be developed in this kind of hub that is an off-the-shelf solution for everyone. This is not how machine learning works.
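
For institutions that want to try this, Annif can be run locally and queried over its REST API once a project has been trained on the institution’s own data. The snippet below is a hedged sketch along the lines of Annif’s documented API; the host, port and project ID (“example-en”) are placeholder assumptions, not a ready-made configuration.

```python
# Hedged sketch: ask a locally running Annif instance for subject suggestions.
# Assumes a project "example-en" has already been trained on your own data.
import requests

ANNIF_URL = "http://localhost:5000/v1"  # placeholder host/port


def suggest_subjects(text: str, project: str = "example-en", limit: int = 5):
    """Request subject suggestions for a piece of text from Annif's REST API."""
    resp = requests.post(
        f"{ANNIF_URL}/projects/{project}/suggest",
        data={"text": text, "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]


for hit in suggest_subjects("A study of inflation expectations in the euro area"):
    print(f'{hit["score"]:.3f}  {hit["label"]}  ({hit["uri"]})')
```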

Frank Seeliger: The library landscape in Germany is like a settlement, not a skyscraper. In the past, there was a German library institute (DBI) that tried to bundle many matters of academic libraries in Germany across all sectors. This kind of central unit no longer exists; there are merely several library networks for institutions and library associations for individuals. So a central library structure that could take on the topic of AI doesn’t exist. There was an RFID working group (German) (or also the Special Interest Group RFID at the IFLA), and there should actually also be a working group for robots (German), but of course someone has to do it, usually alongside their actual job.

In any case, there is no central library infrastructure that could take up this kind of topic as a lobby organisation – the way Bitkom does for companies – and break it down to the individual institutions. The route that we are pursuing is broadly based. This is related to the fact that we operate in very different ways in the different German federal states, owing to the relationship between national government and federal states. The latter have sovereignty in many areas, meaning that we have to work together on a project basis. It will be important to locate cooperation partners and not try to work alone, because it is simply too much. There is definitely not going to be a central contact point. The German Research Center for Artificial Intelligence (DFKI) does not have libraries on its radar either. There’s no one to call. Everything is going to run on a case-by-case and interest-related basis.

How do you find the right cooperation partners?

Frank Seeliger: That’s why there are library congresses where people can discuss issues. Someone gives a presentation about something they have done and then other people are interested: they get together, write applications for third-party funding or articles together, or try to organise a conference themselves. Such conferences already exist, and thus a certain structure of exchange has been established.

I am the conservative type. I read articles in library journals, listen to conference news or attend congresses. That’s where the informal exchange happens – you meet other people – alongside social media, which is also important. But if you don’t reach people via social media channels, there is (hopefully soon to return) physical exchange on site, via certain section days for example. Next week we have another Section IV meeting of the German Library Association (DBV) in Dresden, where 100 people will get together. The chances of finding colleagues who have similar issues or are dealing with a similar topic are high. Then you can exchange ideas – the traditional way.

Anna Kasprzik: But there are also smaller workshops for specialists. For example, the German National Library has been organising a specialist congress of the network for automated subject indexing (German) (FNMVE) for those who are interested in automated approaches to subject indexing.

I also enjoy networking via social media. You can also find most people who are active in the field on the internet, e.g. on Twitter or Mastodon. I started using Twitter in 2016 and deliberately developed my account by following people with an interest in semantic web technologies. These are individuals, but they represent an entire network. I can’t name individual institutions; what is relevant are individual community members.

And how did you get to know each other? I’m referring to the working group that compiled this non-white paper.

Anna Kasprzik: It’s all Frank’s fault.

Frank Seeliger: Anna came here once. I had invited Mr Puppe in the context of a digitalisation project in which AI methods supported optical character recognition (OCR) and image identification of historical works. Exactly via the traditional route that I’ve just described, i.e. via a symposium; this was how the first people were invited.

Then the need to position ourselves on this topic developed. I had spoken with a colleague from the Netherlands at a conference shortly before. He said that they had been too late with their AI white paper, meaning that politics had not taken them into account and libraries had not received any special funding for AI tools. That was the wake-up call for me and I thought, here in Germany there is also nothing I am aware of that is specifically for information institutions. I then researched who had publications on the topic. That’s how the network, which is still active, developed. We are working on the English translation at the moment.

What is your plea to the management of information institutions? At the beginning, Anna, you already spoke about commitment, also from “the very top”, being a crucial factor. But going beyond this: what course needs to be set now and which resources need to be built up, to ensure that libraries don’t lose out in the age of AI?

Anna Kasprzik: For institutions who can, it’s important to develop long-term expertise. But I completely understand Frank’s point of view: it is valid to say that not every institution can afford this. So two aspects are important for me: one is to cluster expertise and resources at certain central institutions. The other is to develop communication structures across institutions or to share a cloud structure or something similar. To create a network in order to spread it around. To enable dissemination, i.e. the sharing of these experiences for reuse.

Frank Seeliger: Perhaps there is a third aspect: to reflect on the business process that you are responsible for so that you can identify whether it is suitable for an AI-supported automation, for example. To reflect on this yourself, but to encourage your colleagues to reflect on their own workflows too, as to whether routine tasks can be taken over by machines and thereby relieve them of some of the workload. For example, in our library association, the Kooperativer Bibliotheksverbund Berlin-Brandenburg (KOBV), we had the problem that we would have liked to set up a lab. Not only to play, but also to see together how we can technically support tasks that are really very close to real life. I don’t want to say that the project failed, but the problem was that first you needed the ideas: What can you actually tackle with AI? What requires a lot of time? Is it the indexing? Other work processes that are done over and over again like a routine with a high degree of similarity? We wanted the lab to look at exactly these processes and check if we could automate them, independently of what library management systems do or all the other tools with which we work.

It’s important to initiate the process of self-reflection on automation and digitalisation in order to identify fields of work. Some have expertise in AI, others in their own fields, and they have to come together. The path leads through one’s own reflection into conversation, sounding out whether solutions can be found.

And to what extent can the management support?

Frank Seeliger: Leadership is about bringing people together and giving impetus. The coronavirus pandemic and digitalisation have put a lot of pressure on many people. There is a saying by Angela Merkel: she once said that she only got around to thinking during the Christmas period – however you want to interpret that. Out of habit, and because you want to clear the pile of work on your desk during working hours, it’s often difficult to reflect on what you are doing and whether there isn’t already a tool that could help. It is then the task of the management level to look at these processes and, where appropriate, to say: yes, maybe the person could be helped with this. Let’s organise a project and take a closer look.

Anna Kasprzik: Yes, that’s one of the tasks, but for me the role of management is above all to take the load off employees and clear a path for them. This brings another buzzword into play: agile working. It’s not only about giving an impetus, but also about supporting people by giving them some leeway so that they can work in a self-dependent manner. The agile manifesto, so to speak: creating space for experimentation and allowing for occasional failure. Otherwise, nothing will come to fruition.

Frank Seeliger: We will soon be doing a “Best of Failure” survey, because we want to ask what kind of error culture we really have – a subject that is usually treated as sacrosanct. This will also be the topic of the Wildau Library Symposium (German) from 13 to 14 September 2022, where we will explore this error culture more intensively. Because it is true: even in IT projects, you simply have to allow things to go wrong. Of course, they don’t have to be taken on as a permanent task if they don’t go well. But sometimes it’s good to just try, because you can’t predict whether a service will be accepted or not. What do we learn from these mistakes? We talk about them relatively little – mostly about successful projects that go well and attract crazy amounts of funding. But the other part also has to come into focus, so that we can learn from it and utilise aspects of it for the next project.

Is there anything else that you would like to say at the end?

Frank Seeliger: AI is not just a task for large institutions.

Anna Kasprzik: Exactly, AI concerns everyone. Even though AI should not be dealt with just for the sake of AI, but rather to develop new innovative services that would otherwise not be possible.

Frank Seeliger: There are naturally other topics, no question about that. But you have to address it and sort out the various topics.

Anna Kasprzik: It’s important that we get the message across that automated approaches should not be regarded as a threat; this digital jungle exists anyway by now, so we need tools to find our way through it. AI therefore represents new potential and added value, not a threat that will be used to eliminate people’s jobs.

Frank Seeliger: We have also been asked the question: What is the added value of automation? Of course, you spend less time on routine processes that are very manual. This creates scope to explore new technologies, to do advanced training or to have more time for customers. And we need this scope to develop new services. You simply have to create that scope, also for agile project management, so that you don’t spend 100% of your time clearing some pile of work or other from your desk, but can instead use 20% for something new. AI can help give us this time.

Thank you for the interview, Anna and Frank.

Part 1 of the interview on “AI in Academic Libraries” is about areas of activity, the big players and the automation of indexing.
In part 2 of the interview on “AI in Academic Libraries” we explore interesting projects, the future of chatbots and the problem of discrimination through AI.


We were talking to:

Dr Anna Kasprzik, coordinator of the automation of subject indexing (AutoSE) at the ZBW – Leibniz Information Centre for Economics. Anna’s main focus lies on the transfer of current research results from the areas of machine learning, semantic technologies, semantic web and knowledge graphs into productive operations of subject indexing of the ZBW. You can also find Anna on Twitter and Mastodon.
Portrait: Photographer: Carola Gruebner, ZBW©

Dr Frank Seeliger (German) has been the director of the university library at the Technical University of Applied Sciences Wildau since 2006 and has been jointly responsible for the part-time programme Master of Science in Library Computer Sciences (M.Sc.) at the Wildau Institute of Technology since 2015. One module explores AI. You can find Frank on ORCID.
Portrait: TH Wildau

Featured Image: Alina Constantin / Better Images of AI / Handmade A.I / Licensed by CC-BY 4.0

The post AI in Academic Libraries, Part 3: Prerequisites and Conditions for Successful Use first appeared on ZBW MediaTalk.

AI in Academic Libraries, Part 2: Interesting Projects, the Future of Chatbots and Discrimination Through AI

Interview with Frank Seeliger (TH Wildau) and Anna Kasprzik (ZBW)

We recently had an intense discussion with Anna Kasprzik (ZBW) and Frank Seeliger (Technical University of Applied Sciences Wildau – TH Wildau) on the use of artificial intelligence in academic libraries. Both of them were recently involved in two wide-ranging articles: “On the promising use of AI in libraries: Discussion stage of a white paper in progress – part 1” (German) and “part 2” (German).

Dr Anna Kasprzik coordinates the automation of subject indexing (AutoSE) at the ZBW – Leibniz Information Centre for Economics. Dr Frank Seeliger (German) is the director of the university library at the Technical University of Applied Sciences Wildau and is jointly responsible for the part-time programme Master of Science in Library Computer Sciences (M.Sc.) at the Wildau Institute of Technology.

This slightly shortened, three-part series has been drawn up from our spoken interview. These two articles are also part of it:

What are currently the most interesting AI projects in libraries and infrastructure institutions?

Anna Kasprzik: Of course, there are many interesting AI projects. Off the top of my head, the following two come to mind. The first one is interesting if you care about optical character recognition (OCR), because before you can even start to think about automated subject indexing, you have to create metadata – “food” for the machine: segmenting digital texts into their structural fragments, for instance, or extracting an abstract automatically. To do this, you first run OCR on the scanned text. Qurator (German) is an interesting project in which machine learning methods are used for this as well. The Staatsbibliothek zu Berlin (Berlin State Library) and the German Research Center for Artificial Intelligence (DFKI) are involved, among others. This is interesting because at some point in the future it might give us the tools we need to obtain the data input required for automated subject indexing.
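
Not the Qurator stack itself, but to make the OCR step tangible: the sketch below shows the minimal version of turning a scanned page into machine-readable text that later stages could segment and summarise. It assumes a local Tesseract installation used via the pytesseract package; the file name is a placeholder.

```python
# Minimal OCR step: scanned page in, raw recognised text out.
# Downstream stages (segmentation, abstract extraction) would start from this.
from PIL import Image
import pytesseract


def ocr_page(path: str, lang: str = "deu") -> str:
    """Run OCR on one scanned page and return the recognised text."""
    return pytesseract.image_to_string(Image.open(path), lang=lang)


if __name__ == "__main__":
    text = ocr_page("scanned_page_001.png")  # placeholder file name
    print(text[:500])
```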

The other project is the Open Research Knowledge Graph (ORKG) of the TIB Hannover. The Open Research Knowledge Graph is a way of representing scientific results no longer as a document, i.e. as a PDF, but in an entity-based way: author, research topic and method all become nodes in one graph. This is the semantic level, and one could use machine learning methods to populate it.
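
A toy version of that idea, for illustration only: with rdflib one can describe a paper’s contribution as nodes and edges rather than as a PDF. The namespace and property names below are invented placeholders; ORKG defines its own data model.

```python
# Entity-based description of one paper: author, topic and method as graph nodes.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/scholarly/")  # invented namespace

g = Graph()
paper = EX.paper42
g.add((paper, RDF.type, EX.Paper))
g.add((paper, EX.hasAuthor, Literal("A. Researcher")))
g.add((paper, EX.hasResearchTopic, EX.SubjectIndexing))
g.add((paper, EX.usesMethod, EX.MachineLearning))

# Entity-based queries become possible, e.g. "all papers using machine learning":
for p in g.subjects(EX.usesMethod, EX.MachineLearning):
    print(p)
```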

Frank Seeliger: Just one project: it is running at the ZBW and the TH Wildau and explores the development of a chatbot with new technologies. The idea of chatbots is actually relatively old: a machine conducts a dialogue with a human being, and in the best case the human being does not recognise that a machine is running in the background – the Turing Test. Things are not quite this advanced yet, but the issue we are all concerned with is that libraries are being consulted – in chat rooms, for example. Many libraries aim to offer a high level of service at the times when researchers and students work, i.e. round the clock. This can only happen if procedures are automated, via chatbots for example, so that difficult questions can also be answered outside opening hours, at weekends and on public holidays.

I am therefore hoping firstly that the input we receive concerning chatbot development means that it will become a high-quality standard service that offers fast orientation and gives information with excellent predictive quality about a library or its special services. This would create the starting point for other machines, such as mobile robots. Many people are investing in robots, playing around with them and trying out various things. People expect to be able to go up to them and ask, “Where is book XY?” or “How do I find this and that?”, and that these robots can deal with such questions usefully, orient themselves, and literally point to where something is. That’s one thing.

The second thing that I find very exciting for projects is to win people over to AI at an early stage. Not just to drop AI as a buzzword, but to look behind the scenes of this technology complex. We tried to offer a certificate course (German); however, demand was too low for us to run it, but we will try again. The German National Library provides a similar course that was well attended. I think it’s important to make a low-threshold offer across the board, i.e. for a one-person library or for small municipal libraries that are set up on a communal basis, as well as for larger university libraries. That people get to grips with the subject matter and find their own way: where they can reuse something, where there are providers or cooperation partners. I find this kind of project very interesting and important for the world of libraries.

But this too can only be the starting point for many other offers: special workshops, on Annif for example, or other topics discussed at a level that non-informaticians can understand as well. It’s an offer to colleagues who deal with these systems, but not necessarily at an in-depth level – as with a car: they don’t manufacture the vehicle themselves, but want to be able to repair or fine-tune it sometimes. At this level, we definitely need more dialogue with the people who are going to have to work with it, for example as system administrators who set up or manage such projects. The offers must also be aimed at the management level – the people who are in charge of budgeting, i.e. those who sign third-party funding applications.

At both institutions, the TH Wildau and the ZBW, you are working on the use of chatbots. Why is this AI application area for academic libraries so promising? What are the biggest challenges?

Frank Seeliger: The interesting perspective for me is that we can run the development of a chatbot together with other libraries. It is good when more than one library serves as the knowledge base in the background for the typical questions. This is not possible with locally specific information such as opening hours or spatial conditions, but many synergy effects are still created. We can bring the data together and be in a position to generate as large a quantity of data as possible, so that the quality of the automatically generated answers is simply better than if we were to set it up individually. The output quality has a lot to do with the data quality – although it is not true that more data automatically means better information; other factors also play a role. But generally, small solutions tend to fail because of the small quantity of data.

Especially in view of the fact that a relatively high number of libraries are keen to invest in robot solutions that “walk” through the library outside opening hours and offer services, like a robot librarian. If the service is used, it makes doubly good sense to offer it online, but also to deliver it via a machine that rolls through the premises. This is important, because the personal approach from the library to its clients is a decisive, differentiating feature compared with the large meta-level platforms that offer their services commercially. Looking for dialogue and paying attention to the special requirements of the users: this is what makes the difference.

Anna Kasprzik: Even though I am not involved in the chatbot project at ZBW, I can think of three challenges. The first is that you need an incredible amount of training data. Getting hold of that much data is relatively difficult. Here at ZBW we have had a chat feature for a long time – without a bot. These chats have been recorded but first they had to be cleaned of all personal data. This was an immense amount of editorial work. That is the first challenge.

The second challenge: it’s a fact that relatively trivial questions, such as the opening hours, are easily answered. But as soon as things become more complex, i.e. when there are specialised questions, you need a knowledge graph behind the chatbot. And setting this up is relatively complex.

Which brings me to the third challenge: during the initial runs, the project team established that quite a few of the users had reservations and quickly thought, “It doesn’t understand me”. So there were reservations on both sides. We therefore have to be mindful of the quality aspect and also of the “trust” of the users.
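
On the first challenge – cleaning recorded chats of personal data before they can serve as training data – the sketch below shows the shape of the task. Real anonymisation needs far more than regular expressions (names, addresses and context all matter, which is why it was so much editorial work), so this is only an illustration; the patterns and example are invented.

```python
# Simple sketch: strip obvious personal identifiers from chat transcripts
# before they become chatbot training data. Real anonymisation needs more.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d /-]{6,}\d"),
}


def scrub(message: str) -> str:
    """Replace obvious identifiers with placeholder tokens."""
    for label, pattern in PATTERNS.items():
        message = pattern.sub(f"[{label.upper()}]", message)
    return message


print(scrub("You can reach me at jane.doe@example.org or +49 30 1234567."))
# -> "You can reach me at [EMAIL] or [PHONE]."
```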

Frank Seeliger: But interaction is also moving in the direction of speech, particularly among the younger generations who are now coming into libraries as students. This generation communicates via voice messages: the students speak with Siri or Alexa, and they are informal when speaking to these technologies. FIZ Karlsruhe has attempted to handle search queries using Alexa. That went well in itself, but it failed because of the European General Data Protection Regulation (GDPR), the privacy of information and the fact that data was processed somewhere in the USA. Naturally, that is not acceptable.

That’s why it is good that libraries are doing their own thing – they have data sovereignty and can therefore ensure that the GDPR is maintained and that user data is treated carefully. But it would be a strategic mistake if libraries did not adapt to the corresponding dialogue. Very simply because a lot of these interactions no longer take place with writing and reading alone, but via speech. As far as apps and features are concerned, much is communicated via voice messages, and libraries need to adapt to this fact. It starts with chatbots, but the question is whether search engines will be able to cope with (voice) messages at some point and then filter out the actual question. Making a chatbot functional and usable in everyday life is only the first step. With spoken language, this then incorporates listening and understanding.

Is there a timeframe for the development of the chatbot?

Anna Kasprzik: I’m not sure when the ZBW is planning to put its chatbot online; it could take one or two years. The real question is: when will such chatbots become viable solutions in libraries globally? This may take at least ten years or longer – without wanting to crush hopes too much.

Frank Seeliger: There are always unanticipated revivals popping up, for which a certain impetus is needed. For example, I was in the IT section of the International Federation of Library Associations and Institutions (IFLA) on statistics. We considered whether we could determine statistics clearly and globally, and depict them as a portfolio. Initially it didn’t work – it was limited to one continent: Latin America. Then the section received a huge surprise donation from the Bill and Melinda Gates Foundation and with it, the project IFLA Library Map of the World could be implemented.

It was therefore a very special impetus that led to something we would normally not have achieved with ten years’ work. And when this impetus exists through tenders, funding or third-party donors that accelerate exactly this kind of project, perhaps also from a long-term perspective, the whole thing takes on a new dynamic. If the development of chatbots in libraries continues to stagnate like this, they will not see market-wide use. There was a comparable movement with contactless object recognition via radio waves (Radio-Frequency Identification, RFID). It started in 2001 in Siegburg, then Stuttgart and Munich; now it is used in 2,000 to 3,000 libraries. I don’t see this impetus with chatbots at all. That’s why I don’t think that, in ten or 15 years, chatbots will be used in 10% to 20% of libraries. It’s an experimental field. Maybe some libraries will introduce them, but it will be a handful, perhaps a dozen. However, if a driving force emerges owing to external factors such as funding or a network initiative, the whole concept may receive new momentum.

The fact that AI-based systems make discriminatory decisions is often regarded as a general problem. Does this also apply to the library context? How can this be prevented?

Anna Kasprzik: That’s a very tricky question. Not many people are aware that potential difficulties almost always arise from the training data because training data is human data. These data sources contain our prejudices. In other words, whether the results may have a discriminating effect or not depends on the data itself and on the knowledge organisation systems that underpin it.

One movement that is gathering pace is known as decolonisation. People are taking a close look at the vocabularies they use, thesauri and ontologies. The problem has come up for us as well: since we also provide historical texts, terms that have racist connotations today appeared in the thesaurus. Naturally, we primarily incorporate terms that are considered politically correct, but these assessments can shift over time. The question is: what do you do with historical texts where such a word occurs in the title? The task is then to find ways to keep them as hidden elements of the thesaurus without displaying them in the interface.
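
One concrete way to implement what Anna describes is offered by SKOS, which defines skos:hiddenLabel for labels that should be indexed for retrieval but never displayed. The sketch below uses rdflib; the concept URI and label texts are invented placeholders, not entries from an actual thesaurus.

```python
# Keep an outdated term searchable for historical titles without displaying it.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/vocab/")  # invented namespace

g = Graph()
concept = EX.concept123
g.add((concept, SKOS.prefLabel, Literal("current preferred term", lang="en")))
# Indexed for search, but never shown in the user interface:
g.add((concept, SKOS.hiddenLabel, Literal("outdated historical term", lang="en")))

# A front end would render prefLabel only, while the search index uses both.
for label in g.objects(concept, SKOS.prefLabel):
    print("display:", label)
```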

There are knowledge organisation systems that are very old and developed in times very different from ours. We urgently need to restructure them. It’s always a balancing act if you want to present texts from earlier periods with the structures that were in use at the time: I must neither falsify the historical context nor offend anyone who wants to search these texts and feel represented, or at least not discriminated against. This is a very difficult question, particularly in libraries. People often think it’s not an issue for libraries, that it’s only relevant in politics, or that sort of thing. But on the contrary, libraries reflect the times in which they exist, and rightly so.

Frank Seeliger: Everything that you can use can also be misused. This applies to every object. For example, I was very impressed in Turkey. They are working with a big Koha approach (library software), meaning that more than 1,000 public libraries are using the open source solution Koha as their library management software. They therefore know, among other things, which book is most often borrowed in Turkey. We do not have this kind of information at all in Germany via the German Library Statistics (DBS, German). This doesn’t mean that this knowledge discredits the other books, that they are automatically “leftovers”. You can do a lot with knowledge. The bias that exists with AI is certainly the best known. But it is the same for all information: should monuments be pulled down or left standing? We need to find a path through the various moral phases that we live through as a society.

In my own studies, I specialised in pre-Columbian America. To name one example, the Aztecs never referred to themselves as Aztecs. If you searched in library catalogues pre-1763, the term “Aztec” did not exist; they called themselves Mexi‘ca. Or take the Kerensky Offensive – search engines do not have much to offer on that. It was a military offensive that was only given that name afterwards; it used to be called something else. The challenge is the same: to refer to both terms, even if the terminology has changed, or if it is no longer “en vogue” to work with a certain term.

Anna Kasprzik: This is also called concept drift, and it is generally a big problem. It’s why you always have to retrain the machines: concepts are continually developing, new ones emerge and old terms change their meaning. Even where there is no discrimination, terminology is constantly evolving.

And who does this work?

Anna Kasprzik: The machine learning experts at the institution.

Frank Seeliger: The respective zeitgeist and its intended structure.

Thank you for the interview, Anna and Frank.

Part 1 of the interview on “AI in Academic Libraries” is about areas of activity, the big players and the automation of indexing.
Part 3 of the interview on “AI in Academic Libraries” focuses on prerequisites and conditions for successful use.

This text has been translated from German.


We were talking to:

Dr Anna Kasprzik, coordinator of the automation of subject indexing (AutoSE) at the ZBW – Leibniz Information Centre for Economics. Anna’s main focus lies on the transfer of current research results from the areas of machine learning, semantic technologies, semantic web and knowledge graphs into productive operations of subject indexing of the ZBW. You can also find Anna on Twitter and Mastodon.
Portrait: Photographer: Carola Gruebner, ZBW©

Dr Frank Seeliger (German) has been the director of the university library at the Technical University of Applied Sciences Wildau since 2006 and has been jointly responsible for the part-time programme Master of Science in Library Computer Sciences (M.Sc.) at the Wildau Institute of Technology since 2015. One module explores AI. You can find Frank on ORCID.
Portrait: TH Wildau

Featured Image: Alina Constantin / Better Images of AI / Handmade A.I / Licensed by CC-BY 4.0

The post AI in Academic Libraries, Part 2: Interesting Projects, the Future of Chatbots and Discrimination Through AI first appeared on ZBW MediaTalk.