AI in Academic Libraries, Part 3: Prerequisites and Conditions for Successful Use

Interview with Frank Seeliger (TH Wildau) and Anna Kasprzik (ZBW)

We recently had a long talk with experts Anna Kasprzik (ZBW – Leibniz Information Centre for Economics) and Frank Seeliger (Technical University of Applied Sciences Wildau – TH Wildau) about the use of artificial intelligence in academic libraries. The occasion: Both of them were involved in two wide-ranging articles: “On the promising use of AI in libraries: Discussion stage of a white paper in progress – part 1” (German) and “part 2” (German).

In their working context, both of them have an intense connection to and great interest in the use of AI in the context of infrastructure institutions and libraries. Dr Frank Seeliger is the director of the university library at the TH Wildau and has been jointly responsible for the part-time programme Master of Science in Library Computer Sciences (M.Sc.) at the Wildau Institute of Technology since 2015. Anna Kasprzik is the coordinator of the automation of subject indexing (AutoSE) at the ZBW.

This slightly shortened, three-part series has emerged from our spoken interview. These two articles are also part of the series:

What are the basic prerequisites for the successful and sustainable use of AI at academic libraries and information institutions?

Anna Kasprzik: I have a very clear opinion here and have already written several articles about it. For years, I have been fighting for the necessary resources and I would say that we have manoeuvred ourselves into a really good starting position by now, even if we are not out of the woods yet. The main issue for me is commitment – right up to the level of decision makers. I’ve developed an allergy to the “project” format. Decision makers often say things like, “Oh yes, we should also do something with AI. Let’s do a project, then a working service will develop from it and that’s it.” But it’s not that easy. Things that are developed as projects tend to disappear without a trace in most cases.

We also had a forerunner project at the ZBW. We deliberately raised it to the status of a long-term commitment together with the management. We realised that automation with machine learning methods is a long-term endeavour. This commitment was essential. It was an important change of strategy. We have a team of three people here and I coordinate the whole thing. There’s a doctoral position for a scientific employee who is carrying out applied research, i.e. research that is very much focused on practice. When we received this long-term commitment status, we started a pilot phase. In this pilot phase, we recruited an additional software architect. We therefore have three positions for this, which correspond to three roles and I regard all three of them as very important.

The ZBW has also purchased a lot of hardware because machine learning experiments require serious computing power. We have then started to develop the corresponding software infrastructure. This system is already productive, but will be continually developed based on the results of our in-house applied research. What I’m trying to say is this: the commitment is important and the resources must reflect this commitment.

Frank Seeliger: This is naturally the answer of a Leibniz institution that is well endowed with research professors. However, apart from some national state libraries and larger libraries, this is usually difficult to achieve. Most libraries have neither a corresponding research mandate nor the personnel resources to finance such projects on a long-term basis. Nevertheless, there are technologies that smaller institutions need to invest in as well, such as cloud-based services or infrastructure as a service. But they need to commit to this, including beyond the project phases. The Agenda 2025/30 anchors this as a long-term commitment within the context of the automation that is coming up anyway. This has been boosted by the coronavirus pandemic in particular, when people saw how well things can function even when they take place online. What matters is that people regard this as a task and seek out information about it accordingly. The mandate is to explore the technology deliberately. Only in this way can people at working or management level see not only the degree of investment required, but also what successes they can expect.

But it’s not only libraries that have recently, i.e. in the last ten years, begun to explore the topic of AI. It is comparable with small and medium-sized businesses or other public institutions that deal with the Online Access Act and other issues. They are also exploring these kinds of algorithms, so there is common ground to be found – libraries are not alone here. This is very important, because many of the measures, particularly those at the level of the German federal states, were not necessarily designed with libraries in mind in respect of the distribution of AI tasks or funding.

That’s why we also intended our publication (German) as a political paper. Political in the sense of informing politicians and decision-makers about funding possibilities – and that we also need the framework to be able to apply for them. Only then can we test things, decide whether we want to use indexing or other tools, such as language tools, permanently in the library world, and network with other organisations.

The task for smaller libraries that cannot manage to have research groups is definitely to explore the technology and to develop their position for the next five to ten years. Such a position also offers a counterpoint to what is commonly covered by meta-search engines or platforms such as Wikipedia. Especially as libraries have a completely different lifespan than companies, in terms of their way of thinking and sustainability: libraries are designed to last as long as the state or the university exists. Our lifecycles are therefore measured differently, and we need to position ourselves accordingly.

Not all libraries and infrastructure institutions have the capacity to develop a comprehensive AI department with corresponding personnel. So does it make sense to bundle competences and use synergy effects?

Anna Kasprzik: Yes and no. We are in touch with other institutions such as the German National Library. Our scientific employee and developer is working on the further development of the Finnish toolkit Annif with colleagues from the National Library of Finland, for example. This toolkit is also interesting for many other institutions to use directly. I think it’s very good to exchange ideas, also regarding our experiences with toolkits such as this one.

However, when I advise other institutions, I discover time and again that there are limits to this; just last week, for example, I advised some representatives of Swiss libraries. You can’t do everything for the other institutions. If they want to use these instruments, institutions have to train them on their own data. You can’t just train the models and then transplant them one-to-one into other institutions. For sure, we can exchange ideas, give support and try to develop central hubs where at least structures or computing power resources are provided. However, nothing will be developed in this kind of hub that is an off-the-shelf solution for everyone. This is not how machine learning works.
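To make this concrete: with a toolkit such as Annif, “training on your own data” means that each institution defines its own project against its own vocabulary and corpus. A minimal sketch of what such a project configuration can look like – the project name, vocabulary and backend choice here are illustrative assumptions, not details from the interview:

```ini
# projects.cfg – one Annif project per indexing setup (names are hypothetical)
[econ-en]
name=Economics subject indexing, English
language=en
# simple baseline backend; institutions typically evaluate several
backend=tfidf
analyzer=snowball(english)
# the controlled vocabulary this institution indexes against
vocab=my-thesaurus
# maximum number of suggested subjects per document
limit=10
```

Each institution would then load its own vocabulary and run training and suggestion against this project with Annif’s command-line tools – which is precisely why a model trained at one institution cannot simply be transplanted to another: the vocabulary, the metadata and therefore the trained model are local.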

Frank Seeliger: The library landscape in Germany is like a settlement, not a skyscraper. In the past, there was a German Library Institute (DBI) that tried to bundle many matters of academic libraries in Germany across all sectors. This kind of central unit no longer exists – only several library networks for institutions and library associations for individual members. So there is no central library structure that could take on the topic of AI. There was an RFID working group (German) (and a Special Interest Group RFID at the IFLA), and there should actually also be a working group for robots (German), but of course someone has to run it, usually alongside their actual job.

In any case, there is no central library infrastructure that could take up this kind of topic the way a lobby organisation such as Bitkom does and break it down for its individual members. The route that we are pursuing is broadly based. This is related to the fact that we operate in very different ways in the different German federal states, owing to the relationship between national government and federal states: the latter have sovereignty in many areas, meaning that we have to work together on a project basis. It will be important to locate cooperation partners and not try to work alone, because it is simply too much. There is definitely not going to be a central contact point. The German Research Center for Artificial Intelligence (DFKI) does not have libraries on its radar either. There’s no one to call. Everything is going to run on a case-by-case and interest-related basis.

How do you find the right cooperation partners?

Frank Seeliger: That’s why there are library congresses where people can discuss issues. Someone gives a presentation about something they have done, and then other people are interested: they get together, write applications for third-party funding or articles together, or try to organise a conference themselves. Such conferences already exist, and thus a certain structure of exchange has been established.

I am the conservative type. I read articles in library journals, listen to conference news or attend congresses. That’s where you have the informal exchange – you meet other people. Alongside social media, which is also important. But if you don’t reach people via the social media channels, then there is (hopefully soon to return) physical exchange on site via certain section days, for example. Next week we have another Section IV meeting of the German Library Association (DBV) in Dresden where 100 people will get together. The chances of finding colleagues who have similar issues or are dealing with a similar topic are high. Then you can exchange ideas – the traditional way.

Anna Kasprzik: But there are also smaller workshops for specialists. For example, the German National Library has been organising a specialist congress of the network for automated subject indexing (German) (FNMVE) for those who are interested in automated approaches to subject indexing.

I also enjoy networking via social media. You can also find most people who are active in the field on the internet, e.g. on Twitter or Mastodon. I started using Twitter in 2016 and deliberately developed my account by following people with an interest in semantic web technologies. These are individuals, but they represent an entire network. I can’t name individual institutions; what is relevant are individual community members.

And how did you get to know each other? I’m referring to the working group that compiled this non-white paper.

Anna Kasprzik: It’s all Frank’s fault.

Frank Seeliger: Anna came here once. I had invited Mr Puppe in the context of a digitalisation project in which AI methods supported optical character recognition (OCR) and image identification of historical works. It happened exactly via the traditional route that I’ve just described, i.e. via a symposium; this was how the first people were invited.

Then the need to position ourselves on this topic developed. I had spoken with a colleague from the Netherlands at a conference shortly before. He said that they had been too late with their AI white paper, meaning that politics had not taken them into account and libraries had not received any special funding for AI tools. That was the wake-up call for me and I thought, here in Germany there is also nothing I am aware of that is specifically for information institutions. I then researched who had publications on the topic. That’s how the network, which is still active, developed. We are working on the English translation at the moment.

What is your plea to the management of information institutions? At the beginning, Anna, you already spoke about commitment, also from “the very top”, being a crucial factor. But going beyond this: what course needs to be set now and which resources need to be built up, to ensure that libraries don’t lose out in the age of AI?

Anna Kasprzik: For institutions that can, it’s important to develop long-term expertise. But I completely understand Frank’s point of view: it is valid to say that not every institution can afford this. So two aspects are important for me. One is to cluster expertise and resources at certain central institutions. The other is to develop communication structures across institutions, or to share a cloud structure or something similar – to create a network and enable dissemination, i.e. the sharing of these experiences for reuse.

Frank Seeliger: Perhaps there is a third aspect: to reflect on the business processes you are responsible for, so that you can identify whether they are suitable for AI-supported automation, for example. To reflect on this yourself, but also to encourage your colleagues to reflect on their own workflows, as to whether routine tasks can be taken over by machines and thereby relieve them of some of the workload. For example, in our library network, the Kooperativer Bibliotheksverbund Berlin-Brandenburg (KOBV), we would have liked to set up a lab – not only to play, but also to see together how we can technically support tasks that are really very close to real life. I don’t want to say that the project failed, but the problem was that first you needed the ideas: what can you actually tackle with AI? What requires a lot of time? Is it the indexing? Other work processes that are repeated over and over, routinely, with a high degree of similarity? We wanted the lab to look at exactly these processes and check whether we could automate them, independently of what library management systems or all the other tools we work with can do.

It’s important to initiate the process of self-reflection on automation and digitalisation in order to identify fields of work. Some have expertise in AI, others in their own fields, and they have to come together. The path leads from one’s own reflection into conversation, sounding out whether solutions can be found.

And to what extent can the management support?

Frank Seeliger: Leadership is about bringing people together and giving impetus. The coronavirus pandemic and digitalisation have put a lot of pressure on many people. There is a saying by Angela Merkel: she once said that she only got around to thinking during the Christmas period – however you want to interpret that. Out of habit, and because you want to clear the pile of work on your desk during working hours, it’s often difficult to reflect on what you are doing and whether there isn’t already a tool that could help. Then it’s the task of the management level to look at these processes and, where appropriate, to say: yes, maybe this person could be helped here. Let’s organise a project and take a closer look.

Anna Kasprzik: Yes, that’s one of the tasks, but for me the role of management is above all to take the load off the employees and clear a path for them. This brings another buzzword into play: agile working. It’s not only about giving an impetus, but also about supporting people by giving them some leeway so that they can work self-dependently – in the spirit of the agile manifesto, so to speak. That also means creating space for experimenting and allowing for failure sometimes. Otherwise, nothing will come to fruition.

Frank Seeliger: We will soon be doing a “Best of Failure” survey, because we want to ask what kind of error culture we really have – a topic that is usually treated as sacrosanct. This will also be the topic of the Wildau Library Symposium (German) from 13 to 14 September 2022, where we will explore this error culture more intensively. Because it is true: even in IT projects, you simply have to allow things to go wrong. Of course, they don’t have to be taken on as a permanent task if they don’t go well. But sometimes it’s good to just try, because you can’t predict whether a service will be accepted or not. What do we learn from these mistakes? We talk about this relatively little – mostly about successful projects that go well and attract crazy amounts of funding. But the other part also has to come into focus, in order to learn from it and be able to utilise aspects of it for the next project.

Is there anything else that you would like to say at the end?

Frank Seeliger: AI is not just a task for large institutions.

Anna Kasprzik: Exactly, AI concerns everyone. Even though AI should not be dealt with just for the sake of AI, but rather to develop new innovative services that would otherwise not be possible.

Frank Seeliger: There are naturally other topics, no question about that. But you have to address it and sort out the various topics.

Anna Kasprzik: It’s important that we get the message across to people that automated approaches should not be regarded as a threat; this digital jungle exists anyway by now, so we need tools to find our way through it. AI therefore represents new potential and added value, not a threat that will be used to eliminate people’s jobs.

Frank Seeliger: We have also been asked the question: what is the added value of automation? Of course, you spend less time on routine processes that are very manual. This creates scope to explore new technologies, to do advanced training or to have more time for customers. And we need this scope to develop new services. You simply have to create that scope, also for agile project management, so that you don’t spend 100% of your time clearing some pile of work or other from your desk, but can instead use 20% for something new. AI can help give us this time.

Thank you for the interview, Anna and Frank.

Part 1 of the interview on “AI in Academic Libraries” is about areas of activity, the big players and the automation of indexing.
In part 2 of the interview on “AI in Academic Libraries” we explore interesting projects, the future of chatbots and the problem of discrimination through AI.


We were talking to:

Dr Anna Kasprzik, coordinator of the automation of subject indexing (AutoSE) at the ZBW – Leibniz Information Centre for Economics. Anna’s main focus lies on the transfer of current research results from the areas of machine learning, semantic technologies, semantic web and knowledge graphs into productive operations of subject indexing of the ZBW. You can also find Anna on Twitter and Mastodon.
Portrait: Carola Gruebner, © ZBW

Dr Frank Seeliger (German) has been the director of the university library at the Technical University of Applied Sciences Wildau since 2006 and has been jointly responsible for the part-time programme Master of Science in Library Computer Sciences (M.Sc.) at the Wildau Institute of Technology since 2015. One module explores AI. You can find Frank on ORCID.
Portrait: TH Wildau

Featured Image: Alina Constantin / Better Images of AI / Handmade A.I / Licensed by CC-BY 4.0

The post AI in Academic Libraries, Part 3: Prerequisites and Conditions for Successful Use first appeared on ZBW MediaTalk.

AI in Academic Libraries, Part 2: Interesting Projects, the Future of Chatbots and Discrimination Through AI

Interview with Frank Seeliger (TH Wildau) and Anna Kasprzik (ZBW)

We recently had an intense discussion with Anna Kasprzik (ZBW) and Frank Seeliger (Technical University of Applied Sciences Wildau – TH Wildau) on the use of artificial intelligence in academic libraries. Both of them were recently involved in two wide-ranging articles: “On the promising use of AI in libraries: Discussion stage of a white paper in progress – part 1” (German) and “part 2” (German).

Dr Anna Kasprzik coordinates the automation of subject indexing (AutoSE) at the ZBW – Leibniz Information Centre for Economics. Dr Frank Seeliger (German) is the director of the university library at the Technical University of Applied Sciences Wildau and is jointly responsible for the part-time programme Master of Science in Library Computer Sciences (M.Sc.) at the Wildau Institute of Technology.

This slightly shortened, three-part series has been drawn up from our spoken interview. These two articles are also part of it:

What are currently the most interesting AI projects in libraries and infrastructure institutions?

Anna Kasprzik: Of course, there are many interesting AI projects. Off the top of my head, the following two come to mind. The first one is interesting for you if you are interested in optical character recognition (OCR). Because, before you can even start to think about automated subject indexing, you have to create metadata – “food” for the machine, so to speak: segmenting digital texts into their structural fragments, extracting an abstract automatically. In order to do this, you run OCR on the scanned text. Qurator (German) is an interesting project in which machine learning methods are used as well. The Staatsbibliothek zu Berlin (Berlin State Library) and the German Research Center for Artificial Intelligence (DFKI) are involved, among others. This is interesting because at some point in the future it might give us the tools we need to obtain the data input required for automated subject indexing.

The other project is the Open Research Knowledge Graph (ORKG) of the TIB Hannover. The Open Research Knowledge Graph is a way of representing scientific results no longer as a document, i.e. as a PDF, but rather in an entity-based way. Author, research topic or method – all nodes in one graph. This is the semantic level and one could use machine learning methods in order to populate it.

Frank Seeliger: Only one project: it is running at the ZBW and the TH Wildau and explores the development of a chatbot with new technologies. The idea of chatbots is actually relatively old: a machine conducts a dialogue with a human being, and in the best case the human being does not recognise that a machine is running in the background – the Turing test. Things are not quite this advanced yet, but the issue we are all concerned with is that libraries are being consulted – in chat rooms, for example. Many libraries aim to offer a high level of service at the times when researchers and students work, i.e. round the clock. This can only take place if procedures are automated, via chatbots for example, so that difficult questions can also be answered outside the opening hours, at weekends and on public holidays.

I am therefore hoping, firstly, that the input we receive concerning chatbot development means that it will become a high-quality standard service that offers fast orientation and gives information about a library or special services with excellent predictive quality. This would create the starting point for other machines, such as mobile robots. Many people are investing in robots, playing around with them and trying out various things. People expect to be able to go up to them and ask, “Where is book XY?” or “How do I find this and that?”, and that these robots can deal with such questions profitably, orient the visitor and literally point a finger at the right place. That’s one thing.

The second thing that I find very exciting for projects is to win people over to AI at an early stage. Not just to treat AI as a buzzword, but to look behind the scenes of this technology complex. We tried to offer a certificate course (German); however, demand was too low for us to run it. But we will try again. The German National Library provides a similar course that was well attended. I think it’s important to make a low-threshold offer across the board, i.e. for a one-person library or for small municipal libraries that are set up on a communal basis, as well as for larger university libraries. That way, people get to grips with the subject matter and find their own way: where they can reuse something, where there are providers or cooperation partners. I find this kind of project very interesting and important for the world of libraries.

But this too can only be the starting point for many other offers of special workshops – on Annif, for example, or other topics that can be discussed at a level that non-informaticians can understand as well. It’s an offer to colleagues who are concerned with the technology, but not necessarily at an in-depth level. As with a car: they don’t manufacture the vehicle themselves, but want to be able to repair or fine-tune it sometimes. At this level, we definitely need more dialogue with the people who are going to have to work with it, for example as system administrators who set up or manage such projects. The offers must also be aimed at the management level – the people who are in charge of budgeting, i.e. those who sign third-party funding applications.

At both institutions, the TH Wildau and the ZBW, you are working on the use of chatbots. Why is this AI application area for academic libraries so promising? What are the biggest challenges?

Frank Seeliger: The interesting perspective for me is that we can pursue the development of a chatbot together with other libraries. It is nice when more than one library feeds the knowledge base in the background for the typical questions. This is not possible with locally specific information such as opening hours or spatial conditions, but many synergy effects are still created. We can bring the data together and generate as large a quantity of data as possible, so that the quality of the automatically generated answers is simply better than if we were to set everything up individually. The output quality has a lot to do with the data quality – although it is not true that the more data, the better the information; other factors also play a role. But generally, small solutions tend to fail because of the small quantity of data.

Especially in view of the fact that a relatively high number of libraries are keen to invest in robot solutions that “walk” through the library outside the opening hours and offer services, like a robot librarian, it makes twice as much sense to offer such a service online, but also to make it available through a machine that rolls through the premises. This is important, because the personal approach from the library to its clients is a decisive and differentiating feature compared with the large commercial platforms. Looking for dialogue and paying attention to the special requirements of the users: this is what makes the difference.

Anna Kasprzik: Even though I am not involved in the chatbot project at the ZBW, I can think of three challenges. The first is that you need an incredible amount of training data, and getting hold of that much data is relatively difficult. Here at the ZBW we have had a chat feature for a long time – without a bot. These chats have been recorded, but first they had to be cleaned of all personal data, which was an immense amount of editorial work. That is the first challenge.

The second challenge: it’s a fact that relatively trivial questions, such as the opening hours, are easily answered. But as soon as things become more complex, i.e. when there are specialised questions, you need a knowledge graph behind the chatbot. And setting this up is relatively complex.

Which brings me to the third challenge: during the initial runs, the project team established that quite a few of the users had reservations and quickly thought, “It doesn’t understand me”. So there were reservations on both sides. We therefore have to be mindful of the quality aspect and also of the “trust” of the users.

Frank Seeliger: But interactions are also moving in the direction of speech, particularly with the younger generations who are now coming to libraries as students. This generation communicates via voice messages: the students speak with Siri or Alexa, and they are informal when speaking to technologies. FIZ Karlsruhe attempted to handle search queries using Alexa. That went well in itself, but it failed because of the European General Data Protection Regulation (GDPR), the privacy of information and the fact that data was processed somewhere in the USA. Naturally, that is not acceptable.

That’s why it is good that libraries are doing their own thing – they have data sovereignty and can therefore ensure that the GDPR is maintained and that user data is treated carefully. But it would be a strategic mistake if libraries did not adapt to the corresponding dialogue. Very simply because a lot of these interactions no longer take place with writing and reading alone, but via speech. As far as apps and features are concerned, much is communicated via voice messages, and libraries need to adapt to this fact. It starts with chatbots, but the question is whether search engines will be able to cope with (voice) messages at some point and then filter out the actual question. Making a chatbot functional and usable in everyday life is only the first step. With spoken language, this then incorporates listening and understanding.

Is there a timeframe for the development of the chatbot?

Anna Kasprzik: I’m not sure when the ZBW is planning to put its chatbot online; it could take one or two years. The real question is: when will such chatbots become viable solutions in libraries globally? This may take at least ten years or longer – without wanting to crush hopes too much.

Frank Seeliger: There are always unanticipated revivals popping up, for which a certain impetus is needed. For example, I was in the IT section of the International Federation of Library Associations and Institutions (IFLA) on statistics. We considered whether we could determine statistics clearly and globally, and depict them as a portfolio. Initially it didn’t work – it was limited to one continent: Latin America. Then the section received a huge surprise donation from the Bill and Melinda Gates Foundation and with it, the project IFLA Library Map of the World could be implemented.

It was therefore a very special impetus that led to something we would normally not have achieved with ten years’ work. And when this kind of impetus exists – through tenders, funding or third-party donors that accelerate exactly this kind of project, perhaps also from a long-term perspective – the whole thing takes on a new dynamic. If the development of chatbots in libraries continues to stagnate like this, libraries will not use them on a market-wide scale. There was a comparable movement with contactless object recognition via radio waves (Radio-Frequency Identification, RFID): it started in 2001 in Siegburg, then Stuttgart and Munich, and now it is used in 2,000 to 3,000 libraries. I don’t see this impetus with chatbots at all. That’s why I don’t think that, in ten or 15 years, chatbots will be used in 10% to 20% of libraries. It’s an experimental field. Maybe some libraries will introduce them, but it will be a handful, perhaps a dozen. However, if a driving force emerges owing to external factors such as funding or a network initiative, the whole concept may receive new momentum.

The fact that AI-based systems make discriminatory decisions is often regarded as a general problem. Does this also apply to the library context? How can this be prevented?

Anna Kasprzik: That’s a very tricky question. Not many people are aware that potential difficulties almost always arise from the training data because training data is human data. These data sources contain our prejudices. In other words, whether the results may have a discriminating effect or not depends on the data itself and on the knowledge organisation systems that underpin it.

One movement that is gathering pace is known as decolonisation. People are taking a close look at the vocabularies, thesauri and ontologies they use. The problem has come up for us as well: since we also provide historical texts, terms that have racist connotations today appeared in the thesaurus. Naturally, we primarily incorporate terms that are considered politically correct. But these definitions can shift over time. The question is: what do you do with historical texts where such a word occurs in the title? The task is then to find ways to keep these terms as hidden elements of the thesaurus without displaying them in the interface.
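One established way to model exactly this distinction is SKOS (Simple Knowledge Organization System), whose `skos:hiddenLabel` property keeps a term indexable for search without it ever being displayed. The concept URI, namespace and label texts in this sketch are invented placeholders:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/thesaurus/> .   # invented namespace

# The outdated term stays searchable via skos:hiddenLabel,
# but interfaces display only the skos:prefLabel.
ex:concept123 a skos:Concept ;
    skos:prefLabel   "current, accepted term"@en ;
    skos:hiddenLabel "outdated, offensive term"@en .
```

A search index can ingest both labels, while the display layer renders only the preferred one.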

There are knowledge organisation systems that are very old and developed in times very different from ours. We urgently need to restructure them. It’s always a balancing act if you want to present texts from earlier periods with the structures that were in use at that time: the historical context must not be falsified, but nobody who searches these texts should be offended – they should feel represented, or at least not discriminated against. This is a very difficult question, particularly in libraries. People often think it’s not an issue for libraries, only relevant in politics or that sort of thing. But on the contrary, libraries reflect the times in which they exist, and rightly so.

Frank Seeliger: Everything that can be used can also be misused. This applies to every object. For example, I was very impressed in Turkey. They are pursuing a country-wide Koha approach (library software), meaning that more than 1,000 public libraries are using the open source solution Koha as their library management software. They therefore know, among other things, which book is borrowed most often in Turkey. We do not have this kind of information at all in Germany via the German Library Statistics (DBS, German). This knowledge doesn’t discredit the other books or make them automatically “leftovers”. You can do a lot with knowledge. The bias that exists with AI is certainly the best-known case. But the same applies to all information: should monuments be pulled down or left standing? We need to find a path through the various moral phases that we live through as a society.

In my own studies, I specialised in pre-Columbian America. To name one example, the Aztecs never referred to themselves as Aztecs; they called themselves Mexica. If you searched library catalogues from before 1763, the term “Aztec” did not exist. Or take the Kerensky Offensive – search engines do not have much to offer on it. It was a military offensive that was only given that name afterwards; it used to be called something else. The challenge is the same: to refer to both terms, even if the terminology has changed or it is no longer “en vogue” to work with a certain term.

Anna Kasprzik: This is also called concept drift, and it is generally a big problem. It’s why you always have to retrain the machines: concepts are continually developing, new ones emerge, and old terms change their meaning. Even where there is no discrimination, terminology is constantly evolving.
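As a toy illustration of how such drift might be monitored before deciding to retrain – the corpora, terms and threshold below are all invented – one can compare the keyword distributions of two indexing periods:

```python
from collections import Counter

def drift_score(old_terms, new_terms):
    """Total variation distance between two term-frequency distributions:
    0.0 means identical usage, 1.0 means completely disjoint vocabularies."""
    old, new = Counter(old_terms), Counter(new_terms)
    n_old, n_new = sum(old.values()), sum(new.values())
    vocab = set(old) | set(new)
    return 0.5 * sum(abs(old[t] / n_old - new[t] / n_new) for t in vocab)

# Hypothetical keyword assignments from two indexing periods
corpus_2010 = ["web 2.0", "e-commerce", "web 2.0", "data mining"]
corpus_2020 = ["machine learning", "e-commerce", "deep learning", "data mining"]

score = drift_score(corpus_2010, corpus_2020)
RETRAIN_THRESHOLD = 0.3  # illustrative cut-off, not an established value
needs_retraining = score > RETRAIN_THRESHOLD
```

When the score for recent publications exceeds the chosen threshold, that is a signal that the model's training data no longer matches current terminology.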

And who does this work?

Anna Kasprzik: The machine learning experts at the institution.

Frank Seeliger: The respective zeitgeist and its intended structure.

Thank you for the interview, Anna and Frank.

Part 1 of the interview on “AI in Academic Libraries” is about areas of activity, the big players and the automation of indexing.
Part 3 of the interview on “AI in Academic Libraries” focuses on prerequisites and conditions for successful use.
We will share the link here as soon as the post is published.

This text has been translated from German.

This might also interest you:

We were talking to:

Dr Anna Kasprzik, coordinator of the automation of subject indexing (AutoSE) at the ZBW – Leibniz Information Centre for Economics. Anna’s main focus lies on the transfer of current research results from the areas of machine learning, semantic technologies, semantic web and knowledge graphs into productive operations of subject indexing of the ZBW. You can also find Anna on Twitter and Mastodon.
Portrait: Photographer: Carola Gruebner, ZBW©

Dr Frank Seeliger (German) has been the director of the university library at the Technical University of Applied Sciences Wildau since 2006 and has been jointly responsible for the part-time programme Master of Science in Library Computer Sciences (M.Sc.) at the Wildau Institute of Technology since 2015. One module explores AI. You can find Frank on ORCID.
Portrait: TH Wildau

Featured Image: Alina Constantin / Better Images of AI / Handmade A.I / Licensed by CC-BY 4.0

The post AI in Academic Libraries, Part 2: Interesting Projects, the Future of Chatbots and Discrimination Through AI first appeared on ZBW MediaTalk.

AI in Academic Libraries, Part 1: Areas of Activity, Big Players and the Automation of Indexing

Interview with Frank Seeliger (TH Wildau) and Anna Kasprzik (ZBW)

We recently had an intense discussion with Anna Kasprzik (ZBW) and Frank Seeliger (Technical University of Applied Sciences Wildau) on the use of artificial intelligence (AI) in academic libraries. Both of them were also recently involved in two wide-ranging articles: “On the Promising Use of AI in Libraries: Discussion Stage of a White Paper in Progress – Part 1” (German) and “Part 2” (German). This slightly shortened, three-part series has been drawn up from our spoken interview. The following articles are also part of this series:

  • Part 2: Interesting Projects, the Future of Chatbots and Discrimination Through AI
  • Part 3: Prerequisites and Conditions for Successful Use

We will link them here as soon as the texts are online.

An interview with Dr Anna Kasprzik (ZBW – Leibniz Information Centre for Economics) and Dr Frank Seeliger (University Library of the Technical University of Applied Sciences Wildau).

What are the most promising areas of activity for the use of AI in academic libraries?

Frank Seeliger: Time and again, reports crop up about how great the automation potential of different job profiles is. This also applies to libraries: In the case of the management of an institution, automation using AI is minimal, but for the specialists for media and information services (FaMI in German), it could be up to 50%.

In the course of automation and digitalisation, it’s largely about changing process chains and automating so that users can borrow or return media autonomously in the libraries – outside opening hours or during rush hour – essentially as an interaction between human and machine.

Even the display of availability in the catalogue is a consequence of automation and the digitalisation of services in libraries. Users can check from home whether a medium is available. Services in this area – access beyond the building and outside opening hours – are certainly increasing, for example asking a question or using a resource in the evening via remote access. This process continues and also includes internal procedures such as leave requests or budget planning. These processes run completely differently than they did 15 years ago.

One of the first areas of activity for libraries is automatic letter and number recognition, including for older works, cimelia and early printed books, and generally in digitisation projects. This is one area of library expertise: layout analysis, identification and recognition. The other is indexing. Many years ago, libraries worked almost exclusively with printed works, keywording them and indexing their content. Nowadays, discovery systems include tables of contents and work with what are known as “component parts of a bibliographically independent work”, i.e. articles that are co-documented in discovery tools or search engines. The question is always: “How should we prepare this knowledge so that it can be found using completely different approaches?” Competitors such as Wikipedia and Google set the pace to some extent. We try to keep up, or move into niche fields where we have different expertise, another perspective. These are definitely the first areas of activity – operations, search and indexing, and digitisation – where AI is helping us to go further than before.

It has thereby been possible for many libraries to offer services at lower personnel cost even beyond the opening hours of public libraries (Open Level concept). Not round the clock, but for several more hours – even if no-one is in the building.

We need to make sure that we provide students with high-quality information at different places and different times. This is why chatbots, for example (there’s more to come about this in part 2 of this article series), are such an exciting development: students do not necessarily work when libraries are open or when our service times are available, but rather in the evenings, at the weekend or on public holidays. Libraries have the urgent task of providing them with sufficient, quality-checked information. We need to position ourselves where the modern technologies are.

Anna Kasprzik: Perhaps I’m biased because I’m working in the field but for me it’s very important to differentiate: I am specialised in the field of automation of subject indexing in academic libraries; the core task is to process and provide information intelligently. For me, this is the most interesting field. However, I sometimes get the impression that some libraries are falling into a trap: they want to do “something with AI” because it’s cool at the moment and then just end up dabbling in it.

But it’s really important to tackle the core tasks and thus prove that libraries can stay relevant. These days, core tasks such as subject indexing are impossible to imagine without automation. Previously this work was done intellectually by people, often even by people with doctorates. But because the tasks are changing and the quantity of digital publications is growing so rapidly, humans can only achieve a fraction of what is required. This is why we need to automate and successively find ways to combine humans and machines more intelligently. In machine learning, we speak of the “human in the loop”. By this, we mean the various ways in which humans and machines can work together to solve problems. We really need to focus on the core tasks. And we need to apply methods of artificial intelligence and not just do explorative projects that might be interesting in the short term but are not thought through at a sustainable level.

Frank Seeliger: The challenge is that, even when you have a very narrow field that you are trying to research and describe, it’s difficult to stay up to date with all relevant articles. You need tools such as the Open Research Knowledge Graph (ORKG). With its help, content can be compared with the same methods and similar facts, without reading the entire article. Because this naturally requires time and energy. It’s impossible to read 20 scientific articles a day. But that’s how many are produced in some fields. That’s why you need to develop intelligent tools that help scientists to get a fast overview of which articles to prioritise for reading, absorbing and reflecting on.

But it goes even further. In the authors’ group of the “White Paper in progress” (German), which met for a year, we asked ourselves what the search of the future will be like: Will we still search for keywords? We’re familiar with this from plagiarism detection software into which entire documents are entered. The software checks whether there is a match with other publications and whether uncited text has been used without permission. But you can also turn the whole thing around and ask: I have written something; have I forgotten a significant, current contribution in science? As a result, you get a semantic, ontological hint that there is already an article on the topic you have explored which you should reflect on and incorporate. This is a perspective for us, because we assume that today you can hardly keep on top of the literature, even with an interdisciplinary focus or when exploring a new field. It would also be exciting to find a way in via a graph analysis that ensures that you have not forgotten anything important.

(How) can libraries keep up with big players such as Google, Amazon or Facebook? Do they even have to?

Frank Seeliger: We’ve had some very intensive disagreements about this and come to the conclusion that libraries will never have the staff power that the big corporations have, even if we were to pool everything into one single world library. Even then it would be questionable whether we could establish a parallel world (and whether we would even want to). After all, others cater for other target groups. But even in the case of Google Scholar, the target group is quite clearly defined.

Our expertise lies in the respective field that we have licenced, for which we have access. Every higher education institution has different points of focus for its own teaching and research. For this, it ensures very privileged, exclusive access which is used to reflect precisely on what is in the full text or is licenced and what can be accessed by going to the shelves. This is and remains the task.

Although it is also changing. How will things develop, for example, if a very high percentage of publications are published in Open Access and the data becomes freely accessible? There are semantic search engines that are experimenting with this. Examples are YEWNO at Bayerische Staatsbibliothek (Bavarian State Library) or iris.ai, a company that has a headquarters in Prague, among other places. They work a lot with Open Access literature and try to process it differently on a scientific level than before. So in this respect, tasks also change.

Libraries need to reposition themselves if they want to stay in the race. But it’s clear that our core task is first of all to process the material that we have licenced and for which we pay a lot of money in the best possible way. The aim must be that our users, i.e. students or researchers, find the information they need relatively quickly and not after the 30th hit.

One of the ways in which libraries are intrinsically different from the big players lies in how they deal with personal data. Their handling of personal data is diametrically opposed to the offers of the big players, because values such as trustworthiness and transparency play an enormously important role in library services.

Do students even start their search in library catalogues? Don’t they go directly to the general internet search engines?

Anna Kasprzik: They use Google relatively often. At the ZBW, we are actually currently analysing the routes via which users enter our research portal. It’s often Google hits. But I don’t see that as a problem because the research portal of a library is only one reuse scenario of metadata that libraries create. You can also make it available for reuse as Linked Open Data. And what’s more: Google uses a lot of this data, so it is already integrated into Google.

And to respond to the other question: we also discussed this in the paper, at least in an early draft. The fact that libraries are publicly funded means that they have a very different set of ethics when dealing with the personal data of users. And this has many advantages, because they don’t constantly try to milk users for their data or exploit their needs. Libraries simply want to provide the best-prepared information possible. This is a strong moral advantage, which we can utilise to our benefit. But libraries hardly promote this advantage.

There is also an age-old disagreement about this (which has nothing to do with AI, however) – many students and PhD candidates do not realise that in their everyday lives they are using data that a library has prepared and made available for them. They call up a paper at the university and do not notice that the link has been made available via their library and that the library has paid for it. And then there are two factions: some people say that users shouldn’t notice at all – it should work as smoothly as possible. The others believe that there should be a big fat notice stating “provided by your library” so that people can’t miss it.

Frank Seeliger: The visualisation of library work that is reused by third parties is a great challenge and must be properly championed. Otherwise, if it is no longer visible, people will start asking why they are giving money to libraries at all. The results are visible, but not who has financed them, and people don’t notice that they are actually commercial products.

Another aspect that we discussed was transparency and freedom from advertising. We organised a virtual Open Access Week (German) from November 2021 to March 2022 and made video recordings of each ninety-minute session. Then we asked ourselves: should we publish them on YouTube or on the non-commercial video portal of the TIB Leibniz Information Centre for Science and Technology and University Library (TIB AV Portal)? We made a clear-cut decision in favour of the TIB AV Portal, which accepted us – precisely because there are no advertisements, no overlays and no pop-up windows. If we work with discovery tools, we highlight the fact that you really don’t get any advertising and reach your goal with the very first hit. Several aspects therefore differentiate us significantly from commercial providers. We are having that discussion right now; it’s an important difference.

Will the intellectual creation of metadata soon become superfluous because intelligent search engines will take over this task?

Anna Kasprzik: This is a fundamental issue for me. I say: “no”, or perhaps “yes and no”. What we are doing at the moment via our automation of subject indexing with machine learning methods is an attempt to imitate the intellectual subject indexing one-to-one, just the same way it has always been done. But for me this is only a way for us to get our foot in the door technologically. In the next few years, we will address this and start designing the interplay between human knowledge organisation expertise and machines in a more intelligent way – reorganise it completely. I can imagine that we will not necessarily need to do the intellectual subject indexing in advance in the same way that we are currently doing it. Instead, intelligent search engines can try to index content resources taking the context into account.

But even if they are able to do this ad hoc from the context, those engines require a certain amount of underlying semantic structuring. And this structuring needs to exist in advance. It will therefore always be necessary to prepare information so that the pattern recognition algorithms can access it in the first place. If you merely dive into the raw data, the result is chaos, because the available metadata is fuzzy. You need structuring that pulls the whole thing into sharper focus, even if it only accommodates the machine partially and not completely. There are completely different ways of interconnecting search queries and retrieval results. But intelligent search engines still have to have something up their sleeve, and that something is organised knowledge. This knowledge organisation requires human expertise as input at certain points. The question is: at which points?

Frank Seeliger: There is also the opposing view of TIB director Prof. Dr Sören Auer, who says that data collection is overvalued – certainly also meant as a provocation, or simply to test how far one can go. In the future, it may not be necessary to have as many colleagues working in intellectual indexing.

For example, we have 16,000 graduate theses held in the library of the TH Wildau; the entire tables of contents are being scanned and processed with OCR. The question is: can you classify them according to the Regensburger Verbundklassifikation (RVK, a classification scheme for academic libraries), perhaps with the Annif tool? That would mean I don’t have to look at each thesis and decide that this one belongs in the field of engineering, and so on, independently of the study courses in which they were written. Instead, here is the RVK graph, there are the tables of contents, and they are matched according to certain algorithms. This is a different approach from a specialist looking at every work and indexing it accordingly – keywords, the Integrated Authority File (GND; a service facilitating the collaborative use and administration of authority data) and so on, running through all the procedures. I see this as a new way of mastering the masses, because a great deal is published and because we have taken on responsibilities that libraries did not use to cover, such as the indexing of articles, i.e. component parts of a bibliographically independent work, in addition to independent works themselves. It’s definitely a great help.
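The matching idea can be sketched in a heavily simplified form. The RVK notations, label sets and table of contents below are invented, and real tools such as Annif use trained statistical or neural models rather than raw term overlap – this only illustrates the shape of the task:

```python
import re

# Invented label sets for two example RVK notations
RVK_CLASSES = {
    "ZN 3000": {"electrical", "engineering", "circuits", "signals"},
    "QP 300":  {"management", "business", "marketing", "strategy"},
}

def tokenize(text):
    """Lowercase a text and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def suggest_class(toc_text):
    """Return the RVK notation whose label set overlaps most with the TOC."""
    tokens = tokenize(toc_text)
    scores = {rvk: len(tokens & labels) for rvk, labels in RVK_CLASSES.items()}
    return max(scores, key=scores.get)

toc = "1. Fundamentals of circuits  2. Signals and systems  3. Engineering practice"
```

Calling `suggest_class(toc)` would assign this hypothetical table of contents to the engineering class, without anyone reading the thesis.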

However, I cannot imagine that humans will no longer intervene in such algorithms at all and provide a pre-structuring according to which they must act. Up to now, a lot of human intervention has been required to trim and optimise these systems so that the results are indexed 99% correctly. That’s one objective. This requires control, pre-structuring and the inspection of training data – for example in handwriting recognition, when you check whether a letter has been recognised correctly. Checking and handling by human beings is still necessary.

Anna Kasprzik: Exactly – I mentioned the concept earlier: the “human in the loop”, i.e. people can be involved at various levels. This involvement can start out very trivially: with the fact that training data and our knowledge organisation systems are generated by humans. Or the fact that you can use automatically generated keywords as suggestions – machine-assisted subject indexing.

There are also concepts such as online learning and active learning. Online learning means that the machine consistently receives feedback from the indexer on how good its output was, and is retrained based on that. Active learning is where the machine can interactively decide at certain points: I now need a person as an oracle for a partial decision. The machine initiates this, saying: “Human, I am pushing a few part-decisions that I need into the queue here – please work through them.” People and machines toss the ball back and forth here, rather than working separately in two blocks.
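The active learning loop described here can be sketched roughly as follows. The documents and confidence scores are invented, standing in for real model output; the uncertainty measure (margin sampling) is one common choice among several:

```python
def margin(probs):
    """Confidence margin: difference between the two highest class scores.
    A small margin means the model is torn between two labels."""
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1]

def build_review_queue(predictions, k=2):
    """Push the k most uncertain documents into the human review queue."""
    ranked = sorted(predictions, key=lambda d: margin(d["scores"]))
    return [d["doc"] for d in ranked[:k]]

predictions = [
    {"doc": "paper_a", "scores": [0.90, 0.05, 0.05]},  # confident
    {"doc": "paper_b", "scores": [0.40, 0.38, 0.22]},  # unsure
    {"doc": "paper_c", "scores": [0.50, 0.45, 0.05]},  # unsure
]
queue = build_review_queue(predictions)
```

The confident prediction is accepted automatically, while the two uncertain ones land in the indexer's queue – the machine initiating the hand-off, as described above.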

Thank you for the interview, Anna and Frank.

In part 2 of the interview on “AI in Academic Libraries” we explore exciting projects regarding the future of chatbots and discrimination through AI.
Part 3 of the interview on “AI in Academic Libraries” focuses on prerequisites and conditions for successful use.
We’ll share the link here as soon as the post is published.

This text has been translated from German.


The post AI in Academic Libraries, Part 1: Areas of Activity, Big Players and the Automation of Indexing first appeared on ZBW MediaTalk.

Guest Post — Leveraging the Digital ABCs: AI, Big Data, and Cloud Computing

In today’s guest post, Hong Zhou and Megan Prosser of Atypon explore how new technology and new ideas — specifically around AI, Big Data, and Cloud computing – can advance our industry.

The post Guest Post — Leveraging the Digital ABCs: AI, Big Data, and Cloud Computing appeared first on The Scholarly Kitchen.

Horizon Report 2022: Trends Such as Hybrid Learning, Micro-certificates and Artificial Intelligence are Gaining Traction

by Claudia Sittner and Birgit Fingerle

The 2022 EDUCAUSE Horizon Report Teaching and Learning Edition was published in mid-April 2022. It examines which trends, technologies and practices will have a significant effect on teaching and learning at universities in the future. As with previous editions, the report uses four different scenarios to envision what the future of university education could look like. We outline some of these trends, which could be of interest to academic libraries and information infrastructure facilities.


Hybrid learning: Here to stay

After around two years of the Corona pandemic, most of us are now aware that there will be no return to pre-Corona normality; online and hybrid learning are the new normal. One trend identified by the Horizon Report is a continuation of synchronous and asynchronous learning experiences, coupled with minimal compulsory attendance on campus. According to the Horizon Report (p. 7), this will require “more sustainable and evidence-based models of hybrid and online teaching and learning”. These have now gradually superseded the contingency plans hastily put in place at the start of the pandemic, and will be accompanied by recently developed, reliable hybrid and online education, as well as an investment in additional staff and services. Higher education institutions now have to focus on making sure their students are ready for online learning experiences. This is certainly an area where academic libraries can also play their part. The new motto is: education for everyone, from anywhere.
Example: ‘Attend anywhere’ model, Portland, USA

Micro-certificates are winning out over classic university degrees

Lifelong, tailored learning is gaining importance over typical, drawn-out degrees, according to the Horizon Report. Both microcredentialing and online/hybrid education are particularly useful in this regard. That is why libraries and digital infrastructure facilities should increasingly focus on practical, personalised and competence-based courses and micro-certificates, which according to the Horizon Report could provide more attractive options for career advancement than a traditional university education. For example, libraries could think about providing their own courses to make their offers more visible, and thus prevent the big tech companies from dominating the field entirely.

Furthermore, the fact that many people experienced significant financial losses as a result of the pandemic has led them to think more carefully about whether it pays to opt for a typical university degree. Micro-certificates, especially those awarded for free by institutions such as libraries, are thus becoming more attractive.

Read more:

Artificial intelligence: learning analysis and learning tools

Even if it is still often in its teething stage, artificial intelligence (AI) plays a role in two respects in this year’s Horizon Report: learning analytics and learning tools. In learning analytics, institutions primarily use AI to support students’ learning progress based on existing data. Learning tools, by contrast, are used by the students themselves to improve their learning experience at university.

The digital re-orientation brought about by the pandemic has also heralded a flood of digital data. For academic libraries too this means engaging more directly with the potential of the data that has been generated, and ultimately providing their users and staff with an improved learning and working experience.

One challenge in this regard could emerge from the data silos of individual departments, divisions or institutes. These have to be more closely integrated in order to optimise the user experience and encourage operating efficiency. Despite the great potential of AI, there are also some risks to be aware of, such as the fact that AI systems often adopt existing biases and thus favour certain groups of users. This can increase inequalities. What’s more, it is important to clearly communicate what data is being gathered for what purpose, so that users do not lose trust in the institutions.

Read more:

Data trails demand critical engagement with media

In light of growing data volumes, users of infrastructure facilities are leaving more data trails behind them online, whether in the cloud or on social networks. This means it is even more important to equip them with sufficient information literacy and a media-critical mindset, so that they can recognise fake news, dubious conferences and predatory journals, for example. In this regard, academic libraries have a more important role to play than ever when it comes to offering relevant courses and further support.

Strengthening sustainable practices and reducing the ecological footprint

Environmental aspects are also becoming increasingly relevant in how all higher education institutions conduct themselves. It will be a question of them reducing their own ecological footprint on site, and leading by example. Here, libraries can take a look at the permanently altered behaviour brought about by the pandemic, as well as the new demands of users and staff. The ‘Planetary Health Education Framework’ and the 17 Sustainable Development Goals (SDGs) proposed by the United Nations could provide possible points of reference. Is this perhaps a good time for academic libraries to think about how they can become more sustainable and strengthen environmentally friendly practices?

Allegation of political interference

In times of increasing nationalism and populism in some parts of the world, along with global uncertainties, it would be advisable for educational institutions to safeguard their autonomy. However, due to the financing that they require, it is not always possible to withdraw from political matters completely. “In these instances, institutions must be prepared to offer compelling evidence of the benefits of the education and training they provide, as well as to accommodate the needs of increasingly strained and distracted students and families.” (p. 13). In light of increasingly scarce financial resources, more focus could also be afforded to academic libraries in this regard.

This text has been translated from German.

You may also be interested in:

About the Authors:

Birgit Fingerle holds a diploma in economics and business administration and works at the ZBW in the fields of innovation management, open innovation and open science, currently focusing in particular on the “Open Economics Guide”. Birgit Fingerle can also be found on Twitter.
Portrait, photographer: Northerncards©

Claudia Sittner studied journalism and languages in Hamburg and London. She was a long-time lecturer at the ZBW publication Wirtschaftsdienst – a journal for economic policy – and is now the managing editor of the blog ZBW MediaTalk. She is also a freelance travel blogger (German), speaker and author. She can also be found on LinkedIn, Twitter and Xing.
Portrait: Claudia Sittner©

The post Horizon Report 2022: Trends Such as Hybrid Learning, Micro-certificates and Artificial Intelligence are Gaining Traction first appeared on ZBW MediaTalk.

Discrimination Through AI: To What Extent Libraries are Affected and how Staff can Find the Right Mindset

An interview with Gunay Kazimzade (Weizenbaum Institute for the Networked Society – The German Internet Institute)

Gunay, in your research, you deal with the discrimination through AI systems. What are typical examples of this?

Biases typically reflect the forms of discrimination present in our society – political, cultural, financial or sexual. They are manifested in the data sets we collect and in the structures and infrastructures around data, technology and society, and thus encode social norms and decision-making behaviour in particular data points. AI systems trained on those data points reproduce these prejudices across various domains and applications.

For instance, facial recognition systems built upon biased data tend to discriminate against people of colour in several computer vision applications. According to research from the MIT Media Lab, accuracy in vision models differs dramatically between white men and Black women. In 2018, Amazon “killed” its hiring system after it had started to eliminate female candidates for engineering and high-level positions. This outcome resulted from the company’s traditional preference for male candidates in those positions. These examples make clear that AI systems are not objective; they map the human biases we have in society onto the technological level.

How can library or digital infrastructure staff develop an awareness of this kind of discrimination? To what extent can they become active themselves?

Bias is an unavoidable consequence of situated decision-making. Deciding who classifies data, how, and which data points are included in the system is not new to libraries’ work. Libraries and archives are not just providers of data storage, processing and access. They are critical infrastructures committed to making information available and discoverable, with the declared aim of eliminating discriminatory outcomes of those data points.

Imagine a situation where researchers approach the library asking for images to train a face recognition model. The quality and diversity of this data directly impact the results of the research and of any system developed upon it. Diversity in images (YouTube) was recently investigated in the “Gender Shades” study by Joy Buolamwini from the MIT Media Lab. The question here is: could library staff have identified demographic bias in the data sets before the Gender Shades study was published? Probably not.

The right mindset comes from awareness. Awareness means social responsibility and self-determination, framed by critical library skills and subject specialisation. Relying on metadata alone will not be sufficient to eliminate bias in data collections. Diversity in staffing and critical domain-specific skills and tools are crucial assets in analysing digitised library collections. Continuous training and evaluation of library staff should be the primary strategy of libraries on the way to detecting, understanding and mitigating biases in library information systems.

If you want to develop AI systems, algorithms, and designs that are non-discriminatory, the right mindset plays a significant role. What factors are essential for the right attitude? And how do you get it?

Whether it is a developer, user, provider, or another stakeholder, the right mindset starts with:

  • Clear understanding of the technology use, capabilities as well as limitations;
  • Diversity and inclusion in the team, asking the right questions at the right time;
  • Considering team composition for the diversity of thought, background, and experiences;
  • Understanding the task, stakeholders, and potential for errors and harm;
  • Checking data sets: Consider data provenance. What is the data intended to represent?;
  • Verifying the quality of the system through qualitative, experimental, survey, and other methods;
  • Continual monitoring, including customer feedback;
  • Having a plan to identify and respond to failures and harms as they occur.

Therefore, a long-term strategy for library information systems management should include:

  • Transparency
    • Transparent processes
    • Explainability/interpretability for each worker/stakeholder
  • Education
    • Special Education/Training
    • University Education
  • Regulations
    • Standards/Guidelines
    • Quality Metrics

Everybody knows it: you choose a book on an online platform and get suggestions à la “People who bought this book also bought XYZ”. Are such suggestion and recommendation systems, which can also exist in academic libraries, discriminatory? In what way? And how can we make them fairer?

Several research findings suggest ways to make recommendations fairer and to break out of the “filter bubbles” created by technology deployers. In recommendations, transparency and explainability are among the main techniques for approaching this problem. Developers should consider the explainability of the suggestions made by the algorithms and make the recommendations justifiable to the user of the system. It should be transparent to the user on which criteria a particular book recommendation was based, and whether it drew on gender, race or other sensitive attributes. Library and digital infrastructure staff are the main actors in this technology deployment pipeline. They should be conscious of this and push decision-makers to deploy technology that includes specific features for explainability and transparency in library systems.
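One way to make the “why” of a suggestion visible is to have the recommender return its evidence together with each item. As a minimal sketch – in Python, with invented data and hypothetical function names, not the API of any specific library system – a co-occurrence recommender of the “borrowed together” kind could expose its criteria like this:

```python
from collections import Counter

# Hypothetical loan records: each set holds the items one user borrowed.
loans = [
    {"Book A", "Book B"},
    {"Book A", "Book B", "Book C"},
    {"Book A", "Book C"},
]

def recommend(item, loans, top_n=2):
    """Co-occurrence recommender that returns each suggestion together
    with the evidence it is based on, so the user can see *why* an
    item was suggested (and that no sensitive attribute was used)."""
    counts = Counter()
    for basket in loans:
        if item in basket:
            counts.update(basket - {item})
    return [
        {"item": other,
         "reason": f"borrowed together with '{item}' {n} time(s)"}
        for other, n in counts.most_common(top_n)
    ]

for rec in recommend("Book A", loans):
    print(rec["item"], "-", rec["reason"])
```

The design point is simply that the explanation is produced by the same code path as the recommendation itself, so staff can audit exactly which criteria drove a suggestion.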

What can they do if an institute, library, or repository wants to find out if their website, library catalogue, or other infrastructure they offer is discriminatory? How can they tell who is being discriminated against? Where can they get support or a discrimination check-up done?

First, a “check-up” should start by verifying the quality of the data through quantitative, qualitative and mixed experimental methods. In addition, there are several open-access methodologies and tools for fairness checks and bias detection/mitigation in several domains. For instance, AI Fairness 360 is an open-source toolkit that helps to examine, report and mitigate discrimination and bias in machine learning models throughout the AI application lifecycle.
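To illustrate the kind of group fairness metric that toolkits such as AI Fairness 360 report, here is a minimal plain-Python sketch of the disparate impact ratio; the outcome data is invented for the example:

```python
# Hypothetical outcomes: 1 = favourable decision (e.g. shortlisted).
privileged   = [1, 1, 1, 0, 1]   # 80% favourable outcomes
unprivileged = [1, 0, 0, 0, 1]   # 40% favourable outcomes

def disparate_impact(unpriv, priv):
    """Ratio of favourable-outcome rates between the unprivileged and
    the privileged group; values far below 1.0 signal potential bias.
    (This metric is among those reported by AI Fairness 360.)"""
    rate = lambda xs: sum(xs) / len(xs)
    return rate(unpriv) / rate(priv)

print(disparate_impact(unprivileged, privileged))  # 0.4 / 0.8 = 0.5
```

A common rule of thumb treats ratios below 0.8 as a red flag worth investigating, though the threshold itself is a policy choice, not a mathematical one.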

Another useful tool is “Datasheets for datasets”, intended to document the datasets used for training and evaluating machine learning models; this tool is very relevant in developing metadata for library and archive systems, which can be further used for model training.
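As an illustration of the idea, a datasheet-style record could be kept alongside a collection as structured metadata and checked for completeness. The field names below are illustrative only, not the official template from “Datasheets for datasets”:

```python
# A minimal, hypothetical "datasheet" record in the spirit of
# "Datasheets for datasets"; field names are illustrative.
datasheet = {
    "motivation": "Portrait images for face-recognition research.",
    "composition": "10,000 digitised portraits from the archive.",
    "collection_process": "Scanned from donated collections, 1920-1990.",
    "intended_uses": "Historical imagery research; not surveillance.",
    "known_limitations": "Under-represents women and people of colour.",
}

# Completeness check: refuse to hand over a collection whose
# documentation is missing any of the required sections.
required = {"motivation", "composition", "collection_process",
            "intended_uses", "known_limitations"}
missing = required - datasheet.keys()
print("complete" if not missing else f"missing sections: {missing}")
```

The value for libraries is that the “known_limitations” section forces the demographic-bias question to be asked before the data is used for model training, not after.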

Overall, everything starts with the right mindset and awareness on approaching the bias challenge in specific domains.

Further Readings

We were talking to:

Gunay Kazimzade is a Doctoral Researcher in Artificial Intelligence at the Weizenbaum Institute for the Networked Society in Berlin, where she is currently working with the research group “Criticality of AI-based Systems”. She is also a PhD candidate in Computer Science at the Technical University of Berlin. Her main research directions are gender and racial bias in AI, inclusivity in AI, and AI-enhanced education. She is a TEDx speaker, Presidential Award of Youth winner in Azerbaijan and AI Newcomer Award winner in Germany. Gunay Kazimzade can also be found on Google Scholar, ResearchGate and LinkedIn.
Portrait: Weizenbaum Institute©

The post Discrimination Through AI: To What Extent Libraries are Affected and how Staff can Find the Right Mindset first appeared on ZBW MediaTalk.

Science Checker: Open Access and Artificial Intelligence Help Verify Claims

An Interview with Sylvain Massip

What is the Science Checker?

In July 2021, the Science Checker went online in a beta version. In this version, it only deals with health topics. As a first step, it is intended to help science journalists and other scientific fact checkers to test the likelihood of a claim. Three million Open Access articles from PubMed (out of 36 million) serve as the data basis for the Open Source tool. It uses artificial intelligence to check whether a claim is supported, discussed or rejected by the scientific literature. As a result, it shows how many and which documents it has found on the topic, when they were published and to what extent they make the claim probable. The guiding question is always: “What does the research literature say about this?” In the practical operation of the Science Checker, three fields must first be filled in: agent, effect (increase, cause, prevent, cure) and disease.

To make the Science Checker easier to imagine, here are a few examples: “Does caffeine lead to more intelligence?” (unfortunately unlikely). For this question, the tool finds three sources in the database.

There are 5933 sources on the question whether smoking causes cancer. Of these, 80% are confirmatory and 20% negative. For the question “Does sport prevent heart attacks?” the Science Checker finds 420 sources, of which only the first 20 relevant ones are included in the first probability calculation. Click on “Add” to add the next 20 articles or on “All” to calculate the total. Since the latter takes some time, a notification is sent out by e-mail as soon as the result is available.

We have already introduced the idea behind the Science Checker in the article “Opscidia: Fighting Fake News via Open Access”. To get a practical impression of the tool’s possibilities, we recommend simply trying it out yourself: to the Science Checker.

In this interview, we talk to Sylvain Massip, one of five members of the Science Checker team, about his experiences during the first five months that the tool has been online in a beta version. He explains who the Science Checker is aimed at, how it is financed and what contribution libraries can make.

What happened since you introduced your idea to use Open Access (OA) to fight fake news a year ago here at ZBW MediaTalk?

We have now developed a beta version of the Science Checker, which is available and usable by everyone online. It is a tool for journalists and fact-checkers which works with an Artificial Intelligence (AI) pipeline that retrieves the articles of interest for the request of the user, and classifies them as supporting and contradicting the original claim entered by the user, or neutral in some cases. The data used for the Science Checker comes from a dump of Open Access articles from Europe PMC.
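The classification itself is done by AI models, but the aggregation step that turns per-article labels into an overall verdict can be sketched simply. The labels and function below are hypothetical illustrations, not the Science Checker’s actual code:

```python
from collections import Counter

# Hypothetical output of the classification step: one label per
# retrieved article ("supporting", "contradicting" or "neutral"),
# mirroring the 80%/20% smoking example above.
labels = ["supporting"] * 8 + ["contradicting"] * 2

def summarise(labels):
    """Aggregate per-article labels into the share of sources that
    support vs. contradict a claim (neutral articles excluded)."""
    counts = Counter(labels)
    decided = counts["supporting"] + counts["contradicting"]
    return {
        "sources": len(labels),
        "supporting_pct": 100 * counts["supporting"] / decided,
        "contradicting_pct": 100 * counts["contradicting"] / decided,
    }

print(summarise(labels))
```

The hard part of such a pipeline is of course the retrieval and classification upstream of this step; the aggregation merely makes the evidence count transparent to the user.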

For whom did you create the Science Checker?

Our first target audience are scientific journalists and scientific fact-checkers.
But the Science Checker also embodies Opscidia’s ambition to make scientific literature ever more accessible beyond academic circles. That is why we aim for broader use of it, open to all curious people.

The Science Checker was launched in July 2021, so it has been online for about five months now. What are your first experiences and feedback? What were your biggest challenges?

Since the release of the Science Checker, it has been tried by more than 400 people. That is admittedly a relatively slow uptake, but it was to be expected with a beta version. Our main challenge now is to find the right partners to help us in two aspects: increasing the accuracy of the tool and its growth potential.

What role does Artificial Intelligence play?

In simple words, the AI is trained by our developers to read articles, understand them and extract the essential information. Thanks to this upstream process, the AI used in the Science Checker can analyse millions of articles in a very short time in order to give you an answer based on many different sources of information.

Why is it so important that there are practical application examples for the use of OA?

Open Access is an important issue of our era. The free diffusion of academic knowledge is of paramount importance for many topics, from health crises to sustainable development. The OA community has to show its real value: that its activity is useful even outside of academia and relevant to global challenges. Open Access should not remain a topic for academic activists; it should spread for the common good.

You told us that there are now five people working on the Science Checker. How is it financed? Who pays the bill?

Yes, there are indeed five people who took part in the project, in different ways. The main developer, Loic Rakotoson, shaped the Science Checker, working on it full time for more than four months. But he is not the only developer who has worked on it. Frejus Laleye and Timothée Babinet developed part of the code used by the Science Checker. Charles Letaillieur, Opscidia’s CTO, managed the project technically, and Sylvain Massip, Opscidia’s CEO, did most of the scientific design. In addition, together with Enzo Rodrigues, I also did a lot of work promoting the Science Checker at conferences and on social media.

Financially, this beta version of the Science Checker was developed as a project, which is now over. The project was funded by the Vietsch Foundation, which we would like to thank warmly for its support.

We see it as a first step, and now that we have done a successful first draft, we are looking for the funding of our next iteration to keep the process going and build a second version of our Science Checker.

How can you guarantee its sustainability?

For the time being, we try to ensure its sustainability by focusing our maintenance on the most important things, since we are covering it from Opscidia’s own funds. But in the future, our goal is to have a major partner, such as a large media company, to fund its development, communication and maintenance.

How can libraries and information infrastructures support you? Which role do they play in the project?

First of all, libraries can fund Open Science, Opscidia’s and others, to ensure that initiatives such as the Science Checker have the data they need. Indeed, we are directly dependent on information sources, their quantity and their quality.

They can also help us by spreading the word about the Science Checker and Opscidia’s other activities, and of course, we are happy to partner with any interested party for the continuation of the project.

Are you still looking for partners/support for the Science Checker? Who? How can you be supported?

We want to continue to develop the Science Checker to improve the results and optimise its performance. It is also possible that we will have to develop additional features to the tool, if we identify other needs for the user. Thus, we continue to seek funding to help us in this direction. Moreover, we are quite open to potential technological partnerships if they are relevant for the evolution of the Science Checker. Furthermore, anyone can support us by providing feedback on its use. This is an essential source of information for us and has very often allowed us to match our tools to the needs of the users.

Is your system open and can be (re-)used by others?

Yes, absolutely. Our system is totally open, but we do not own the data; it comes from the Europe PMC database. The system is Open Source: the source code is accessible to anybody on our GitHub and can be reused freely as long as it is for non-commercial applications.

What is your vision for the Science Checker? Where do you see it in, say, five years?

In terms of software development, the next objectives are to increase the size of the dataset that we use and to make the tool more precise and more general. By that, we mean that we aim for a tool capable of doing the same work for all scientific fields, not just the medical sciences.

In terms of usage, we want to partner with major media so that our Science Checker could be used on a daily basis for fact checking purposes.

This might also interest you:

Sylvain Massip is the CEO of Opscidia, the company which is responsible for the Science Checker. He has a PhD in Physics from the University of Cambridge and ten years’ experience at the boundaries between science and industry. Passionate about research, he believes that scholarly communication can be improved, for the benefit of researchers and beyond. He took part in the scientific design of the project and its promotion.

Portrait: Sylvain Massip©

The post Science Checker: Open Access and Artificial Intelligence Help Verify Claims first appeared on ZBW MediaTalk.

Where Does Enhancement End and Citation Begin?

As more publishers semantically enrich documents, Todd Carpenter considers whether links are the same as citations

The post Where Does Enhancement End and Citation Begin? appeared first on The Scholarly Kitchen.

Horizon Report 2021: Focus on Hybrid Learning, Microcredentialing and Quality Online Learning

by Claudia Sittner

The 2021 EDUCAUSE Horizon Report Teaching and Learning Edition was published at the end of April 2021 and looks at what trends, technologies and practices are currently driving teaching and learning and how they will significantly shape its future.

The report runs through four different scenarios of what the future of higher education might look like: growth, constraint, collapse or transformation. Only time will tell which scenario prevails. With this in mind, we looked at the Horizon Report 2021 to see what trends it suggests for academic libraries and information infrastructure institutions.

Artificial Intelligence

Artificial intelligence (AI) has progressed so rapidly since the last Horizon Report 2020 that people can hardly keep up with testing the technical advances of machines in natural language processing. Deep learning has evolved into self-supervised learning, where AI learns from raw or unlabelled data.

Artificial intelligence has a potential role to play in all areas of higher education where learning, teaching and success are concerned: support for accessible apps, student information and learning management systems, examination systems and library services, to name but a few. AI can also help analyse learning experiences and identify when students seem to be floundering academically. The much greater analytics opportunities that have emerged as the vast majority of learning events take place online, leaving a wide trail of analysable data, can help to better understand students and adapt learning experiences to their needs more quickly.

But AI also remains controversial: for all its benefits, questions about privacy, data protection and ethical aspects often remain unsatisfactorily answered. For example, there are AI-supported programmes that paraphrase texts so that other AI-supported programmes do not detect plagiarism.

Open Educational Resources

For Open Educational Resources (OER), the pandemic has not changed much; many OER offerings are “born digital” anyway. However, advantages of OER such as cost savings (students have to buy less literature), social equality (free and accessible from everywhere) and the fact that materials are updated faster are gaining importance. Despite these obvious advantages and the constraints that corona brought with it, only a few teachers have switched to OER so far, as the report “Digital Texts in Times of COVID” (PDF) shows: 87% of teachers still recommend the same paid textbooks.

OER continue to offer many possibilities, such as teachers embedding self-assessment questions directly into pages alongside text, audio and video content, and students receiving instant feedback. In some projects, libraries and students are also involved in the development of materials as OER specialists, alongside other groups from the academic ecosystem, helping to break down barriers within the discipline and redesign materials from their particular perspective.

In Europe, for example, ENCORE+ – the European Network for Catalyzing Open Resources in Education – is working to build an extensive OER ecosystem. Also interesting: the “Code of Best Practices in Fair Use for Open Educational Resources”. It can be a tool for librarians who want to create OER and use other material, including copyrighted material.

Learning Analytics

Online courses generate lots of data: How many learners have participated? When did they arrive? When did they leave? How did they interact? What works and what doesn’t? In higher education, learning data analysis should help make better, evidence-based decisions to best support the increasingly diverse group of learners. Academic libraries also often use such data to better understand and interpret learner needs, respond promptly and readjust.

Syracuse University Libraries (USA), for example, have transmitted their user data via an interface to the university’s own learning analytics programme (CLLASS). A library profile was developed for this purpose, consistent with the library’s values, ethics, standards, policies and practices. This enabled responsible and controlled transmission of relevant data, and a learner profile could be created from different campus sources.

Just as with the use of artificial intelligence, there are many objections in this area regarding moral aspects and data protection. In any case, the handling of such learning data requires sensitisation and special training so that teachers, advisors and students can use data sensibly and draw the right conclusions. In the end, students could also receive tailored virtual support throughout the entire process from enrolment to graduation. Infrastructures for data collection, analysis and implementation are essential for this.

Microcredentials

Microcredentials are new forms of certification or proof of specific skills. They are better suited to the increasingly diverse population of learners than traditional degrees and certificates: they are more flexible, designed for a shorter period of time and often more thematically focused. The spectrum of microcredentials spans six areas, from short courses and badges to bootcamps and classic degree or accredited programmes.

Microcredentials are becoming increasingly popular and can also be combined with classic certifications. The Horizon Report 2021 sees particular potential for workers who can use them to retrain and further their education. It is therefore hardly surprising that companies like Google are also appearing on the scene with Google Career Certificates. For many scientific institutes, this means that they will have to further develop and rethink the architecture, infrastructure and work processes of their traditional certification systems.

Blended and Hybrid Course Models

Due to the corona pandemic, diverse blended and hybrid course models mushroomed, especially in the summer of 2020. “It is clear that higher education has diversified quickly and that these models are here to stay”, the report says. Hybrid courses allow more flexibility in course design; institutions can ramp up capacity as needed and cater even more to the diverse needs of students. However, most students still prefer face-to-face teaching.

Newly learned technical skills and technical support have played a predominant role. In some places, new course models have been developed together with the learners. On the other hand, classic practices (such as frequent assessments, breakout groups during live course meetings, and check-in messages to individual students) remain high on the agenda. However, corona has brought mental and social health of all participants into sharper focus; it should also receive even more attention according to the Horizon Report.

Quality Online Learning

The coronavirus came along and everything suddenly had to take place online. So it is little wonder that the need to design, meaningfully evaluate and adapt high-quality online learning opportunities has increased enormously. Some were surprised to find that teaching online involved more effort than simply offering the on-site event via Zoom. In order to achieve learning success, online quality assurance became an issue of utmost relevance.

Early in the pandemic, therefore, institutes began to develop online portals or hubs that included materials and teaching strategies adapted to the situation: for content delivery, to encourage student participation and to rethink assessment mechanisms.

A positive example is the twelve-module course “Quickstarter Online-Lehre” (Quickstarter Online Teaching, German) by the Hochschulforum Digitalisierung – German Forum for Higher Education in a digital age and the Gesellschaft für Medien in der Wissenschaft (Society for media in science) from Germany. This course aims to support teachers with no or little online experience.

This text has been translated from German.

This might also interest you:

The post Horizon Report 2021: Focus on Hybrid Learning, Microcredentialing and Quality Online Learning first appeared on ZBW MediaTalk.