Discrimination Through AI: To What Extent Libraries Are Affected and How Staff Can Find the Right Mindset

An interview with Gunay Kazimzade (Weizenbaum Institute for the Networked Society – The German Internet Institute)

Gunay, in your research you deal with discrimination by AI systems. What are typical examples of this?

Typically, biases reflect the forms of discrimination present in our society, whether political, cultural, financial, or sexual. These are manifested in the data sets we collect and in the structures and infrastructures around data, technology, and society, so that particular data points encode social norms and decision-making behaviour. AI systems trained on those data points then exhibit prejudices in various domains and applications.

For instance, facial recognition systems built on biased data tend to discriminate against people of colour in several computer vision applications. According to research from the MIT Media Lab, accuracy for white men and Black women differs dramatically in commercial vision models. In 2018, Amazon "killed" its experimental hiring system after it had started to eliminate female candidates for engineering and high-level positions. This outcome reflected the company's tradition of preferring male candidates for those particular positions. These examples make clear that AI systems are not objective: they map the human biases present in society onto the technological level.

How can library or digital infrastructure staff develop an awareness of this kind of discrimination? To what extent can they become active themselves?

Bias is an unavoidable consequence of situated decision-making. Deciding who classifies data, how they do so, and which data points are included in a system is nothing new in libraries' work. Libraries and archives are not just providers of data storage, processing, and access. They are critical infrastructures committed to making information available and discoverable, ideally while working to eliminate discriminatory outcomes arising from those data points.

Imagine a situation where researchers approach the library asking for images to train a face recognition model. The quality and diversity of these data directly affect the results of the research and of any system developed on them. Diversity in image data sets was recently investigated in the "Gender Shades" study by Joy Buolamwini of the MIT Media Lab. The question here is: could library staff have identified demographic bias in such data sets before the Gender Shades study was published? Probably not.

The right mindset comes from awareness. Awareness means social responsibility and self-determination, framed by critical library skills and subject specialisation. Relying on metadata alone will not be sufficient to eliminate bias in data collections. Diversity in staffing, together with critical domain-specific skills and tools, is a crucial asset in analysing libraries' digitised collections. Continuous training and evaluation of library staff should be the primary strategy for libraries that want to detect, understand and mitigate biases in library information systems.
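To make the "analysing digitised collections" step concrete, here is a minimal sketch of a demographic audit of collection metadata. The field names and values are invented for illustration; a real collection would use its own catalogue fields.

```python
import pandas as pd

# Invented metadata for a digitised portrait collection; the columns
# "subject_sex" and "region" are hypothetical catalogue fields.
records = pd.DataFrame({
    "item_id":     [1, 2, 3, 4, 5, 6],
    "subject_sex": ["m", "m", "m", "f", "m", "f"],
    "region":      ["Europe", "Europe", "Europe", "Europe", "Asia", "Europe"],
})

# The share of each group in the collection is a first, crude signal of
# skew that staff can spot before the data is used for model training.
for column in ["subject_sex", "region"]:
    print(records[column].value_counts(normalize=True).round(2), "\n")
```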

If you want to develop AI systems, algorithms, and designs that are non-discriminatory, the right mindset plays a significant role. What factors are essential for the right attitude? And how do you get it?

Whether it is a developer, user, provider, or another stakeholder, the right mindset starts with:

  • A clear understanding of the technology's use, its capabilities as well as its limitations;
  • Diversity and inclusion in the team, asking the right questions at the right time;
  • Considering team composition for diversity of thought, background, and experiences;
  • Understanding the task, the stakeholders, and the potential for errors and harm;
  • Checking data sets: considering data provenance and asking what the data is intended to represent;
  • Verifying the quality of the system through qualitative, experimental, survey, and other methods;
  • Continual monitoring, including customer feedback;
  • Having a plan to identify and respond to failures and harms as they occur.

Therefore, a long-term strategy for library information systems management should include:

  • Transparency
    • Transparent processes
    • Explainability/interpretability for each worker/stakeholder
  • Education
    • Special Education/Training
    • University Education
  • Regulations
    • Standards/Guidelines
    • Quality Metrics

Everybody knows it: you choose a book on an online platform and get suggestions à la "People who bought this book also bought XYZ". Are such suggestion and recommendation systems, which can also exist in academic libraries, discriminatory? In what way? And how can we make them fairer?

Several research findings suggest ways to make recommendations fairer and to break out of the "filter bubbles" created by technology deployers. In recommendation systems, transparency and explainability are among the main techniques for approaching this problem. Developers should consider the explainability of the suggestions made by their algorithms and make the recommendations justifiable to the user of the system. It should be transparent to the user on which criteria a particular book recommendation was based, and whether it drew on gender, race, or other sensitive attributes. Library and digital infrastructure staff are the main actors in this technology deployment pipeline. They should be conscious of this and push decision-makers to deploy technology that includes specific features for explainability and transparency in library systems.
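To illustrate what an explainable recommendation can look like, here is a minimal sketch of a co-occurrence recommender ("people who borrowed this also borrowed...") that states the criterion behind every suggestion. The loan records and titles are invented; this is an illustration, not a description of any production library system.

```python
from collections import Counter

# Invented loan records: each set holds the items one user borrowed.
loans = [
    {"intro_statistics", "python_basics"},
    {"intro_statistics", "research_methods"},
    {"intro_statistics", "python_basics", "data_ethics"},
]

def recommend(item, loans, top_n=3):
    """Suggest items co-borrowed with `item`, each with a stated reason.

    The recommendation uses borrowing behaviour only, never user
    attributes such as gender or race, and it says so explicitly.
    """
    relevant = [basket for basket in loans if item in basket]
    co_counts = Counter()
    for basket in relevant:
        co_counts.update(basket - {item})
    return [
        (other, f"borrowed together with '{item}' in {n} of {len(relevant)} loans")
        for other, n in co_counts.most_common(top_n)
    ]

for title, reason in recommend("intro_statistics", loans):
    print(f"{title}: {reason}")
```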

What can an institute, library, or repository do if it wants to find out whether its website, library catalogue, or other infrastructure it offers is discriminatory? How can it tell who is being discriminated against? Where can it get support or have a discrimination check-up done?

First, a "check-up" should start by verifying the quality of the data through quantitative, qualitative, and mixed experimental methods. In addition, there are several open-access methodologies and tools for fairness checking and bias detection and mitigation in several domains. For instance, AI Fairness 360 is an open-source toolkit that helps to examine, report, and mitigate discrimination and bias in machine learning models throughout the AI application lifecycle.
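As a taste of AI Fairness 360, here is a minimal sketch of a group-fairness check on a toy data set. The data is invented; only the aif360 classes and metrics are real.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Toy data: "sex" is the protected attribute (1 = privileged group),
# "label" is the favourable outcome (1 = positive decision).
df = pd.DataFrame({
    "sex":   [1, 1, 1, 1, 0, 0, 0, 0],
    "age":   [34, 51, 29, 42, 36, 48, 27, 55],
    "label": [1, 1, 1, 0, 1, 0, 0, 0],
})

dataset = BinaryLabelDataset(
    df=df, label_names=["label"], protected_attribute_names=["sex"]
)
metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"sex": 1}],
    unprivileged_groups=[{"sex": 0}],
)

# Disparate impact: ratio of favourable-outcome rates (1.0 means parity).
print("disparate impact:", metric.disparate_impact())
# Statistical parity difference: rate gap (0.0 means parity).
print("parity difference:", metric.statistical_parity_difference())
```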

Another useful tool is "Datasheets for Datasets", intended to document the data sets used for training and evaluating machine learning models. This approach is very relevant to developing metadata for library and archive systems, which can later be used for model training.
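A datasheet is essentially structured documentation. As a minimal sketch, a few of the questions from the "Datasheets for Datasets" framework could be captured alongside existing catalogue metadata like this; the field selection and wording are our own, condensed from the framework's much longer question list:

```python
from dataclasses import dataclass, field

@dataclass
class Datasheet:
    """A condensed, illustrative subset of 'Datasheets for Datasets' questions."""
    name: str
    motivation: str           # Why was the data set created?
    composition: str          # What do the instances represent? Who is included?
    collection_process: str   # How, when, and with what consent was it gathered?
    known_gaps: list = field(default_factory=list)  # Under-represented groups

sheet = Datasheet(
    name="Digitised portrait collection (example)",
    motivation="Open a 19th-century photo archive for research access.",
    composition="Scanned portraits with donor metadata; no demographic labels.",
    collection_process="Donations from regional estates, digitised in-house.",
    known_gaps=["few portraits of women", "almost no non-European subjects"],
)
print(sheet)
```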

Overall, everything starts with the right mindset and with awareness in approaching the bias challenge in specific domains.

We were talking to:

Gunay Kazimzade is a Doctoral Researcher in Artificial Intelligence at the Weizenbaum Institute for the Networked Society in Berlin, where she is currently working with the research group "Criticality of AI-based Systems". She is also a PhD candidate in Computer Science at the Technical University of Berlin. Her main research directions are gender and racial bias in AI, inclusivity in AI, and AI-enhanced education. She is a TEDx speaker, a Presidential Award of Youth winner in Azerbaijan and an AI Newcomer Award winner in Germany. Gunay Kazimzade can also be found on Google Scholar, ResearchGate and LinkedIn.
Portrait: Weizenbaum Institute©

Science Checker: Open Access and Artificial Intelligence Help Verify Claims

An Interview with Sylvain Massip

What is the Science Checker?

In July 2021, the Science Checker went online in a beta version. In this version, it only deals with health topics. As a first step, it is intended to help science journalists and other scientific fact-checkers test the likelihood of a claim. Three million Open Access articles from PubMed (out of 36 million) serve as the data basis for the Open Source tool. It uses artificial intelligence to check whether a claim is supported, discussed or rejected by the scientific literature. As a result, it shows how many and which documents it has found on the topic, when they were published and to what extent they make the claim probable. The guiding question is always: "What does the research literature say about this?" In the practical operation of the Science Checker, three fields must first be filled in: agent, effect (increase, cause, prevent, cure) and disease.

To make the Science Checker more tangible, here are a few examples: "Does caffeine lead to more intelligence?" (unfortunately unlikely). For this question, the tool finds three sources in the database.

There are 5,933 sources on the question of whether smoking causes cancer. Of these, 80% are confirmatory and 20% negative. For the question "Does sport prevent heart attacks?" the Science Checker finds 420 sources, of which only the first 20 relevant ones are included in the initial probability calculation. Clicking on "Add" adds the next 20 articles; clicking on "All" calculates the total. Since the latter takes some time, a notification is sent by e-mail as soon as the result is available.
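For a feel of the retrieval step behind such counts, here is a rough sketch that queries the public Europe PMC REST API (the interview below names Europe PMC as the tool's data source) for Open Access articles matching a claim. The query construction is our own simplification, not Opscidia's actual pipeline:

```python
import requests

# Europe PMC's public search endpoint; no API key required.
BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def count_open_access_hits(agent: str, effect: str, disease: str) -> int:
    """Count Open Access articles mentioning all three claim parts.

    A plain keyword query is a deliberate oversimplification: the real
    tool also has to decide whether each article supports or rejects
    the claim, not just whether the terms co-occur.
    """
    query = f'"{agent}" AND "{effect}" AND "{disease}" AND OPEN_ACCESS:y'
    resp = requests.get(BASE, params={"query": query, "format": "json"})
    resp.raise_for_status()
    return int(resp.json()["hitCount"])

print(count_open_access_hits("smoking", "causes", "cancer"))
```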

We have already introduced the idea behind the Science Checker in the article “Opscidia: Fighting Fake News via Open Access”. To get a practical impression of the tool’s possibilities, we recommend simply trying it out yourself: to the Science Checker.

In this interview, we talk to Sylvain Massip, one of five members of the Science Checker team, about his experiences during the first five months that the tool has been online in a beta version. He explains who the Science Checker is aimed at, how it is financed and what contribution libraries can make.

What has happened since you introduced your idea of using Open Access (OA) to fight fake news a year ago here at ZBW MediaTalk?

We have now developed a beta version of the Science Checker, which is available online and usable by everyone. It is a tool for journalists and fact-checkers which works with an Artificial Intelligence (AI) pipeline that retrieves the articles relevant to the user's request and classifies them as supporting or contradicting the original claim entered by the user, or in some cases as neutral. The data used for the Science Checker comes from a dump of Open Access articles from Europe PMC.

For whom did you create the Science Checker?

Our first target audience are scientific journalists and scientific fact-checkers.
But the Science Checker also carries Opscidia's ambition to make scientific literature ever more accessible beyond academic circles alone. That is why we aim for wider use of it, open to all curious people.

The Science Checker was launched in July 2021, so it has been online for about five months now. What are your first experiences and feedback? What were your biggest challenges?

Since the release of the Science Checker, it has been tried by more than 400 people. That is admittedly a relatively slow uptake, but it was to be expected with a beta version. Our main challenge now is to find the right partners to help us on two fronts: increasing the accuracy of the tool and realising its growth potential.

What role does Artificial Intelligence play?

In simple words, the AI is trained by our developers to read articles, understand them and extract the essential information from them. Thanks to this upstream process, the AI used in the Science Checker can analyse millions of articles in a very short time in order to give you an answer based on many different sources of information.
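One common way to implement such a classification step, shown here purely as an illustration and not as Opscidia's actual model, is zero-shot classification with a natural language inference (NLI) model from the Hugging Face transformers library:

```python
from transformers import pipeline

# An NLI model decides whether a premise (the article text) entails,
# contradicts, or is neutral towards a hypothesis (the claim).
classifier = pipeline(
    "zero-shot-classification", model="facebook/bart-large-mnli"
)

# Invented abstract text for demonstration purposes.
abstract = (
    "In this cohort of 12,000 participants, tobacco smoking was strongly "
    "associated with an increased incidence of lung cancer."
)
result = classifier(
    abstract,
    candidate_labels=["supports", "contradicts", "is neutral towards"],
    hypothesis_template="This text {} the claim that smoking causes cancer.",
)
print(result["labels"][0], round(result["scores"][0], 2))
```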

Why is it so important that there are practical application examples for the use of OA?

Open Access is an important issue of our era. The free diffusion of academic knowledge is of paramount importance for many topics, from health crises to sustainable development. The OA community has to show its real value: that its activity is useful even outside academia and relevant to global challenges. Open Access should not remain a topic for academic activists; it should spread for the common good.

You told us that there are now five people working on the Science Checker. How is it financed? Who pays the bill?

Yes, five people have indeed taken part in the project, but in different ways. One main developer, Loic Rakotoson, shaped the Science Checker, working full time on it for more than four months. But he is not the only developer who has worked on it. Frejus Laleye and Timothée Babinet developed part of the code used by the Science Checker. Charles Letaillieur, Opscidia's CTO, managed the project technically, and I, as Opscidia's CEO, did most of the scientific design. In addition, together with Enzo Rodrigues, I also did a lot of work promoting the Science Checker at conferences and on social media.

Financially, this beta version of the Science Checker was developed as a project, which is now complete. The project was funded by the Vietsch Foundation, whom we would like to thank warmly for their support.

We see it as a first step, and now that we have produced a successful first draft, we are looking for funding for the next iteration, to keep the process going and build a second version of the Science Checker.

How can you guarantee its sustainability?

For the time being, we try to ensure its sustainability by focusing our maintenance on the most important things, as we are covering it from Opscidia's own funds. In the future, however, our goal is to have a major partner, such as a large media company, fund its development, communication and maintenance.

How can libraries and information infrastructures support you? Which role do they play in the project?

First of all, libraries can fund Open Science, Opscidia's initiatives and others', to ensure that tools such as the Science Checker have the data they need. Indeed, we are directly dependent on information sources, on their quantity and their quality.

They can also help us by spreading the word about the Science Checker and Opscidia’s other activities, and of course, we are happy to partner with any interested party for the continuation of the project.

Are you still looking for partners/support for the Science Checker? Who? How can you be supported?

We want to continue developing the Science Checker to improve its results and optimise its performance. We may also have to develop additional features for the tool if we identify further user needs. Thus, we continue to seek funding to help us in this direction. Moreover, we are quite open to potential technological partnerships if they are relevant to the evolution of the Science Checker. Furthermore, anyone can support us by providing feedback on its use. This is an essential source of information for us and has very often allowed us to match our tools to the needs of the users.

Is your system open, and can it be (re-)used by others?

Yes, absolutely. Our system is totally open, but we do not own the data; it comes from the Europe PMC database. The system is Open Source: the source code is accessible to anybody in our GitHub repository and can be reused freely as long as it is for non-commercial applications.

What is your vision for the Science Checker? Where do you see it in, say, five years?

In terms of software development, the next objectives are to increase the size of the data set we use and to make the tool more precise and more general. By that, we mean that we aim for a tool capable of doing the same work for all scientific fields, not just the medical sciences.

In terms of usage, we want to partner with major media outlets so that our Science Checker can be used on a daily basis for fact-checking purposes.

We were talking to:
Sylvain Massip is the CEO of Opscidia, the company which is responsible for the Science Checker. He has a PhD in Physics from the University of Cambridge and ten years’ experience at the boundaries between science and industry. Passionate about research, he believes that scholarly communication can be improved, for the benefit of researchers and beyond. He took part in the scientific design of the project and its promotion.

Portrait: Sylvain Massip©

Where Does Enhancement End and Citation Begin?

As more publishers semantically enrich documents, Todd Carpenter considers whether links are the same as citations

Horizon Report 2021: Focus on Hybrid Learning, Microcredentialing and Quality Online Learning

by Claudia Sittner

The 2021 EDUCAUSE Horizon Report, Teaching and Learning Edition, was published at the end of April 2021 and looks at which trends, technologies and practices are currently driving teaching and learning, and how they will significantly shape their future.

The report runs through four different scenarios of what the future of higher education might look like: growth, constraint, collapse or transformation. Only time will tell which scenario prevails. With this in mind, we looked at the Horizon Report 2021 to see what trends it suggests for academic libraries and information infrastructure institutions.

Artificial Intelligence

Artificial intelligence (AI) has progressed so rapidly since the last Horizon Report in 2020 that people can hardly keep up with testing machines' technical advances in natural language processing. Deep learning has evolved towards self-supervised learning, in which AI learns from raw or unlabelled data.

Artificial intelligence has a potential role to play in all areas of higher education concerned with learning, teaching and student success: support for accessible apps, student information and learning management systems, examination systems and library services, to name but a few. AI can also help analyse learning experiences and identify when students appear to be floundering academically. Now that the vast majority of learning events take place online and leave a wide trail of analysable data, much greater analytics opportunities have emerged; these can help institutions better understand students and adapt learning experiences to their needs more quickly.

But AI also remains controversial: for all its benefits, questions about privacy, data protection and ethics often remain unsatisfactorily answered. For example, there are AI-supported programmes that paraphrase texts so that other AI-supported programmes cannot detect the plagiarism.

Open Educational Resources

For Open Educational Resources (OER), the pandemic has not changed much; many OER offerings are "born digital" anyway. However, advantages of OER such as cost savings (students have to buy less literature), social equality (free and accessible from everywhere) and faster updating of materials are gaining importance. Despite these obvious advantages and the constraints that corona brought with it, only a few teachers have switched to OER so far, as the report "Digital Texts in Times of COVID" (PDF) shows: 87% of teachers still recommend the same paid textbooks.

OER continue to offer many possibilities: for example, teachers can embed self-assessment questions directly into pages alongside text, audio and video content, and students receive instant feedback. In some projects, libraries and students are also involved in developing materials as OER specialists, alongside other groups from the academic ecosystem, helping to break down barriers within the discipline and to redesign materials from their particular perspectives.

In Europe, for example, ENCORE+ (the European Network for Catalyzing Open Resources in Education) is working to build an extensive OER ecosystem. Also interesting is the "Code of Best Practices in Fair Use for Open Educational Resources". It can be a tool for librarians who want to create OER and use other material, including copyrighted material.

Learning Analytics

Online courses generate lots of data: How many learners participated? When did they arrive? When did they leave? How did they interact? What works and what doesn't? In higher education, learning analytics should help make better, evidence-based decisions to best support an increasingly diverse group of learners. Academic libraries also often use such data to better understand and interpret learner needs, respond promptly and readjust.
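A minimal sketch of what such an analysis can look like, with an invented session log and hypothetical column names:

```python
import pandas as pd

# Invented session log for an online course.
log = pd.DataFrame({
    "learner": ["a", "a", "b", "b", "c"],
    "joined": pd.to_datetime(["2021-05-03 10:02", "2021-05-10 10:15",
                              "2021-05-03 10:00", "2021-05-10 10:01",
                              "2021-05-03 10:40"]),
    "left":   pd.to_datetime(["2021-05-03 11:00", "2021-05-10 10:45",
                              "2021-05-03 11:05", "2021-05-10 11:00",
                              "2021-05-03 10:55"]),
})

# Session length in minutes, then attendance per learner.
log["minutes"] = (log["left"] - log["joined"]).dt.total_seconds() / 60
per_learner = log.groupby("learner").agg(
    sessions=("joined", "count"),
    avg_minutes=("minutes", "mean"),
)
# Learners with few or very short sessions may need a check-in.
print(per_learner.sort_values(["sessions", "avg_minutes"]))
```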

Syracuse University Libraries (USA), for example, have transmitted their user data via an interface to the university's own learning analytics programme (CLLASS). A library profile was developed for this purpose that was consistent with the library's values, ethics, standards, policies and practices. This enabled the responsible and controlled transmission of relevant data, and a learner profile could be created from different campus sources.

Just as with the use of artificial intelligence, there are many objections in this area regarding ethical aspects and data protection. In any case, handling such learning data requires sensitisation and special training so that teachers, advisors and students can use the data sensibly and draw the right conclusions. Ultimately, students could also receive tailored virtual support throughout the entire process from enrolment to graduation. Infrastructures for data collection, analysis and implementation are essential for this.

Microcredentials

Microcredentials are new forms of certification that attest to specific skills. They are also better suited to the increasingly diverse population of learners than traditional degrees and certificates. Unlike these, they are more flexible, designed for a shorter period of time and often more thematically focused. The spectrum of microcredentials spans six areas, from short courses and badges, through bootcamps, to classic degrees or accredited programmes.

Microcredentials are becoming increasingly popular and can also be combined with classic certifications. The Horizon Report 2021 sees particular potential for workers who can use them to retrain and continue their education. It is therefore hardly surprising that companies like Google are also appearing on the scene with Google Career Certificates. For many academic institutions, this means that they will have to further develop and rethink the architecture, infrastructure and work processes of their traditional certification systems.

Blended and Hybrid Course Models

Due to the corona pandemic, diverse blended and hybrid course models mushroomed, especially in the summer of 2020. “It is clear that higher education has diversified quickly and that these models are here to stay”, the report says. Hybrid courses allow more flexibility in course design; institutions can ramp up capacity as needed and cater even more to the diverse needs of students. However, most students still prefer face-to-face teaching.

Newly learned technical skills and technical support have played a predominant role. In some places, new course models were developed together with the learners. At the same time, classic practices (such as frequent assessments, breakout groups during live course meetings, and check-in messages to individual students) remain high on the agenda. Corona has, however, brought the mental and social health of all participants into sharper focus; according to the Horizon Report, it should receive even more attention.

Quality Online Learning

The coronavirus came along and everything suddenly had to take place online. So it is little wonder that the need to design, meaningfully evaluate and adapt high-quality online learning opportunities has increased enormously. Some were surprised to find that teaching online involves more effort than simply offering the on-site event via Zoom. To achieve learning success, online quality assurance became an issue of utmost relevance.

Early in the pandemic, therefore, institutions began to develop online portals or hubs with materials and teaching strategies adapted to the situation: for content delivery, for encouraging student participation and for rethinking assessment mechanisms.

A positive example is the twelve-module course "Quickstarter Online-Lehre" (Quickstarter Online Teaching, in German) by the Hochschulforum Digitalisierung (German Forum for Higher Education in a Digital Age) and the Gesellschaft für Medien in der Wissenschaft (Society for Media in Science). This course aims to support teachers with little or no online experience.

This text has been translated from German.
