
Irresponsible and dangerous: AI specific risks

| Sieuwert van Otterloo | Project Management

Artificial Intelligence (AI) is maturing from a research field into applied technology. The use of AI, however, comes with many additional risks. These risks go beyond normal project management risks, and many organisations do not have enough knowledge to fully understand them. This leads to serious issues for these companies, their customers and society in general. In this article we have collected an overview of AI-specific risks, along with suggested actions.

How to use this list

We recommend that any AI project goes through this list during the design phase and determines whether each risk could occur in that specific project. If a risk applies, plan steps to mitigate it. The mitigation will probably result in multiple plans:

  • A data collection plan to make sure the right data is collected in the right way, for both training and testing
  • A test plan where the AI application is tested for accuracy and bias.
  • A communication plan to make sure that all stakeholders and users are informed about what the system does, how it works and what mistakes it could make.

Depending on the phase of the project, you can make these plans part of the normal, overall software project management; the list then becomes detailed input to it. We have a series of blogs as part of our software project management course, and this blog post is an addition to software project risk management and the management of software project organisational aspects. It is also possible to use this list during an AI impact assessment, a new analysis tool designed specifically for AI projects. The AI impact assessment can be conducted while developing the AI application, or for existing applications.

Risks involving AI accuracy

The following is an overview of risks involving the quality of AI outcomes.

Cold start problem: many AI applications collect data while being used, e.g. by using past user actions for predictions. Such systems will perform poorly at the start, when not much data is available.
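One common mitigation is to fall back to a non-personalised default (such as globally popular items) until enough user data has accumulated. The sketch below illustrates this idea; the function names and the threshold of five interactions are illustrative assumptions, not part of any specific system.

```python
def recommend(user_history, popular_items, model_predict, min_history=5):
    """Return a recommendation, falling back to global popularity
    until the user has enough history for a personalised prediction."""
    if len(user_history) < min_history:
        # Cold start: not enough data yet, use the popularity baseline
        return popular_items
    return model_predict(user_history)

# A brand-new user gets the popularity-based list instead of a poor prediction
print(recommend([], ["item_a", "item_b"], lambda h: ["personalised"]))
```

The same pattern applies to other cold-start situations, e.g. using rule-based answers in a chatbot until enough conversations have been collected.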

Poor results due to data not being classified correctly. Many AI algorithms need more than raw data to be trained: the data must be annotated or classified with the correct answers. Annotating or classifying data can be a labor-intensive task. For some tasks, e.g. classifying diagnostic medical images with the correct diagnosis, one may have to hire expensive and scarce professionals.

Gender bias in AI output. In most countries it is forbidden to discriminate on gender in decision making. This creates an implicit requirement for IT systems: they should not have any gender bias, meaning average acceptance rates for equally qualified male and female subjects must be the same. Training data provided to AI systems can however be biased. If this bias is not corrected in the data, the resulting AI system will replicate it. To mitigate this problem, one must explicitly test for gender bias and possibly other relevant biases, and either define different thresholds based on the test results or correct the training data. See this article on algorithmic bias for more information. In October 2018, Amazon ditched AI recruitment software because it was biased against women.
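The explicit test mentioned above can be as simple as comparing acceptance rates between groups of equally qualified subjects. A minimal sketch, assuming binary accept/reject decisions and a project-specific fairness threshold (both illustrative):

```python
def acceptance_rate(decisions):
    """Fraction of accepted subjects (1 = accepted, 0 = rejected)."""
    return sum(decisions) / len(decisions)

def bias_gap(group_a, group_b):
    """Absolute difference in acceptance rates; near zero means no measurable bias."""
    return abs(acceptance_rate(group_a) - acceptance_rate(group_b))

# Decisions for equally qualified male and female test subjects
male = [1, 1, 0, 1]    # 75% accepted
female = [1, 0, 0, 1]  # 50% accepted
gap = bias_gap(male, female)
print(f"acceptance gap: {gap:.2f}")  # acceptance gap: 0.25
```

In a real test plan one would use much larger samples and a statistical test; the point is that bias must be measured, not assumed absent.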

Replication of wrong, funny or offensive user inputs. Many AI systems learn from users, e.g. by memorizing and replicating previous inputs. Some users misbehave and will provide the system with intentionally wrong, funny or even offensive input. The system will replicate these inputs. This problem occurred in March 2016 with the Microsoft chatbot ‘Tay’. The chatbot turned racist and was shut down within 24 hours. Peter Lee, Microsoft VP, issued the following statement about the Tay fiasco: “AI systems feed off of both positive and negative interactions with people. In that sense, the challenges are just as much social as they are technical. We will do everything possible to limit technical exploits but also know we cannot fully predict all possible human interactive misuses without learning from mistakes. To do AI right, one needs to iterate with many people and often in public forums. We must enter each one with great caution and ultimately learn and improve, step by step, and to do this without offending people in the process.”

Poor results due to unrealistic training data. Many AI systems are trained using training data (numbers, text, sound, images, video) before being deployed in the real world. Good training data can be hard to obtain, and therefore generic data sets such as existing image collections are used for training. This data however is often very different from real-world data. As a result, an AI system that worked well on training data does not work as expected in the real world. Examples of mismatches are sentence structure (existing data sets contain formal language with complete sentences, while the application must handle informal chats), sound quality (recorded training data has a much lower noise level than real-world recordings), and image quality and tone (existing data sets can be ‘plain’ while some applications only handle commercial, glossy-style images, or vice versa). Note that lack of diversity is also an example of unrealistic training data.

Inability to handle diversity. In many projects, the available test subjects and test data are collected from a not-so-diverse environment (e.g. mostly young, male, white, higher-educated, non-disabled students). As a result, the AI system is not well trained to handle diverse cases. This leads to poor results, but also to discrimination against subjects. Scientific research has shown that face recognition software does not handle diversity well: “both commercial and the nontrainable algorithms consistently have lower matching accuracies on the same cohorts (females, Blacks, and age group 18–30) than the remaining cohorts within their demographic.” Unfortunately, facial recognition software has already been used in 2015 in the US justice system. According to at least one expert, ‘using commercial facial recognition in law enforcement is irresponsible and dangerous’.

Other AI risks

Illegal use of data for training purposes: Under the privacy law GDPR, personal data can only be collected for specific purposes and cannot be used for other purposes. Re-using personal data for AI training is thus a potential GDPR violation that could lead to fines and reputational damage. An example of this risk occurring was the criticism IBM received in March 2019 after using public images for facial recognition research. Ironically, the images were used for research aimed at better handling of diversity in faces (‘Diversity in Faces’).

Leakage of confidential training data. Data provided to AI systems for training, such as chat conversations or images, may contain sensitive data such as names, addresses and passwords. It is possible that the AI system learns to reproduce this data, and thus leaks the sensitive data to new users. To mitigate this problem, one must either sanitize the training data or apply additional checks to the AI output. This problem was encountered by one of the data scientists of ICT Institute in 2018 while automating the training of chatbots; the solution applied in that project was to sanitize all training data manually.
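Sanitization can also be partially automated before the manual pass. The sketch below masks two easy-to-detect categories of sensitive data (e-mail addresses and long digit runs such as phone or account numbers); the patterns and placeholder tokens are illustrative assumptions, not the actual approach used in the project mentioned above.

```python
import re

# Illustrative patterns: e-mail addresses and runs of 6+ digits
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
DIGITS = re.compile(r"\d{6,}")

def sanitize(text):
    """Replace likely-sensitive substrings with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    text = DIGITS.sub("<NUMBER>", text)
    return text

print(sanitize("Mail jan@example.com or call 0612345678"))
# Mail <EMAIL> or call <NUMBER>
```

Automated masking catches the obvious cases; names and free-text secrets still require manual review or more advanced named-entity detection.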

No motivation / explanation for decisions. In certain use cases, people want to know or even have a right to know the motivation behind decisions. This is for instance the case when people are denied certain rights or services. Many AI algorithms are ‘black boxes’: they produce outcomes without any explanation of how the outcomes are reached.
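For simple model classes an explanation can be derived directly. For a linear model, the per-feature contribution (weight times feature value) is a basic human-readable motivation for a score. The feature names and weights below are purely illustrative; black-box models need dedicated explanation techniques instead.

```python
def explain(weights, features):
    """Return the total score and the contribution of each feature
    for a linear scoring model (contribution = weight * value)."""
    contributions = {name: weights[name] * value
                     for name, value in features.items()}
    return sum(contributions.values()), contributions

# Hypothetical loan-scoring example: income helps, debt hurts
weights = {"income": 0.5, "debt": -0.8}
score, why = explain(weights, {"income": 4.0, "debt": 2.0})
print(round(score, 2), why)
# 0.4 {'income': 2.0, 'debt': -1.6}
```

A motivation like “income contributed +2.0, debt contributed −1.6” is the kind of explanation a denied applicant can actually act on.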

Randomized outcomes. Some AI algorithms use randomization, and will produce different outcomes when applied repeatedly to the same problem. In some settings this poses problems, either from a fairness perspective, or because it invites people to keep trying. 
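One mitigation is to make the randomness reproducible by fixing the seed, so that the same input always produces the same outcome and a decision can be audited later. A minimal sketch (the function and seed value are illustrative):

```python
import random

def randomized_pick(candidates, seed=None):
    """Pick a candidate at random; with a fixed seed the same
    input always yields the same outcome (reproducible decisions)."""
    rng = random.Random(seed)
    return rng.choice(candidates)

a = randomized_pick(["x", "y", "z"], seed=42)
b = randomized_pick(["x", "y", "z"], seed=42)
assert a == b  # deterministic given the same seed
```

Deriving the seed from the case at hand (e.g. an application ID) removes the incentive to “keep trying” while preserving randomness across cases.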

AI-generated ‘Deep Fakes’ mistaken for real data. Some successful AI algorithms produce results that are so realistic that they can be used ‘for real’. Malicious users can conduct fraud by submitting AI-generated output in order to mislead people. This risk can occur in image, sound and video generation. If this risk exists, the spread or use of the algorithms must be restricted. The image shown on the right was generated by GANs (generative adversarial networks) via thispersondoesnotexist.com. As an example of an actual incident, in May 2018 a Belgian political group used a fake video featuring the likeness of Donald Trump to influence opinion on climate change.

Liability for damages due to mistakes. If one company develops vision software, a second company uses the software in an autonomous vehicle, and someone gets injured, who is liable for damages? Would this change depending on whether autonomous vehicles cause more or fewer accidents than human drivers? A first deadly accident involving a Tesla car in autonomous mode occurred in California in March 2018. Researchers such as Dr. Iyad Rahwan are investigating the social dilemma of autonomous vehicles, as also published in the NY Times.

Getting support

This list and the next part were collected as part of our larger research effort into AI ethics and risks. We decided to publish this shortlist because we noticed that many data scientists are struggling with risk management in AI projects. This shortlist can be used and republished freely as long as ICT Institute is mentioned as the source.

As part of our ongoing research, we are looking for companies that are using AI and are in need of an AI impact assessment. Please contact us if you would like a better understanding of your AI application.

Illustration: Su San Lee via Unsplash

Author: Sieuwert van Otterloo
Dr. Sieuwert van Otterloo is a court-certified IT expert with interests in agile, security, software research and IT-contracts.