Beware data bias in AI models
Insurers should be aware of the risks of data bias associated with artificial intelligence (AI) models. Atreyee Bhattacharyya looks at some of these risks, particularly the ethical considerations, and how actuaries can address them.
The use of advanced analytics techniques and machine learning models in insurance has increased significantly over the past few years. It's an exciting time for actuaries and an opportunity to innovate. Leading insurers in this area are deriving better insights and greater predictive power, ultimately leading to better performance.
However, every new technology brings new risks. With AI, such risks could be material in terms of regulatory implications, litigation, public perception and reputation.
Why data bias in AI models matters
The ethical risks associated with data bias are not unique to AI models, but data bias is a more acute concern in AI because:
- AI models make predictions based on patterns in data, without assuming any particular statistical distribution. Because these models learn from historical data, any biases in the training data can be perpetuated by the AI system, leading to biased outcomes and unfair treatment of certain groups or individuals.
For instance, a tech giant had to abandon the trial of an AI recruitment system when it was found to discriminate against women applying for technical roles. The model had been trained on several years of historical applications and, because the majority of those roles had historically been held by men, the algorithm learned to undervalue applications from women (the sketch after this list illustrates the mechanism).
Furthermore, AI models can inadvertently reinforce biases present in society or in existing practices. If historical data reflects biased human decisions, the AI model may learn and perpetuate those biases, creating a feedback loop in which biased AI outcomes further entrench the original bias. Non-AI models are typically less susceptible to this loop, as they do not usually learn and adapt over time.
- AI models can process vast amounts of data quickly, enabling decisions and predictions at scale and in real time. This amplifies the potential impact of any biases in the data, particularly where human oversight is missing or reduced.
- AI models can be highly complex and opaque, making it challenging to understand how they arrive at decisions. This lack of transparency can make it difficult to detect and address biases within the models. In contrast, non-AI models, such as traditional rule-based systems or models based on statistical distributions, are often more transparent, allowing humans to directly inspect and understand the decision-making process.
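To see the first of these mechanisms concretely, consider the stylised sketch below. It uses synthetic data and scikit-learn (an assumed toolchain; any comparable library would do): a classifier trained on historically biased hiring decisions reproduces the penalty against one group, and retraining on its own outputs keeps that penalty in place.

```python
# A stylised sketch of bias perpetuation, using synthetic data.
# All features, group labels and effect sizes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

skill = rng.normal(0.0, 1.0, n)      # identically distributed in both groups
group = rng.integers(0, 2, n)        # 1 = historically disadvantaged group

# Historical decisions: driven by skill, but with a penalty on group 1.
# This penalty is the human bias baked into the training labels.
logit = 1.5 * skill - 1.0 * group
hired = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, hired)

# Two candidates with identical skill: the model reproduces the penalty.
print(model.predict_proba([[0.5, 0], [0.5, 1]])[:, 1])

# Stylised feedback loop: retraining on the model's own decisions keeps
# the learned penalty in place, with no new human bias introduced.
for _ in range(3):
    model = LogisticRegression().fit(X, model.predict(X))
print(model.coef_)  # the group coefficient remains negative
```

Note that simply dropping the group column rarely helps: correlated proxy variables can carry the same signal into the model.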
Given these factors, data bias is a particularly critical concern in AI, and addressing and mitigating it is crucial to ensuring fair and ethical outcomes.
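As a practical first step towards addressing it, teams can quantify bias with simple group-level checks. A minimal sketch, assuming model predictions, true outcomes and a protected attribute are available as arrays:

```python
# A minimal, illustrative bias check: compare outcome rates and accuracy
# across protected groups. The arrays are assumed to come from an
# existing modelling pipeline.
import numpy as np

def bias_report(y_true: np.ndarray, y_pred: np.ndarray, group: np.ndarray) -> None:
    """Print the approval rate and accuracy for each protected group."""
    for g in np.unique(group):
        mask = group == g
        rate = y_pred[mask].mean()                    # approval rate
        acc = (y_pred[mask] == y_true[mask]).mean()   # group-wise accuracy
        print(f"group {g}: approval rate {rate:.2f}, accuracy {acc:.2f}")
```

Large gaps between groups on either measure are a prompt for investigation; more formal fairness criteria, such as equalised odds or calibration within groups, build on the same comparison.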
What are the various kinds of data biases?
Selection bias arises when certain samples are systematically overrepresented or underrepresented in the training data. This can occur if data-collection processes inadvertently favour certain groups or exclude others; as a result, the AI model may be more accurate or effective for the overrepresented groups. Similarly, if the training data does not adequately capture the diversity of the target population, the model may not generalise well and could make inaccurate or unfair predictions. This might happen if, for example, an Asian health insurer bases its pricing on an AI model trained predominantly on health-metrics data from Western populations; the resulting prices are unlikely to be accurate or fair.
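To make the health-pricing example concrete, here is a stylised sketch in the same vein (synthetic data; the thresholds and effect sizes are illustrative assumptions, though risk thresholds for metrics such as BMI do genuinely differ across populations):

```python
# A hypothetical illustration of selection bias: a pricing model trained
# mostly on one population is systematically miscalibrated for an
# underrepresented one. All parameters below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def population(n, threshold):
    """Synthetic data: claim risk rises with BMI above a population-specific
    threshold, so a single global risk curve is a modelling error."""
    bmi = rng.normal(26, 4, size=n)
    p_claim = 1 / (1 + np.exp(-0.4 * (bmi - threshold)))
    return bmi.reshape(-1, 1), rng.random(n) < p_claim

X_a, y_a = population(9_500, threshold=30)   # dominant population
X_b, y_b = population(500, threshold=26)     # underrepresented population

model = LogisticRegression().fit(np.vstack([X_a, X_b]),
                                 np.concatenate([y_a, y_b]))

# The model tracks population A well but underestimates risk for B.
for name, X, y in [("A", X_a, y_a), ("B", X_b, y_b)]:
    predicted = model.predict_proba(X)[:, 1].mean()
    print(f"population {name}: actual {y.mean():.2f}, predicted {predicted:.2f}")
```

Because population B is barely represented in training, the fitted curve tracks population A, and the model systematically understates B's claim risk: exactly the kind of inaccurate and unfair pricing described above.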


