Human bias is an indisputable challenge when we aim to extract business value from data. With artificial intelligence (AI) on the rise, should we expect data bias to become a problem of the past?

The insurance industry has a natural interest in leveraging data analytics. Business models increasingly depend on the insight provided by data to better understand customer behavior, fraud patterns, policy risk, claim surety, and more. As a result, data is closely tied to an insurer’s ability to operate and effectively compete in a complex marketplace today.

On the technology side, we are familiar with buzzwords such as “big data,” which imply that there’s no shortage of data in terms of variety, volume, and velocity of availability. Mere availability of data, however, guarantees neither the quality of the data nor the correctness of the insight drawn from it.


Where bias comes from and why it’s a problem in the context of data

Bias, whether toward or against something, is deeply entrenched in the human mind. Merriam-Webster defines bias as “a personal and sometimes unreasoned judgment.” If we go a bit further, Psychology Today tells us that bias is “a natural inclination for or against an idea, object, group, or individual. It is often learned and is highly dependent on variables like a person’s socioeconomic status, race, ethnicity, educational background, etc.”

When we look at bias specifically in connection with data, we almost always source, observe, and interpret data that is incomplete and inconsistent. Often, our quest for enlightening insight through data is flawed from the outset. Our bias shapes the data we choose to look at and persists through our intuitions about how to conduct the analysis and how we arrive at conclusions. The same data set can lead different humans to different conclusions, depending on various types of bias, including, for example, the following:

  • Cognitive bias, which draws on existing scenarios and perceptions and favors stitching together familiar conclusions rather than constructing new ones
  • Accuracy bias, which focuses on the accuracy of data but not on the business impact of data
  • Representation bias, which refers to the over- or underrepresentation of data within a data set, including outlier bias, which arises from removing specific extreme values from the data (see the sketch after this list)
  • Analysis bias, which stems from missing data context and overlooked data dependencies
  • Confirmation bias, which occurs when the investigation of data is focused on proving or disproving a preexisting hypothesis
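
To make representation and outlier bias concrete, here is a minimal sketch in Python, using made-up claim amounts, of how the same data set yields two different conclusions depending on whether an analyst trims extreme values:

```python
# Hypothetical claim amounts for one group of policies (made-up numbers).
claims = [1200, 950, 1100, 1300, 1050, 980, 25000, 31000]

def mean(values):
    return sum(values) / len(values)

# Analyst A keeps every observation.
all_mean = mean(claims)  # 7,822.50

# Analyst B removes "outliers" above an arbitrary threshold; the cutoff
# itself is a judgment call, which is where outlier bias enters.
threshold = 10000
trimmed_mean = mean([c for c in claims if c <= threshold])  # roughly 1,096.67

print(f"Mean claim, all data:         ${all_mean:,.2f}")
print(f"Mean claim, outliers removed: ${trimmed_mean:,.2f}")
```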

Any of these forms of bias introduces a chance of error in the application of data. The consequences are flawed deductions or, at the very least, doubt about whether a conclusion drawn from data analysis can be trusted. It’s a fair question to ask: If there were an AI that had access to all data and wasn’t biased in how it looks for dependencies and outcomes, would we be able to answer questions correctly, and without bias?

If that were the case, AI could truly transform the way we look at data and how it impacts our lives.


Group fairness versus individual fairness

To arrive at actionable conclusions, data analytics experts typically look for data trends by applying attributes — descriptors of data values — to data points and combining those that may be relevant and that impact each other. Insurers may have access to hundreds of attributes that define their customers as groups and individuals and help calculate pricing of products and assess various categories of risk, such as risk of fraud, risk of claim, or risk of cancellation.
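
As a minimal illustration of combining attributes to surface a trend, here is a short Python sketch; the attribute names, values, and claim flags are all hypothetical:

```python
import pandas as pd

# Hypothetical policy records; every attribute name and value is made up.
policies = pd.DataFrame({
    "zip_code":  ["10001", "10001", "10001", "10001",
                  "94105", "94105", "94105", "94105"],
    "age_band":  ["18-25", "18-25", "41-65", "41-65",
                  "18-25", "18-25", "41-65", "41-65"],
    "had_claim": [1, 1, 0, 1, 0, 1, 0, 0],
})

# Combine two attributes and look for a trend: the claim rate per
# ZIP code and age band. Group-level patterns like this one feed
# pricing and risk models.
claim_rates = policies.groupby(["zip_code", "age_band"])["had_claim"].mean()
print(claim_rates)
```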

The inherent challenge with models that favor trends across attributes describing more than one person is that a trend that is correct for a certain group, for example, people living in a specific ZIP code, may be fair to the group as a whole but not to an individual. Not everyone in a ZIP code rated high risk to insure actually carries a high risk to insure.
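
A minimal sketch, with made-up risk scores, of how a label that is fair for a group can misrepresent half of its members:

```python
# Hypothetical individual risk scores within one ZIP code (made-up numbers).
scores = {"driver_a": 0.90, "driver_b": 0.85, "driver_c": 0.20, "driver_d": 0.15}

GROUP_THRESHOLD = 0.5
group_average = sum(scores.values()) / len(scores)  # 0.525

# A group-level rule labels the entire ZIP code high risk...
group_label = "high risk" if group_average > GROUP_THRESHOLD else "low risk"
print(f"ZIP code label: {group_label} (average score {group_average:.3f})")

# ...even though half of the individuals fall well below the threshold.
for driver, score in scores.items():
    individual_label = "high risk" if score > GROUP_THRESHOLD else "low risk"
    print(f"{driver}: group label = {group_label}, individual label = {individual_label}")
```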

While there are cases where individual fairness can be applied in insurance, such as insuring automobile drivers based on their individual driving behavior, achieving it may depend on additional attributes that vary from case to case and scenario to scenario, and on removing the bias toward individuals that exists in group data.

Determining the attributes of individual fairness today requires human intuition and good knowledge of all influencing attributes, which, of course, may themselves be subject to bias. AI is not yet capable of replicating a human level of intuition, as it lacks, among other things, an awareness of temporal behaviors. Beyond that awareness, machines lack the capability to interpret behaviors and the human emotions that drive changes in behavior.

Consequently, without such awareness and capability, including the ability to continuously discover, machines cannot universally achieve individual fairness without human intervention. And that is before considering that not all data is fair game in AI anyway. Protected attributes such as age, color, marital status, national origin, race, religion, receipt of public assistance, and sex could be prohibited from being explicitly used in machine learning models. The exclusion of such attributes, however, does not eliminate implicit bias, which is liable to creep into the model through correlating attributes. For example, a comparison of driving patterns between two cities whose demographics differ markedly by race would inevitably end up encoding implicit bias toward race.
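
A minimal sketch of that proxy effect, with entirely made-up demographics and premiums: the model below never sees race, only city, yet its output still splits along racial lines because city correlates with race.

```python
import random

random.seed(0)

# Hypothetical population: two cities whose demographics differ by race.
# All proportions, names, and premiums here are assumptions for illustration.
def make_person(city):
    share_of_race_a = 0.8 if city == "city_1" else 0.2
    race = "A" if random.random() < share_of_race_a else "B"
    return {"city": city, "race": race}

people = [make_person(city) for city in ["city_1"] * 1000 + ["city_2"] * 1000]

# A pricing rule that uses only the city attribute, never race.
def premium(person):
    return 150 if person["city"] == "city_1" else 100

# Average premiums still differ by race, because city acts as a proxy for it.
for race in ("A", "B"):
    premiums = [premium(p) for p in people if p["race"] == race]
    print(f"Race {race}: average premium ${sum(premiums) / len(premiums):.2f}")
```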


What is realistic in the near future and what it would take for AI to remove bias entirely

Assuming that AI has access to increasingly large data sets and growing computing power, humans may be able to train machines to look specifically for bias in data through feedback rules and to remove it through continuous testing of massive amounts of data. That training, however, will, at least initially, rest on the human ability to understand, identify, and describe bias, and to transfer that knowledge to AI. The other approach is to introduce processes that check for bias as part of pre-, in-, or post-processing of the data. This process by itself carries a chance of bias.
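
As one hedged example of what a pre-processing check might look like, the sketch below computes a demographic parity gap on hypothetical historical decisions before any model is trained. Note that the tolerance it applies is itself a human judgment call, which is exactly how bias can creep into the check:

```python
# Hypothetical historical decisions for two groups (all records made up).
records = [
    {"group": "x", "approved": 1}, {"group": "x", "approved": 1},
    {"group": "x", "approved": 1}, {"group": "x", "approved": 0},
    {"group": "y", "approved": 1}, {"group": "y", "approved": 0},
    {"group": "y", "approved": 0}, {"group": "y", "approved": 0},
]

def approval_rate(group):
    outcomes = [r["approved"] for r in records if r["group"] == group]
    return sum(outcomes) / len(outcomes)

# Demographic parity gap: difference in approval rates between groups.
parity_gap = approval_rate("x") - approval_rate("y")  # 0.75 - 0.25 = 0.50
print(f"Demographic parity gap: {parity_gap:.2f}")

# The 0.1 tolerance is an arbitrary human choice; the check itself
# therefore carries a chance of bias.
if abs(parity_gap) > 0.1:
    print("Warning: training data shows a group-level imbalance.")
```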

In a best-case scenario, AI may help reduce bias in our data but is unlikely to resolve it in its entirety anytime soon due to its lack of human intuition and awareness of context. At least today, AI relies very much on our direction and instructions in data analytics.

As long as AI lacks the ability to feel happiness, sadness, or pain as humans do, it is unlikely to be able to comprehend bias entirely and, as a result, remove it. That said, the spotlight on bias in data keeps growing, and while the problem may not be deemed an enormous one today, we can expect seminal research in this area to continue as data engineering and analysis pervade every aspect of our lives. I am optimistic and expect to see better methodologies, but none that can claim to rid data of bias completely anytime soon.

Anoop Gopalakrishnan is Vice President of Engineering, Guidewire Cloud Infrastructure at Guidewire. Connect with him on LinkedIn.