Dealing With AI Biases Part 4: Fixing the Root Cause of AI Biases

This article was originally published on quasi.pros.com.

Missed the other parts of the series? Catch up on Part 1, Part 2, and Part 3 first.

Today, we take on the final battle in dealing with AI bias: how can we fix its root cause? Since today’s exposition builds on our previous discussions of this very topic, it helps to be familiar with the 3 installments we’ve already published on AI bias.

As previously discussed, the most common AI biases are those inherited from the training data, and these inherited biases are often introduced as we preprocess the data before training. In contrast, emergent bias can arise during training even when the training data is unbiased. It is often created as data scientists make subjective model-selection choices (e.g. regularization, hyperparameter tuning) to finalize their model.

Contrary to conventional belief, data science work requires much discretion. Hence, many standard procedures in data science are potential sources where bias can be introduced if applied haphazardly. But they also serve as natural points for us to inject counteracting biases to neutralize the biases we want to eliminate. With meticulous data scientists, inadvertently injected bias (whether inherited or emergent) should be minimal. What’s left are the inherited biases that already exist in the training data. 

As data scientists, we often treat the data given to us as the raw input. We seldom question where the data came from and how it was collected. Where did these pre-existing inherited biases come from, and can they be eliminated? In most practical situations, inherited biases have two root causes.

Fixing the Root Cause of AI Biases

Data Capture Biases

Since all training data must be captured and collected at some point, pre-existing inherited biases may be a result of biased data collection processes. All data collection schemes are designed and built as a result of a series of design choices. And it’s a well-known fact in Choice Architecture that there are no neutral designs. Hence, every data collection process is inherently a little biased.

For example, it’s common knowledge that survey data tend to exhibit self-selection bias. Such data over-represent consumers who are already inclined to respond and diligent enough to do so, and under-represent those who are either lazy or paranoid.
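To make the self-selection effect concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the `engagement` trait, the link between engagement and satisfaction, and the rule that more engaged people are more likely to respond are all hypothetical, not measurements from any real survey.

```python
import random


def simulate_survey(n_people=10_000, seed=0):
    """Simulate a population and the self-selected subset that answers a survey."""
    rng = random.Random(seed)
    population = []
    respondents = []
    for _ in range(n_people):
        # Hypothetical trait: how engaged a consumer is, uniform in [0, 1).
        engagement = rng.random()
        # Assumption: satisfaction rises with engagement, plus some noise.
        satisfaction = 5.0 * engagement + rng.gauss(0.0, 0.5)
        population.append(satisfaction)
        # Assumption: the chance of answering the survey equals engagement.
        if rng.random() < engagement:
            respondents.append(satisfaction)
    return population, respondents


def mean(values):
    return sum(values) / len(values)


population, respondents = simulate_survey()
true_mean = mean(population)
survey_mean = mean(respondents)
print(f"population mean satisfaction: {true_mean:.2f}")
print(f"survey mean satisfaction:     {survey_mean:.2f}")
```

Under these assumptions the survey average overstates the population average, because the people who choose to respond are disproportionately the engaged ones. No one falsified anything; the bias comes entirely from who shows up in the data.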

The data may still be biased even when it is collected automatically through passive behavior monitoring via sensors, devices, or WiFi networks. We may inadvertently be selecting the population that happens to be near the sensors we installed, and we may systematically under-represent those who don’t have access to reliable WiFi or a mobile device.

If we don’t control the data collection process, which is typically the case for most data science work, it’s still important to understand how the data was collected so that we know which biases are inherent to it. We need to acknowledge these biases, as discussed in my previous article. In practice, this means we must continuously monitor the level of bias in the captured raw data to ensure changes in bias do not adversely impact the trained model.
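One way such monitoring could look is sketched below. It compares the group shares in each collection batch against a reference distribution and flags batches that drift too far. The metric (total variation distance), the `threshold` of 0.10, the age brackets, and all the counts are assumptions chosen for illustration, not values from the article.

```python
def total_variation(sample_counts, reference_shares):
    """Half the L1 distance between sample shares and reference shares (0 to 1)."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("batch is empty")
    groups = set(sample_counts) | set(reference_shares)
    return 0.5 * sum(
        abs(sample_counts.get(g, 0) / total - reference_shares.get(g, 0.0))
        for g in groups
    )


def flag_drift(batches, reference_shares, threshold=0.10):
    """Return (label, distance, flagged) for each collection batch."""
    report = []
    for label, counts in batches:
        distance = total_variation(counts, reference_shares)
        report.append((label, round(distance, 3), distance > threshold))
    return report


# Hypothetical reference shares (e.g. from a census) and made-up batch counts.
reference = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
batches = [
    ("2024-Q1", {"18-34": 320, "35-54": 350, "55+": 330}),
    ("2024-Q2", {"18-34": 450, "35-54": 350, "55+": 200}),
]
for row in flag_drift(batches, reference):
    print(row)
```

In this made-up data the first batch stays close to the reference (distance 0.02), while the second over-represents the youngest bracket (distance 0.15) and gets flagged. A flagged batch is a prompt to investigate what changed in collection before retraining on it.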

However, in some rare situations where we can influence the data collection process, we should aim to capture more data and more complete metadata that provides the context to interpret those data. In the age of big data and ubiquitous sensors, the default strategy should be to collect everything possible.

“Debating over what to collect often ends up being more costly due to lost time and development velocity.”

— Andreas Weigend, former Chief Scientist, Amazon

Having detailed demographic data can help reveal the biases inherent to the data collection process so we can better understand them. It will facilitate the effective monitoring of how data-collection biases change over time. Most importantly, it can also help us redesign the data-capturing process to reduce bias during data collection. As with any design, redesigning the data-capturing method is an iterative exercise. With proper bias monitoring and rapid iteration, data-collection biases can be progressively minimized over time.

Figure: Fixing the root cause of AI bias by modifying data capture and behaviors

Unconscious Human Biases

If the data-capture mechanism has been iteratively perfected, the only place where inherited biases can be created is during data generation. But most of the data used in training machine-learning (ML) models are generated by humans. These data are merely the result of past human decisions and behaviors. Therefore, the inherent biases in the training data originate from us, humans. The bias in data is simply a reflection of our own bias.

In practice, however, AI bias often appears to be more extreme and therefore more noticeable than our own bias. This is because the slightest bias in our decisions can often be magnified dramatically through the ML training process. Our minute biases are accentuated when machines learn from large amounts of data very rapidly. This amplification is a result of both the high speed of learning and the consolidation of lots of data. Despite this, the AI bias problem is fundamentally a human bias problem.
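The toy sketch below illustrates one well-known way a slight bias can be magnified. It is not necessarily the mechanism the paragraph above has in mind: here the amplification comes from a model that always outputs the single most likely outcome for each group. The groups, the 55%/45% approval rates, and the "majority" model are all made up for illustration.

```python
from collections import Counter, defaultdict


def fit_majority_model(records):
    """Learn, for each group, the single most frequent past outcome."""
    counts = defaultdict(Counter)
    for group, outcome in records:
        counts[group][outcome] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


def approval_rate(outcomes):
    return sum(1 for o in outcomes if o == "approve") / len(outcomes)


# Made-up history: humans approved group A 55% of the time, group B 45%.
history = (
    [("A", "approve")] * 55 + [("A", "reject")] * 45
    + [("B", "approve")] * 45 + [("B", "reject")] * 55
)
model = fit_majority_model(history)

for group in ("A", "B"):
    past = approval_rate([o for g, o in history if g == group])
    # The model gives the same answer to every applicant in a group.
    predicted = approval_rate([model[group]] * 100)
    print(group, f"human: {past:.0%}", f"model: {predicted:.0%}")
```

A 10-point gap in the human decisions becomes a 100-point gap in the model's decisions: group A is always approved and group B is always rejected. Real models are rarely this crude, but the same pull toward the most likely label is one reason a small human bias can look extreme once a machine has learned it.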

Solving for AI Bias

Now that we have a good grasp of the root cause of the AI bias problem, how can we solve it?

As with most problems, recognizing the problem is half the battle. The other half of the battle involves human behavior change. Fortunately, this is a problem we know how to solve. And the solution involves 3 steps:

1. Provide analytics to drive awareness of existing human biases
2. Set small but incremental goals toward bias mitigation
3. Use gamification with non-monetary rewards to motivate less biased decisions
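As a rough sketch of how the three steps above could fit together, the code below measures a bias metric, derives a modest next target from it, and awards points when the target is met. The metric (gap in approval rates between two groups), the 10% relative reduction, and the 10-point reward are all hypothetical choices for illustration; the article does not prescribe any of them.

```python
def approval_gap(decisions):
    """Step 1: absolute difference in approval rate between two groups.

    `decisions` is a list of (group, approved) pairs covering exactly two
    distinct groups. This is an illustrative metric only.
    """
    rates = {}
    for group in {g for g, _ in decisions}:
        outcomes = [approved for g, approved in decisions if g == group]
        rates[group] = sum(outcomes) / len(outcomes)
    if len(rates) != 2:
        raise ValueError("expected exactly two groups")
    first, second = rates.values()
    return abs(first - second)


def next_goal(current_gap, reduction=0.10):
    """Step 2: aim for a small relative improvement over the current gap."""
    return current_gap * (1.0 - reduction)


def points_earned(new_gap, goal, reward=10):
    """Step 3: non-monetary points, awarded only when the goal is met."""
    return reward if new_gap <= goal else 0


# Made-up decisions by one reviewer: 60% approval for A, 40% for B.
decisions = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 40 + [("B", False)] * 60
)
gap = approval_gap(decisions)
goal = next_goal(gap)
print(f"current gap: {gap:.2f}, next goal: {goal:.2f}")
print("points if next gap is 0.17:", points_earned(0.17, goal))
print("points if next gap is 0.19:", points_earned(0.19, goal))
```

The point of deriving each goal from the current measurement is that the target stays within reach: a reviewer with a 0.20 gap is asked to reach 0.18, not zero, and the next goal is recomputed once they get there.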

Conclusion

Although some AI biases are emergent and created through the training process, most AI biases are inherited from the training data. So where did these inherent biases in training data originate?

The data is captured with biased data collection mechanisms
The data is generated from biased human decisions and behaviors in the past

Data capture biases are often dealt with through iterative refinement of the data collection processes while we carefully monitor the biases. Interestingly, unconscious human biases can be dealt with in much the same way.

By monitoring our own biases and iteratively influencing the biased human behaviors that generated the training data, we humans can eventually learn to make less biased decisions. Thus, learning how to fix AI biases not only improves our AI tools, it can also improve ourselves and make us better humans.

Michael Wu

Dr. Michael Wu is one of the world’s premier authorities on artificial intelligence (AI), machine learning (ML), data science, and behavioral economics. He’s the Chief AI Strategist at PROS (NYSE: PRO), an AI-powered SaaS provider that helps companies monetize more efficiently in the digital economy. He’s been appointed as a Senior Research Fellow at the Ecole des Ponts Business School for his work in Data Science. Prior to PROS, Michael was the Chief Scientist at Lithium for a decade, where he focused on developing predictive and prescriptive algorithms to extract insights from social media big data. His research spans many areas, including customer experience, CRM, online influence, gamification, digital transformation, AI, etc. His R&D won him recognition as an Influential Leader by CRM Magazine along with Mark Zuckerberg, Marc Benioff and other industry giants. Michael has served as a DOE fellow at the Los Alamos National Lab conducting research in face recognition and was awarded 4 years of full fellowship under the Computational Science Graduate Fellowship. Prior to industry, Michael received his triple major undergraduate degree in Applied Math, Physics, and Molecular & Cell Biology; and his Ph.D. from UC Berkeley’s Biophysics program, where he used machine learning to model visual processing within the human brain. Michael believes in knowledge dissemination, and speaks internationally at universities, conferences, and enterprises. His insights have inspired many global enterprises and are made accessible through “The Science of Social,” and “The Science of Social 2,” two easy-reading e-books.
