A machine learning model is a system that learns from data to make predictions or decisions without being explicitly programmed. If the data is flawed, the model’s decisions will be too. That’s why building responsible, fair AI isn’t just an ethical checkbox; it’s a critical part of building good technology.
In this post, we’re going to walk through a practical guide to tackling bias. We’ll look at where bias comes from, the tools you can use to find it, and the strategies you can employ to fix it.
The Unseen Problem: What Is Unfair Bias?
At its core, bias is a preference for certain things, people, or groups over others. In machine learning, it can manifest as an unfair or discriminatory outcome, especially when a model deals with sensitive attributes like race or gender. Several common types of bias can creep into a system at any stage of the development lifecycle:
- Selection Bias: This happens when your training data isn’t a true representation of the real world. Imagine building a hiring algorithm with historical data where most successful hires were from a specific demographic; the model will likely learn to favor that demographic.
- Reporting Bias: This occurs when some events are over- or under-represented in your data because of how they were originally reported or recorded. An algorithm for predictive policing, for example, might be biased if its training data reflects where police have historically patrolled more frequently, not where crime actually occurs.
- Automation Bias: This is the tendency to blindly trust the output of an AI system, even when it’s wrong, simply because it’s automated.

The plot shows the distribution of data from the Adult Income dataset, a widely used benchmark for machine learning and data science projects, particularly for demonstrating concepts of fairness and bias. It contains demographic and socioeconomic data from the 1994 U.S. Census Bureau database.
The primary goal when using this dataset is to predict whether an individual’s income is greater or less than $50,000 per year. To do this, the dataset includes various features such as age, education level, marital status, occupation, and hours worked per week.
A key reason this dataset is often used for responsible AI projects is because it contains sensitive attributes like race and gender. These demographic features, combined with the data’s inherent skew toward majority groups, make it an excellent tool for demonstrating how bias can unintentionally be learned and amplified by a model.
This imbalanced data distribution is a prime example of Selection Bias that will likely cause a model to favor the overrepresented groups.
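A quick way to see this kind of skew for yourself is to look at normalized value counts of the sensitive columns. The sketch below uses a tiny hand-made stand-in frame (the values are illustrative, not the real Adult Income proportions) so it runs without downloading anything; on the real data you would call `value_counts` on `adult_df` instead.

```python
import pandas as pd

# Toy stand-in for the Adult Income data (illustrative values only)
df = pd.DataFrame({
    "sex": ["Male"] * 7 + ["Female"] * 3,
    "race": ["White"] * 8 + ["Black"] * 2,
})

# Normalized counts make skew easy to spot at a glance
sex_share = df["sex"].value_counts(normalize=True)
race_share = df["race"].value_counts(normalize=True)
print(sex_share)
print(race_share)
```

On the real dataset, shares this lopsided are exactly the warning sign of selection bias described above.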
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
import pandas as pd
# Define the column names as they are not included in the dataset file
column_names = [
'age', 'workclass', 'fnlwgt', 'education', 'education-num',
'marital-status', 'occupation', 'relationship', 'race', 'sex',
'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'income'
]
# Load the dataset, specifying no header and providing column names
adult_df = pd.read_csv('adult.data', header=None, names=column_names)
# Display the first 5 rows of the DataFrame
adult_df.head()
Toolkit for Identifying Bias
Before you can fix bias, you have to know where to look. We can’t just rely on overall model accuracy; we need to examine how our models perform for different groups of people.
Here are three powerful tools that can help you with this:
TensorFlow Data Validation (TFDV)
This tool is your first line of defense. It helps you get a feel for your data by automatically generating summary statistics and visualizations. Use TFDV to spot data skew, missing values, or an underrepresentation of certain groups before you even start training.
What-If Tool (WIT)
This is a powerful visual tool for exploring your model’s predictions. You can perform counterfactual analysis by changing a single feature in a data point—like changing a person’s gender—and seeing if the model’s prediction flips unfairly. This is a great way to reveal implicit biases.
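WIT automates this kind of analysis in a UI, but the core idea is easy to sketch by hand: flip a single sensitive feature and compare the model's scores before and after. The toy model and feature names below are hypothetical, chosen only to make the counterfactual check concrete.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: column 0 is a binary "sex" flag, column 1 a skill score.
# The label deliberately leaks the sensitive attribute, so the model learns to use it.
X = np.column_stack([rng.integers(0, 2, 500), rng.normal(size=500)])
y = ((X[:, 0] + X[:, 1]) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

# Counterfactual: flip only the sensitive attribute and compare scores.
x = np.array([[1.0, 0.4]])
x_cf = x.copy()
x_cf[0, 0] = 0.0  # same person, sex flag flipped

gap = model.predict_proba(x)[0, 1] - model.predict_proba(x_cf)[0, 1]
print(f"Prediction shift from flipping the sensitive feature: {gap:.3f}")
```

A large shift for an otherwise identical input is a red flag that the model is leaning on the sensitive attribute itself.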
TensorFlow Model Analysis (TFMA)
Once your model is trained, TFMA allows you to analyze its performance across different subgroups, or “slices” of your data. For example, you can compare the accuracy for male vs. female users or for different racial groups to find critical performance gaps.
A baseline model is a simple, straightforward model that serves as a benchmark for all your future, more complex models. Its purpose isn’t to be the best, but to provide a starting point against which you can measure the impact of more advanced techniques. I chose good old logistic regression as my baseline model.
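The training snippet that follows assumes `X_train` and `y_train` already exist. One plausible way to produce them, sketched here on a tiny stand-in frame rather than the real `adult_df`, is to binarize the income label, one-hot encode the categoricals, split, and scale; the exact columns and split ratio are assumptions, not the post's original preprocessing.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny stand-in frame; in the post this would be adult_df loaded above.
df = pd.DataFrame({
    "age": [25, 38, 52, 30, 46, 29, 41, 35],
    "hours-per-week": [40, 50, 60, 35, 45, 20, 55, 40],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"],
    "income": [" <=50K", " >50K", " >50K", " <=50K", " >50K", " <=50K", " >50K", " <=50K"],
})

# Binary target: 1 if income is >50K (the raw labels carry leading spaces)
y = (df["income"].str.strip() == ">50K").astype(int)

# One-hot encode categoricals, then split and scale
X = pd.get_dummies(df.drop(columns=["income"]))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns)
print(X_train.shape, y_train.mean())
```

Scaling matters here because logistic regression with a high `max_iter` converges much more reliably on standardized features.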
from sklearn.linear_model import LogisticRegression
# Initialize and train the Logistic Regression model with scaled data
baseline_model_scaled = LogisticRegression(random_state=42, max_iter=5000)
baseline_model_scaled.fit(X_train, y_train)
print("Baseline Logistic Regression model trained successfully on scaled data.")
Accuracy: 0.8584
Precision: 0.7529
Recall: 0.6149
F1-Score: 0.6769
Confusion Matrix:
array([[4625,  317],
       [ 605,  966]])

Classification Report:
              precision    recall  f1-score   support

           0       0.88      0.94      0.91      4942
           1       0.75      0.61      0.68      1571

    accuracy                           0.86      6513
   macro avg       0.82      0.78      0.79      6513
weighted avg       0.85      0.86      0.85      6513
Based on the initial exploratory data analysis (EDA) and the nature of the Adult Income dataset, sex and race are the most relevant sensitive attributes to examine. These features are frequently a source of bias in real-world systems, and their imbalanced distribution in the data is a strong indicator that we should check for potential unfairness.
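One hedged way to produce a per-subgroup breakdown like the table below is simply to slice the test set by a group indicator and compute each metric on the slice. The toy labels and group mask here are made up purely to show the mechanics; on the real data the mask would come from the one-hot `sex_*` or `race_*` columns.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy predictions and a binary group indicator (e.g. the sex_Male column)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
group = np.array([1, 1, 1, 1, 0, 0, 0, 0]).astype(bool)

# Compute metrics separately on each slice of the test set
for name, mask in [("group=1", group), ("group=0", ~group)]:
    acc = accuracy_score(y_true[mask], y_pred[mask])
    rec = recall_score(y_true[mask], y_pred[mask], zero_division=0)
    print(f"{name}: accuracy={acc:.2f} recall={rec:.2f}")
```

Identical accuracies can hide very different recalls, which is exactly why slicing matters.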
                         Accuracy  Precision    Recall  F1-Score
sex_Male                 0.828129   0.757951  0.641256  0.694737
sex_Female               0.920978   0.715232  0.463519  0.562500
race_White               0.850760   0.754506  0.615546  0.677979
race_Black               0.921536   0.673077  0.538462  0.598291
race_Asian-Pac-Islander  0.844560   0.785714  0.709677  0.745763
race_Amer-Indian-Eskimo  0.915493   0.833333  0.500000  0.625000
race_Other               0.927273   0.750000  0.500000  0.600000
The analysis clearly shows that relying on a single, overall accuracy metric can be misleading. While our overall accuracy was ~86%, the performance for specific subgroups is much lower in some cases:
- Gender Bias: The model performs differently for males and females. The recall for males (0.6413) is significantly higher than for females (0.4635), suggesting the model is better at correctly identifying high-income individuals when they are male.
- Racial Bias: The disparities are even more pronounced across racial groups. The model's recall for the Other and Amer-Indian-Eskimo groups is only 0.5000, dramatically lower than for the White group (0.6155). The model is simply less effective at identifying high-income individuals in these minority groups.
This is a textbook fairness problem. The model's performance is not equitable across all groups, and that is almost certainly a direct result of the selection bias we identified during the initial EDA, where the dataset was heavily skewed toward certain races and genders.
Action Plan for Mitigating Bias
Identifying bias is half the battle; the other half is fixing it. There are two complementary strategies for mitigation: data-centric approaches and model-centric approaches.
Data-Centric Approaches: Fixing the Source
The most direct way to tackle bias is to fix it at the source. There are several techniques:
Refine Your Data Collection
Be intentional about how you collect data. For example, when building a computer vision model that involves human faces, you can use a standardized tool like the Monk Skin Tone (MST) scale to ensure your dataset is representative of different skin tones and doesn’t lean toward harmful biases.
Balance the Data
If your data is imbalanced, you can either upsample the minority group (duplicate examples) or downsample the majority group (reduce examples) to create a more equitable training set.
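Both moves can be sketched with nothing more than pandas. The frame and counts below are toy values; the pattern (sample minority rows with replacement to grow them, or sample majority rows without replacement to shrink them) is the general technique.

```python
import pandas as pd

df = pd.DataFrame({
    "feature": range(10),
    "label": [0] * 8 + [1] * 2,  # imbalanced: 8 majority, 2 minority
})

majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Upsample: draw minority rows with replacement until counts match
upsampled = pd.concat([
    majority,
    minority.sample(len(majority), replace=True, random_state=42),
])

# Downsample: drop majority rows until counts match
downsampled = pd.concat([
    majority.sample(len(minority), random_state=42),
    minority,
])
print(upsampled["label"].value_counts().to_dict())
print(downsampled["label"].value_counts().to_dict())
```

Upsampling keeps every example but repeats some; downsampling throws information away, which is why it is usually reserved for very large datasets.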
Relabel Your Data
Sometimes, the problem lies in the labels themselves. For tasks like sentiment analysis, human experts can relabel biased examples to ensure the model learns from an objective truth rather than encoded stereotypes.
I used oversampling, a data-centric approach, to balance the training set. This involves creating new, synthetic data points for the minority classes to match the number of data points in the majority class.
from imblearn.over_sampling import RandomOverSampler
from collections import Counter
# Initialize RandomOverSampler
ros = RandomOverSampler(random_state=42)
# Apply oversampling to the training data
X_train_resampled, y_train_resampled = ros.fit_resample(X_train, y_train)
# Inspect the class distribution of the original and resampled target variable
print("Class distribution of the original training data:")
print(Counter(y_train))
print("\nClass distribution of the resampled training data:")
print(Counter(y_train_resampled))
Class distribution of the original training data:
Counter({0: 19778, 1: 6270})
Class distribution of the resampled training data:
Counter({1: 19778, 0: 19778})
Distribution of 'sex' in resampled training data:
sex_ Male 28902
sex_ Female 10654
dtype: int64
==================================================
Distribution of 'race' in resampled training data:
race_ White 34521
race_ Black 3210
race_ Asian-Pac-Islander 1279
race_ Amer-Indian-Eskimo 291
race_ Other 255
dtype: int64
Oversampling focused on the overall minority class of the target variable (income) but not on the minority subgroups within the sensitive features (sex and race). As a result, when it duplicated instances of the minority income class, it reinforced the existing demographic imbalances in the data instead of correcting them.
Then I tried targeted oversampling: balancing the target variable (income) within each subgroup. This didn't force every subgroup to the same size, so the dominant subgroups, like sex_Male and race_White, remained much larger than the others.
As a third attempt, I resampled every minority subgroup to match the count of the largest subgroup. This strategy guarantees a perfectly balanced training set, and it worked.
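A minimal sketch of that third strategy, on a toy frame with made-up counts: group by the sensitive column and resample every group with replacement up to the largest group's size. (The real implementation would also need to balance the income label; this shows only the subgroup-equalizing step.)

```python
import pandas as pd

df = pd.DataFrame({
    "feature": range(12),
    "race": ["White"] * 8 + ["Black"] * 3 + ["Other"] * 1,
})

# Resample every subgroup (with replacement) to the largest subgroup's size
target = df["race"].value_counts().max()
balanced = pd.concat([
    g.sample(target, replace=True, random_state=42)
    for _, g in df.groupby("race")
])
print(balanced["race"].value_counts().to_dict())
```

Note how aggressively this duplicates the smallest groups: the single "Other" row is repeated eight times, which foreshadows the overfitting-to-duplicates problem seen in the results below.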
--- Final Distribution Check ---
New distribution of race in the final training data:
race_ Amer-Indian-Eskimo 40675
race_ Other 37722
race_ Black 36901
race_ Asian-Pac-Islander 30463
race_ White 24032
dtype: int64
New distribution of sex in the final training data:
sex_ Female 88746
sex_ Male 81047
dtype: int64
Then I retrained the same logistic regression model on the balanced data. The results were not what I expected.
Performance Metrics for Balanced Model by Subgroup:
                         Accuracy  Precision    Recall  F1-Score
sex_Male                 0.779120   0.625767  0.686099  0.654545
sex_Female               0.823142   0.351967  0.729614  0.474860
race_White               0.839321   0.694057  0.662465  0.677893
race_Black               0.569282   0.201238  1.000000  0.335052
race_Asian-Pac-Islander  0.507772   0.393548  0.983871  0.562212
race_Amer-Indian-Eskimo  0.253521   0.158730  1.000000  0.273973
race_Other               0.272727   0.130435  1.000000  0.230769
Recall improved, but it came at the cost of extremely low precision. Simply oversampling subgroups doesn't guarantee a fair model and can introduce new performance issues.
The key takeaway is that our attempt to create a “fairer” dataset resulted in a model that over-corrected, leading to a high rate of incorrect predictions for minority groups. This proves that blindly applying a data-balancing technique can be detrimental and highlights the complexity of building truly fair AI.
Model-Centric Approaches: Teaching Your Model to be Fair
Even with a clean dataset, some bias can persist. These techniques intervene directly in the model or its training process to enforce fairness.
Threshold Calibration
For classification models, you can adjust the prediction threshold for different subgroups to ensure fair outcomes. For example, you can set different thresholds for a loan approval model to ensure the same percentage of “True Positives” for different demographic groups, a concept known as equality of opportunity.
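The mechanics are simple: keep one probability threshold per group instead of a single global 0.5. The scores, groups, and threshold values below are purely illustrative; in practice the per-group thresholds would be tuned on a validation set to equalize true positive rates.

```python
import numpy as np

# Toy predicted probabilities and group membership
scores = np.array([0.9, 0.7, 0.55, 0.4, 0.8, 0.45, 0.35, 0.6])
group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Per-group thresholds chosen (e.g. on a validation set) to equalize
# true positive rates; these values are purely illustrative.
thresholds = {"A": 0.6, "B": 0.4}

# Apply each example's group-specific threshold
preds = np.array([scores[i] >= thresholds[g] for i, g in enumerate(group)]).astype(int)
print(preds.tolist())
```

Group B's lower threshold means its members are approved at lower scores, which is how the calibration compensates for a model that systematically under-scores that group.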
Intervening in the Training Process
There are advanced techniques that modify the model’s loss function to explicitly penalize bias.
- MinDiff: This method adds a term to the loss function that penalizes a model for producing significantly different prediction score distributions for different subgroups. It helps achieve fairness goals like equality of opportunity.
- Counterfactual Logit Pairing (CLP): This technique penalizes the model when a small, non-sensitive change in an input (e.g., changing “male” to “female” in a neutral sentence) causes a significant change in the model’s output. It aims to make predictions more robust and fair to counterfactual changes.
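To make the MinDiff idea concrete, here is a crude numpy sketch of a loss with a fairness penalty attached. Real MinDiff uses an MMD kernel between the groups' score distributions; the absolute difference of group means below is a deliberately simplified stand-in, and all names and values are hypothetical.

```python
import numpy as np

def binary_cross_entropy(y_true, p):
    # Standard task loss, clipped for numerical safety
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def loss_with_mindiff(y_true, p, group, weight=1.0):
    # Base task loss plus a penalty on the gap between the groups'
    # mean prediction scores (a crude stand-in for MinDiff's MMD penalty)
    gap = abs(p[group == 0].mean() - p[group == 1].mean())
    return binary_cross_entropy(y_true, p) + weight * gap

y = np.array([1, 0, 1, 0])
p = np.array([0.8, 0.3, 0.6, 0.4])
group = np.array([0, 0, 1, 1])
print(loss_with_mindiff(y, p, group, weight=1.0))
```

During training, gradient descent on a loss shaped like this pushes the model toward similar score distributions for both groups while still fitting the labels; `weight` controls the accuracy-fairness trade-off.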
To directly address the bias during or after training, I implemented a model-centric approach using the Fairlearn library, a powerful tool designed for exactly this kind of problem.
from fairlearn.postprocessing import ThresholdOptimizer

# Create the sensitive features DataFrame
sensitive_features_train = X_train[all_sensitive_features]
sensitive_features_test = X_test[all_sensitive_features]
# Initialize ThresholdOptimizer. This algorithm will find the best
# threshold for each subgroup to satisfy our fairness constraint.
# The 'equalized_odds' constraint equalizes the true positive rate (recall)
# and the false positive rate across subgroups.
threshold_optimizer = ThresholdOptimizer(
    estimator=baseline_model_scaled,
    constraints="equalized_odds",
    prefit=True,
    predict_method='predict_proba'
)
# Fit the optimizer. It finds the optimal thresholds without re-training the model.
threshold_optimizer.fit(X_train, y_train, sensitive_features=sensitive_features_train)
Mitigated Model Performance by Subgroup:
                         Accuracy  Precision    Recall  F1-Score
sex_Female               0.896049   0.530303  0.450644  0.487239
sex_Male                 0.804650   0.819389  0.461136  0.590148
race_Amer-Indian-Eskimo  0.845070   0.444444  0.400000  0.421053
race_Asian-Pac-Islander  0.818653   0.829268  0.548387  0.660194
race_Black               0.889816   0.488372  0.323077  0.388889
race_Other               0.945455   0.800000  0.666667  0.727273
race_White               0.827882   0.772567  0.461485  0.577817
- Precision Disparity: The most glaring issue is the massive gap in precision between the sex subgroups. The sex_Male group has a precision of 0.819, while the sex_Female group is at 0.530. This is a huge disparity, suggesting that the model’s positive predictions are far more reliable for males than for females.
- Recall Trade-off: The recall scores for sex_Female (0.450) and sex_Male (0.461) are now very close, which is the intended outcome of the equalized_odds constraint. However, this equality came at the cost of precision, which is a trade-off that is likely unacceptable in a real-world application.
- Persistent Racial Bias: The racial disparities are still very much present, with metrics varying widely across the different subgroups.
The ThresholdOptimizer worked differently. Instead of modifying the training data, this model-centric approach adjusted the prediction threshold for each subgroup after the model was trained, which let the model keep a far better balance between precision and recall than the resampling experiment did. It narrowed the recall gap across our sensitive features without a severe drop in overall accuracy, even though some disparities, particularly in precision and across racial groups, remain. The result was a model that is more equitable while staying robust and useful, demonstrating that a nuanced, targeted approach can be more effective than a blunt data-level intervention.
Conclusion: It’s an Iterative Process, Not a One-Time Fix
Mitigating bias is a continuous journey of identifying, understanding, and addressing problems at every stage of the machine learning lifecycle. There is no single “perfect” solution. The goal is not to achieve perfect fairness in every possible scenario, but to continuously work towards building more contextual, transparent, and equitable systems for everyone.
Check out the full code and analysis on GitHub.
