Insurance regulations often prohibit discrimination based on sensitive factors such as age and gender. In this vignette, we explore a hypothetical law that prevents insurance companies from using age and gender as pricing criteria. To simulate this scenario, we model claim occurrences in Python using a Random Forest algorithm and apply optimal transport techniques with the equipy package to enforce fairness. We use the freMPL1sub dataset from Charpentier (2014), which contains detailed information on French motor insurance contracts and claims.
Required Packages
```r
# R package required to run Python code in a Quarto document
required_R_libraries <- c("reticulate")
invisible(lapply(required_R_libraries, library, character.only = TRUE))
```
The data used in this vignette comes from a French motor third-party liability insurance portfolio. The dataset, freMPL1sub, is part of the CASdatasets R package and contains details about contracts and clients obtained from a French insurance company, specifically related to a motor insurance portfolio.
This dataset is a subset built from several datasets, such as freMTPL1, freMTPL2, and others, all available in CASdatasets. The subset has been filtered to keep only records with an exposure greater than 0.9, allowing classification trees to be applied effectively for a more precise analysis.
Dictionaries
The list of the 18 variables from the freMPL1sub dataset is reported in Table 1.
Table 1: Content of the freMPL1sub dataset
| Attribute | Type | Description |
|---|---|---|
| LicAge | Numeric | The driving licence age, in months |
| VehAge | Numeric | Age of the vehicle in years |
| sensitive | Factor | Gender of the driver |
| MariStat | Factor | Marital status of the driver |
| SocioCateg | Factor | Socio-economic category of the driver |
| VehUsage | Factor | Usage of the vehicle |
| DrivAge | Numeric | Age of the driver in years |
| HasKmLimit | Boolean | Indicator if there is a mileage limit |
| BonusMalus | Numeric | Bonus-malus coefficient |
| VehBody | Factor | Body type of the vehicle |
| VehPrice | Factor | Price category of the vehicle |
| VehEngine | Factor | Type of engine |
| VehEnergy | Factor | Type of fuel used (diesel or regular) |
| VehMaxSpeed | Factor | Maximum speed category of the vehicle |
| VehClass | Factor | Vehicle class |
| y | Boolean | Response variable (if there is a claim) |
| RiskVar | Numeric | Risk variable |
| Garage | Factor | Type of garage used |
Importation
```r
library(CASdatasets)
```

    Loading required package: xts
    Loading required package: zoo

    Attaching package: 'zoo'

    The following objects are masked from 'package:base':

        as.Date, as.Date.numeric

    Loading required package: survival
```r
data("freMPL1sub")
```
Code for importing our datasets into the Python environment
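As a minimal sketch, assuming the usual reticulate convention that R objects are exposed to Python chunks through the `r` object, the data transfer and the Python-side imports could look like this (equipy import paths are taken from the package documentation and may need adjusting to the installed version):

```python
# Python chunk in the Quarto document.
# Assumption: the R data frame freMPL1sub is exposed by reticulate as r.freMPL1sub.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, f1_score, accuracy_score

# equipy fairness and metrics modules (paths per the package documentation)
from equipy.fairness import MultiWasserstein
from equipy.metrics import unfairness, performance

# Pull the R data frame into a pandas DataFrame
freMPL1sub = pd.DataFrame(r.freMPL1sub)
```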
In the context of insurance, understanding the balance between “fair” and “unfair” discrimination is critical. One notable example is age-based discrimination, often seen as acceptable in insurance pricing. Younger drivers—despite being statistically more accident-prone—are typically charged lower premiums than their actual risk would justify. The extra costs are, instead, borne by older, more experienced drivers, who end up paying more than what their individual risk profile alone would suggest. This practice, while technically discriminatory, is accepted because it’s part of a generational balancing act: over time, younger drivers eventually mature and will shoulder the cost for future younger generations.
This is known as acceptable discrimination in insurance, where premiums are adjusted based on factors like age and driving experience. However, other forms of discrimination, particularly those based on sensitive variables such as gender or ethnicity, are prohibited under laws like the Anti-Discrimination Law of 2008 in France. Sensitive variables are protected by law, meaning they cannot be used as direct factors in pricing models, as this could reinforce social inequalities.
The challenge arises when proxy variables—those that indirectly encode similar information—can still introduce bias. For example, while gender might be excluded, variables like vehicle type or occupation can act as proxies, unintentionally reintroducing discriminatory patterns. Therefore, insurers face the difficult task of balancing fairness and ensuring the economic viability of their models.
The dilemma is clear when considering male and female drivers. If premiums for male drivers, who statistically have higher accident rates, are set too low, insurers may face financial loss. Conversely, if premiums are set too high for female drivers, who generally pose lower risk, it could result in unfair pricing and limit their access to affordable insurance. This highlights the importance of maintaining acceptable discrimination based on actual risk while avoiding bias that would be prohibited by law.
To address these challenges, advanced mathematical tools like Optimal Transport and the Wasserstein barycenter can help. Rather than aligning premiums solely with the lower-risk group, such as older drivers, or the higher-risk group, such as younger drivers, which can skew fairness and financial stability, the Wasserstein barycenter offers a middle ground. It calculates a balanced distribution of risk between these groups, minimizing disparity without sacrificing economic viability. Additionally, the barycenter conserves the mean of the premiums after transformation, ensuring that the overall premium level remains financially viable for the insurer.
Mathematically, given distributions $\mu_1, \mu_2, \dots, \mu_n$ corresponding to different demographic groups, the Wasserstein barycenter $\mu^*$ is defined as:

$$
\mu^* = \underset{\mu}{\arg\min} \; \sum_{i=1}^{n} \lambda_i \, W_2^2(\mu, \mu_i),
$$

where $W_2$ denotes the Wasserstein distance of order 2, and $\lambda_i$ are the weights associated with each distribution $\mu_i$.
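In one dimension, the $W_2$ barycenter has a convenient closed form: its quantile function is the weighted average of the groups' quantile functions. The short sketch below illustrates this on simulated scores for two hypothetical groups (names and parameters are illustrative, not taken from the vignette's data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative scores for two hypothetical groups (not the vignette's data)
scores_a = rng.beta(2, 8, size=5000)   # e.g. lower-risk group
scores_b = rng.beta(3, 6, size=5000)   # e.g. higher-risk group
weights = np.array([0.5, 0.5])         # the lambda_i weights

# In 1-D, the W2 barycenter's quantile function is the weighted
# average of the group quantile functions.
u = np.linspace(0.001, 0.999, 999)
q_bar = weights[0] * np.quantile(scores_a, u) + weights[1] * np.quantile(scores_b, u)

# A score s from group A is mapped to the barycenter by s -> Q_bar(F_A(s)),
# where F_A is group A's empirical CDF.
def to_barycenter(s, group_scores):
    rank = np.searchsorted(np.sort(group_scores), s) / len(group_scores)
    return np.interp(rank, u, q_bar)

print(to_barycenter(0.30, scores_a), to_barycenter(0.30, scores_b))
```

With weights proportional to the group sizes, this construction preserves the overall mean of the scores, which is the premium-conservation property mentioned above.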
A key component in ensuring fairness is strong demographic parity. This principle requires that the probability distributions of model outcomes be the same across different protected groups. Mathematically, it demands that the distribution of scores for one group, $m(X, S=A)$, be equivalent to the distribution for another group, $m(X, S=B)$:

$$
m(X, S=A) \sim m(X, S=B)
$$

In this context, $m(X, S=A)$ represents the predicted scores for individuals in group A (e.g., male), while $m(X, S=B)$ represents the predicted scores for individuals in group B (e.g., female). The symbol $\sim$ indicates that the distributions of these predicted scores should be statistically similar or identical.
This ensures that the model’s outcomes do not systematically favor one group over another. For example, if group A consistently receives lower premiums than group B for similar risk profiles, the model would violate strong demographic parity. By requiring that these score distributions be the same, this method helps to eliminate any bias that might arise from group membership alone, ensuring that decisions are made based purely on risk-related features.
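A quick way to inspect strong demographic parity empirically is to compare the score distributions of the two groups, for instance with a Kolmogorov-Smirnov statistic. A minimal sketch on simulated scores and group labels (illustrative placeholders, not objects defined in this vignette):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
# Illustrative predicted scores and group labels (placeholders)
scores = rng.beta(2, 7, size=4000)
group = rng.integers(0, 2, size=4000)   # 0 = group A, 1 = group B

# Strong demographic parity asks that these two empirical
# distributions be (approximately) identical.
stat, pval = ks_2samp(scores[group == 0], scores[group == 1])
print(f"KS statistic: {stat:.3f}, p-value: {pval:.3f}")
```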
Tools like the equipy package provide practical implementations of these concepts. By applying Wasserstein distances and adjusting model outputs, equipy ensures that sensitive variables like age and gender do not lead to biased pricing models. For instance, if a Random Forest model is used to predict claim occurrences, equipy can modify the model in a post-processing step to enforce fairness while maintaining predictive accuracy.
Balancing fairness with economic sustainability is crucial, particularly when modeling high-risk groups like young drivers. Historical practices might justify lower premiums for them, but using Optimal Transport and the Wasserstein barycenter techniques ensures that premiums are equitable across demographics without compromising the financial health of the insurer. In this vignette, we will explore a theoretical case where fairness is enforced in pricing across both young and old drivers, as well as between male and female drivers. This demonstrates how fairness can be introduced in insurance pricing, ensuring compliance with ethical and legal standards while maintaining economic viability.
Beyond legal compliance, fairness methods offer broader benefits. Ensuring transparent and equitable pricing enhances customer trust and aligns with ethical AI standards, as explored in sources like Sauce, Chancel, and Ly (2023), which examines strategies for addressing proxy discrimination in AI-driven models. By applying these fairness principles, insurers can maintain compliance while fostering fairness in their decision-making processes.
To summarize, Wasserstein barycenters and Optimal Transport provide innovative solutions to fairness challenges in insurance. They ensure that premiums reflect risk fairly while adhering to ethical and legal standards. In the Visualizations section, methods such as Fairness by Unawareness and Fairness through Awareness are explored to address fairness in predictive modeling.
Methods
Random Forest Model
Random Forest Overview
Random Forest, as described in Breiman (2001), is a versatile machine learning technique that belongs to a class of ensemble learning methods. In this method, multiple decision trees are constructed during training, and their predictions are combined to produce a more accurate and stable model. Specifically, predictions are averaged for regression tasks and voted on for classification tasks.
The key idea behind Random Forest is to build a “forest” of decision trees, where each tree is trained on a random subset of the data and a random subset of the features. This randomness makes the model more robust and less prone to overfitting, which is a common issue with individual decision trees.
The process of building a Random Forest can be summarized in the following steps:
Bootstrap Sampling: Multiple subsets of the original dataset are created using bootstrapping, which involves randomly selecting samples with replacement.
Random Feature Selection: At each node of the tree, a random subset of features is selected, and the best feature from this subset is used to split the data.
Tree Construction: Decision trees are constructed for each bootstrap sample, and these trees grow without pruning, resulting in very deep and complex structures.
Aggregation: For regression tasks, predictions from all trees are averaged. For classification tasks, the most frequent prediction (mode) is selected.
The Random Forest model can be mathematically represented as:

$$
\hat{y}_i = \frac{1}{T} \sum_{t=1}^{T} f_t(x_i),
$$

where $\hat{y}_i$ is the predicted value for the $i$-th observation, $T$ is the total number of trees, and $f_t(x_i)$ represents the prediction of the $t$-th tree.
One of the key advantages of Random Forest is its ability to handle large datasets with high dimensionality and missing data. The random selection of features at each split ensures that the model doesn’t rely too heavily on any single feature, reducing the risk of overfitting. Additionally, Random Forest provides a measure of feature importance, ranking the contribution of each feature to the model’s predictive power. This is particularly useful for understanding which features are most influential in making predictions.
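For instance, with scikit-learn the fitted forest exposes this ranking through its `feature_importances_` attribute; a minimal, self-contained sketch on toy data (purely illustrative, not the vignette's dataset):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data purely for illustrating the feature-importance ranking
X_demo, y_demo = make_classification(n_samples=1000, n_features=8, random_state=0)
rf_demo = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_demo, y_demo)

importances = pd.Series(rf_demo.feature_importances_,
                        index=[f"x{i}" for i in range(X_demo.shape[1])])
print(importances.sort_values(ascending=False))
```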
In conclusion, Random Forest is widely appreciated for its high accuracy, robustness, and ease of use. It is highly effective for both regression and classification tasks and is especially valuable in situations with a large number of features or complex relationships between variables. For additional information, see Breiman (2001).
EquiPy
equipy is a Python package specifically designed to implement sequential fairness in machine learning models, particularly when managing multiple sensitive attributes. This advanced post-processing method leverages multi-marginal Wasserstein barycenters to achieve fairness across various sensitive features. By extending the concept of Strong Demographic Parity to scenarios involving multiple sensitive characteristics, equipy allows for a nuanced approach to balancing fairness and model performance.
Key Functionalities:
Fairness Module: The package’s core functionality centers around the computation of fairness metrics using Wasserstein barycenters, particularly through the _wasserstein.py module. This method ensures equitable treatment across different sensitive attributes, allowing models to achieve approximate fairness in complex, multi-attribute settings.
Graphing Utilities: To facilitate the analysis and interpretation of fairness outcomes, equipy offers a suite of visual tools. These include _arrow_plot.py for directional fairness visualizations, _density_plot.py for examining the distributional impact of fairness adjustments, and _waterfall_plot.py for understanding the cumulative effects of fairness interventions.
Metrics Module: The package provides comprehensive evaluation tools, including _fairness_metrics.py for assessing fairness across sensitive attributes. These metrics enable a detailed examination of how fairness measures could influence predictive accuracy and other performance indicators.
Developed in 2023 by François Hu, Philipp Ratz, Suzie Grondin, Agathe Fernandes Machado, and Arthur Charpentier, equipy is grounded in cutting-edge research presented in Charpentier, Hu, and Ratz (2023), "A Sequentially Fair Mechanism for Multiple Sensitive Attributes." This foundational research underpins the package's approach to sequential fairness, making it a robust and theoretically sound tool for practitioners.
For more detailed information about the package, visit the equiPy website.
Estimation of the Random Forest
Data preprocessing
To prepare the data for modeling with a Random Forest, we start by encoding the categorical variables. This is necessary because scikit-learn's Random Forest requires numerical input. We use one-hot encoding to convert the categorical variables into binary columns. Additionally, we drop the first category of each encoded variable to avoid multicollinearity.
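A minimal sketch of this encoding step, assuming the categorical columns mirror those used later for the second model (the sensitive columns are kept in this first design matrix):

```python
import pandas as pd

# One-hot encode the categorical variables, dropping the first level of each
# to avoid multicollinearity (drop_first=True).
# Assumption: the column list mirrors the one used later for the second model.
df_encoded = pd.get_dummies(
    freMPL1sub,
    columns=['VehAge', 'MariStat', 'SocioCateg', 'VehUsage', 'VehBody',
             'VehPrice', 'VehEngine', 'VehEnergy', 'VehMaxSpeed', 'VehClass',
             'Garage'],
    drop_first=True,
)

# Design matrix and response; the sensitive columns remain available here.
# If 'sensitive' and 'Age' are stored as factors, they would need encoding as well.
X = df_encoded.drop("y", axis=1)
y = df_encoded["y"]
```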
Next, we split the data into training, calibration, and test sets. The training set will be used to fit the model, the calibration set will be used for enforcing fairness with equipy, and the test set will be used to evaluate the model’s performance and fairness.
Splitting Data into Training, Calibration, and Test Sets
```python
# Splitting into two datasets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)

# Split the temporary set into calibration and test sets
X_calib, X_test, y_calib, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
```
Training the RandomForest Model
```python
# Create a Random Forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, min_samples_leaf=100, random_state=42)

# Fit the classifier to the training data
rf_classifier.fit(X_train, y_train)
```
```python
# Get predicted scores of the training, calibration, and test sets
scores_train = rf_classifier.predict_proba(X_train)[:, 1]
scores_calib = rf_classifier.predict_proba(X_calib)[:, 1]
scores_test = rf_classifier.predict_proba(X_test)[:, 1]
```
Evaluating Model Performance with ROC Curve
```python
# Compute ROC curve and area under the curve (AUC)
y_true = np.concatenate((y_calib, y_test))
scores = np.concatenate((scores_calib, scores_test))
fpr, tpr, thresholds = roc_curve(y_true, scores, pos_label=1)
```
    UndefinedMetricWarning: No positive samples in y_true, true positive value should be meaningless
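To complete the evaluation, the AUC and the ROC curve can be obtained from fpr and tpr with standard scikit-learn and matplotlib calls; a minimal sketch:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc

roc_auc = auc(fpr, tpr)  # area under the ROC curve

plt.figure()
plt.plot(fpr, tpr, label=f"Random Forest (AUC = {roc_auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("ROC curve on the calibration + test scores")
plt.legend()
plt.show()
```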
```python
# Define a range of thresholds to evaluate
thresholds = np.arange(0.1, 1.0, 0.01)
best_threshold = 0
best_f1 = 0

# Iterate through thresholds and calculate the F1 score on the training set
for threshold in thresholds:
    predicted_labels = (scores_train > threshold).astype(int)
    y_train = y_train.astype(predicted_labels.dtype)
    f1 = f1_score(y_train, predicted_labels)
    # Update optimal values if the F1 score is higher
    if f1 > best_f1:
        best_f1 = f1
        best_threshold = threshold

# Define classes on predicted scores for each dataset
threshold = best_threshold

# Convert scores to binary class predictions
y_pred_train = (scores_train > threshold).astype(int)
y_pred_calib = (scores_calib > threshold).astype(int)
y_pred_test = (scores_test > threshold).astype(int)
```
The optimal threshold, which maximizes the F1 score, is found to be 0.10.
The optimal F1 score obtained is 0.2812.
Fairness with Equipy
Data preparation
Tip
To facilitate the application of the equiPy package, we first rename the datasets and create the necessary objects. We store the calibration and test data in data frames and prepare the true outcome values. Additionally, the equiPy package does not compute unfairness on the true values of y, which are binary (0/1): unfairness is defined on real-valued outputs, such as classifier scores, rather than on binary outcomes.
Preprocessing for the equipy application
```python
df_calib = freMPL1sub.loc[X_calib.index]
df_test = freMPL1sub.loc[X_test.index]
df_calib.rename(columns={'sensitive': 'Gender'}, inplace=True)
df_test.rename(columns={'sensitive': 'Gender'}, inplace=True)

# The function will enforce fairness on Gender then Age.
# ['Age', 'Gender'] would enforce fairness on Age then Gender,
# but the result is theoretically the same.
x_ssa_calib = df_calib[['Gender', 'Age']]
x_ssa_test = df_test[['Gender', 'Age']]

y_true_calib = np.array(y_calib)
y_true_test = np.array(y_test)
```
Enforcing Fairness Using Wasserstein Barycenters on Gender then Age
We enforce fairness by creating an instance of the MultiWasserstein class, fitting it on the calibration set, and then transforming the test set to make it fair.
```python
# Create an instance of the Wasserstein class
exact_wst = MultiWasserstein()

# Fit EQF, ECDF, and weights on the calibration set
exact_wst.fit(scores_calib, x_ssa_calib)

# Apply those values to the test set to make it fair.
# The transform function returns the final fair y, after mitigating biases
# from the 2 sensitive attributes (first: gender, second: driver's age).
y_final_fair = exact_wst.transform(scores_test, x_ssa_test)
print("y_fair:", y_final_fair)  # returns the fair y
```
```python
# Define dictionaries to track unfairness and performance metrics
# for different attribute permutations
unfs_list = [{'Base model': 0, 'sens_var_1': 0, 'sens_var_2': 0},
             {'Base model': 0, 'sens_var_2': 0, 'sens_var_1': 0}]
perf_list = [{'Base model': 0, 'sens_var_1': 0, 'sens_var_2': 0},
             {'Base model': 0, 'sens_var_2': 0, 'sens_var_1': 0}]

# Calculate unfairness before and after mitigating biases
unfairness_before = unfairness(scores_test, x_ssa_test)
unfairness_after = unfairness(y_final_fair, x_ssa_test)

# Retrieve fairness metrics for each step with sensitive attributes.
# y_seq_fair contains the output of the base model, then the output where
# Gender is fair, and finally where Gender and Age are fair.
y_seq_fair = exact_wst.y_fair

# Calculate unfairness for each model variant
unfairness_base_model = unfairness(y_seq_fair["Base model"], x_ssa_test)
unfs_list[0]['Base model'] = unfairness_base_model
unfairness_gender = unfairness(y_seq_fair["Gender"], x_ssa_test)
unfs_list[0]['sens_var_1'] = unfairness_gender
unfairness_age = unfairness(y_seq_fair["Age"], x_ssa_test)
unfs_list[0]['sens_var_2'] = unfairness_age

# Evaluate performance metrics before and after bias mitigation
y_true_test = y_true_test.astype(y_pred_test.dtype)
accuracy_before = performance(y_true_test, y_pred_test, accuracy_score)
f1_before = performance(y_true_test, y_pred_test, f1_score)

# Set the threshold and convert final fair predictions to binary classes
class_final_fair = (y_final_fair > best_threshold).astype(int)
accuracy_after = performance(y_true_test, class_final_fair, accuracy_score)
f1_after = performance(y_true_test, class_final_fair, f1_score)

metric = f1_score
class_base_model = (y_seq_fair["Base model"] > threshold).astype(int)
perf_list[0]['Base model'] = performance(y_true_test, class_base_model, metric)
class_sa_1 = (y_seq_fair["Gender"] > threshold).astype(int)
perf_list[0]['sens_var_1'] = performance(y_true_test, class_sa_1, metric)
class_sa_2 = (y_seq_fair["Age"] > threshold).astype(int)
perf_list[0]['sens_var_2'] = performance(y_true_test, class_sa_2, metric)
```
Here are the outputs from the model under different fairness constraints: the output of the base model, the output where Gender is fair, and finally the output where both Gender and Age are fair.
The unfairness function computes unfairness based on the Wasserstein distance. We can see a decrease in unfairness after enforcing fairness.
Unfairness before mitigation: 0.0338734782124827
Unfairness after mitigating biases from gender: 0.01594754303963647
Unfairness after mitigating biases from gender and age: 0.006320130284696135
However, the accuracy and F1 score do not necessarily decrease.
Accuracy before mitigation: 0.7773311897106109
F1-score before mitigation: 0.22191011235955055
Accuracy after mitigating biases from gender and age: 0.7805466237942122
F1-score after mitigating biases from gender and age: 0.24585635359116023
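Conceptually, the unfairness measure compares, via Wasserstein distances, each group's score distribution with the pooled one. The sketch below is an illustration of that idea with scipy, not equipy's exact implementation:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def group_unfairness(scores, groups):
    """Largest Wasserstein distance between a group's score
    distribution and the pooled score distribution (illustrative)."""
    scores = np.asarray(scores)
    groups = np.asarray(groups)
    return max(wasserstein_distance(scores[groups == g], scores)
               for g in np.unique(groups))

# Illustrative use with simulated scores and a binary group label
rng = np.random.default_rng(2)
s = rng.beta(2, 8, size=3000)
g = rng.integers(0, 2, size=3000)
print(group_unfairness(s, g))
```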
The same code, but with a different order of sensitive attributes: Age then Gender
Code for enforcing fairness on Age then Gender
```python
# Rename datasets to facilitate the EquiPy package application
# and create the objects useful for the package
x_ssa_calib = df_calib[['Age', 'Gender']]
x_ssa_test = df_test[['Age', 'Gender']]

# True outcome values (0/1)
y_true_calib = np.array(y_calib)
y_true_test = np.array(y_test)

# The predicted scores (scores_calib, scores_test) are reused,
# because EquiPy deals with real-valued outcomes.

# Create an instance of the Wasserstein class (exact fairness, MSA)
exact_wst = MultiWasserstein()

# Calculate EQF, ECDF, and weights on the calibration set
exact_wst.fit(scores_calib, x_ssa_calib)

# Apply those values to the test set to make it fair.
# The transform function returns the final fair y, after mitigating biases
# from the 2 sensitive attributes (first: driver's age, second: gender).
y_final_fair = exact_wst.transform(scores_test, x_ssa_test)
y_seq_fair = exact_wst.y_fair

unfs_list[1]['Base model'] = unfairness(y_seq_fair["Base model"], x_ssa_test)
unfs_list[1]['sens_var_2'] = unfairness(y_seq_fair["Gender"], x_ssa_test)
unfs_list[1]['sens_var_1'] = unfairness(y_seq_fair["Age"], x_ssa_test)

# We can do the same with sequential fairness
metric = f1_score

# Calculate sequential accuracy
y_true_test = y_true_test.astype(int)
class_base_model = (y_seq_fair["Base model"] > threshold).astype(int)
perf_list[1]['Base model'] = performance(y_true_test, class_base_model, metric)
class_sa_1 = (y_seq_fair["Gender"] > threshold).astype(int)
perf_list[1]['sens_var_2'] = performance(y_true_test, class_sa_1, metric)
class_sa_2 = (y_seq_fair["Age"] > threshold).astype(int)
perf_list[1]['sens_var_1'] = performance(y_true_test, class_sa_2, metric)
```
The same process, but without Gender and Age in the base model
```python
df_encoded2 = pd.get_dummies(freMPL1sub,
                             columns=['VehAge', 'MariStat', 'SocioCateg', 'VehUsage',
                                      'VehBody', 'VehPrice', 'VehEngine', 'VehEnergy',
                                      'VehMaxSpeed', 'VehClass', 'Garage'],
                             drop_first=True)
X2 = df_encoded2.drop(["y", "sensitive", "Age"], axis=1)
y2 = df_encoded2["y"]

# Splitting into two datasets
X_train2, X_temp2, y_train2, y_temp2 = train_test_split(X2, y2, test_size=0.4, random_state=42)

# Split the temporary set into calibration and test sets
X_calib2, X_test2, y_calib2, y_test2 = train_test_split(X_temp2, y_temp2, test_size=0.5, random_state=42)

# Create a Random Forest classifier
rf_classifier2 = RandomForestClassifier(n_estimators=100, min_samples_leaf=100, random_state=42)

# Fit the classifier to the training data
rf_classifier2.fit(X_train2, y_train2)
```
```python
# Get predicted scores of the training, calibration, and test sets
scores_train2 = rf_classifier2.predict_proba(X_train2)[:, 1]
scores_calib2 = rf_classifier2.predict_proba(X_calib2)[:, 1]
scores_test2 = rf_classifier2.predict_proba(X_test2)[:, 1]

best_threshold2 = 0
best_f1_2 = 0

# Iterate through thresholds and calculate the F1 score on the training set
# (for this second model, the scores and labels without Gender and Age are used)
for threshold in thresholds:
    predicted_labels = (scores_train2 > threshold).astype(int)
    y_train2 = y_train2.astype(predicted_labels.dtype)
    f1 = f1_score(y_train2, predicted_labels)
    # Update optimal values if the F1 score is higher
    if f1 > best_f1_2:
        best_f1_2 = f1
        best_threshold2 = threshold

# Define classes on predicted scores for each dataset
threshold2 = best_threshold2

# Convert scores to binary class predictions
y_pred_train2 = (scores_train2 > threshold2).astype(int)
y_pred_calib2 = (scores_calib2 > threshold2).astype(int)
y_pred_test2 = (scores_test2 > threshold2).astype(int)

y_true_calib2 = np.array(y_calib2)
y_true_test2 = np.array(y_test2)

# Create an instance of the Wasserstein class
exact_wst = MultiWasserstein()

# Fit EQF, ECDF, and weights on the calibration set
exact_wst.fit(scores_calib2, x_ssa_calib)

# Apply those values to the test set to make it fair.
# The transform function returns the final fair y, after mitigating biases
# from the 2 sensitive attributes (gender and driver's age).
y_final_fair2 = exact_wst.transform(scores_test2, x_ssa_test)
y_seq_fair2 = exact_wst.y_fair
```
Visualization of the score densities by group, without Age and Gender in the model, showcasing the influence of proxy variables (produced with the EquiPy package).
In this plot, we observe a significant disparity in distribution across different groups, even when sensitive attributes are excluded from the model. This disparity can be attributed to the influence of proxy variables, which indirectly encode the information of sensitive attributes. This observation underscores the inadequacy of merely removing sensitive attributes as a solution to fairness. equipy addresses this challenge by adjusting the underlying distribution, effectively mitigating discrimination that arises from proxy variables. This demonstrates equipy’s utility in enhancing fairness beyond the simplistic approach of omitting sensitive attributes, ensuring a more comprehensive approach to mitigating bias in machine learning models.
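A comparable view can be sketched directly with matplotlib by overlaying, for each Gender group, the distribution of the second model's raw scores and of their fair counterparts (a simplified stand-in for equipy's density plot, using the objects defined above):

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
for ax, (values, title) in zip(axes, [(scores_test2, "Unfair scores (no Gender/Age)"),
                                      (y_final_fair2, "Fair scores")]):
    # One curve per Gender group, so distributional gaps are visible
    for gender in x_ssa_test['Gender'].unique():
        mask = (x_ssa_test['Gender'] == gender).to_numpy()
        ax.hist(values[mask], bins=30, density=True, histtype='step', label=str(gender))
    ax.set_title(title)
    ax.set_xlabel("Predicted score")
    ax.legend(title="Gender")
axes[0].set_ylabel("Density")
plt.tight_layout()
plt.show()
```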
Fairness-performance combinations visualized step by step.
We observe that both performance and unfairness metrics remain consistent when transitioning from Gender fairness to Gender-and-Age fairness, as well as from Age fairness to Age-and-Gender fairness. This consistency underscores that the order in which fairness criteria are applied does not impact the outcome, highlighting the robustness of the fairness adjustments regardless of the sequence in which sensitive attributes are considered.
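These combinations can also be sketched directly from the dictionaries filled above, plotting each step's unfairness against its F1 score for both orderings (a simplified stand-in for equipy's arrow plot):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for unfs, perf, label in [(unfs_list[0], perf_list[0], "Gender then Age"),
                          (unfs_list[1], perf_list[1], "Age then Gender")]:
    steps = list(unfs.keys())
    # One point per fairness step, connected in the order fairness is applied
    ax.plot([unfs[s] for s in steps], [perf[s] for s in steps],
            marker="o", label=label)
    for s in steps:
        ax.annotate(s, (unfs[s], perf[s]), textcoords="offset points", xytext=(4, 4))
ax.set_xlabel("Unfairness")
ax.set_ylabel("F1 score")
ax.set_title("Fairness-performance trade-off, step by step")
ax.legend()
plt.show()
```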
Waterfall Plot Function for Sequential Fairness Gains
```python
# Imports needed by these helper functions
# (some may already be loaded earlier in the document).
import re
from typing import Optional, Union

import matplotlib.pyplot as plt
import numpy as np


def _set_colors(substraction_list: list[float]) -> list[str]:
    """
    Assign colors to bars based on the values in substraction_list.

    Parameters
    ----------
    substraction_list : list
        A list of numerical values representing the differences between two sets.

    Returns
    -------
    list
        A list of color codes corresponding to each value in substraction_list.

    Notes
    -----
    The color 'tab:orange' is assigned to positive values, 'tab:green' to
    non-positive values, and 'tab:grey' to the first and last positions.
    """
    bar_colors = ['tab:grey']
    for i in range(1, len(substraction_list) - 1):
        if substraction_list[i] > 0:
            bar_colors.append('tab:orange')
        else:
            bar_colors.append('tab:green')
    bar_colors.append('tab:grey')
    return bar_colors


def _add_bar_labels(values: list[float], pps: list[plt.bar], ax: plt.Axes) -> plt.Axes:
    """
    Add labels to the top of each bar in a bar plot.

    Parameters
    ----------
    values : list
        A list of numerical values representing the heights of the bars.
    pps : list
        A list of bar objects returned by the bar plot.
    ax : matplotlib.axes.Axes
        The Axes on which the bars are plotted.

    Returns
    -------
    matplotlib.axes.Axes
        The Axes with a label added to the top of each bar.
    """
    true_values = values + (values[-1],)
    for i, p in enumerate(pps):
        height = true_values[i]
        ax.annotate('{}'.format(height),
                    xy=(p.get_x() + p.get_width() / 2, height),
                    xytext=(0, 3),
                    textcoords="offset points",
                    ha='center', va='bottom')
    return ax


def _add_doted_points(ax: plt.Axes, values: np.ndarray) -> plt.Axes:
    """
    Add dotted lines at the top of each bar in a bar plot.

    Parameters
    ----------
    ax : matplotlib.axes.Axes
        The Axes on which the bars are plotted.
    values : numpy.ndarray
        An array of numerical values representing the heights of the bars.

    Returns
    -------
    matplotlib.axes.Axes
        The Axes with dotted lines added at the top of each bar.
    """
    for i, v in enumerate(values):
        ax.plot([i + 0.25, i + 1.25], [v, v], linestyle='--', linewidth=1.5, c='grey')
    return ax


def _add_legend(pps: list[plt.bar], distance: Union[np.ndarray, list],
                hatch: bool = False) -> list[plt.bar]:
    """
    Add legend labels to the bar plot based on the distances.

    Parameters
    ----------
    pps : list[plt.bar]
        List of bar objects.
    distance : np.ndarray or list
        Array or list of numerical values representing distances.
    hatch : bool, optional
        If True, uses hatched legend labels. Defaults to False.

    Returns
    -------
    list[plt.bar]
        List of bar objects with legend labels added.
    """
    used_labels = set()
    for i, bar in enumerate(pps):
        if i == 0 or i == len(pps) - 1:
            continue
        if hatch:
            label = 'Net Loss (if exact)' if distance[i] < 0 else 'Net Gain (if exact)'
        else:
            label = 'Net Loss' if distance[i] < 0 else 'Net Gain'
        if label not in used_labels:
            bar.set_label(label)
            used_labels.add(label)
    return pps


def _values_to_distance(values: list[float]) -> list[float]:
    """
    Convert a list of values to a list of distances between consecutive values.

    Parameters
    ----------
    values : list
        A list of numerical values.

    Returns
    -------
    list
        A list of distances between consecutive values; the last element is
        the negation of the last value in the input list.
    """
    arr = np.array(values)
    arr = arr[1:] - arr[:-1]
    distance = list(arr) + [-values[-1]]
    return distance


def fair_waterfall_plot(unfs_exact: dict[str, np.ndarray],
                        unfs_approx: Optional[dict[str, np.ndarray]] = None) -> plt.Axes:
    """
    Generate a waterfall plot illustrating the sequential fairness in a model.

    Parameters
    ----------
    unfs_exact : dict
        Fairness values for each step in the exact fairness scenario.
    unfs_approx : dict, optional
        Fairness values for each step in the approximate fairness scenario.
        Default is None.

    Returns
    -------
    matplotlib.axes.Axes
        The Axes of the waterfall plot.

    Notes
    -----
    The function creates a waterfall plot with bars representing the fairness
    values at each step. If both exact and approximate fairness values are
    provided, bars are color-coded and labeled accordingly, and a legend
    distinguishes between the different bars.
    """
    fig, ax = plt.subplots()
    unfs_exact = {key: round(value, 4) for key, value in unfs_exact.items()}
    if unfs_approx is not None:
        unfs_approx = {key: round(value, 4) for key, value in unfs_approx.items()}

    # Build the x-axis labels from the sensitive-attribute indices in the keys
    sens = [int(''.join(re.findall(r'\d+', key))) for key in list(unfs_exact.keys())[1:]]
    labels = []
    for i in range(len(list(unfs_exact.keys())[1:])):
        if i == 0:
            labels.append(f"$A_{sens[i]}$-fair")
        elif i == len(list(unfs_exact.keys())[1:]) - 1:
            labels.append(f"$A_{1}$" + r"$_:$" + f"$_{sens[i]}$-fair")
        else:
            labels.append(f"$A_{{{','.join(map(str, sens[0:i+1]))}}}$-fair")
    leg = ('Base model',) + tuple(labels) + ('Final model',)

    base_exact = list(unfs_exact.values())
    values_exact = [0] + base_exact
    distance_exact = _values_to_distance(values_exact)

    if unfs_approx is not None:
        base_approx = list(unfs_approx.values())
        values_approx = [0] + base_approx
        distance_approx = _values_to_distance(values_approx)

        # Waterfall for the grey hatched bars
        direction = np.array(distance_exact) > 0
        values_grey = np.zeros(len(values_exact))
        values_grey[direction] = np.array(values_approx)[direction]
        values_grey[~direction] = np.array(values_exact)[~direction]
        distance_grey = np.zeros(len(values_exact))
        distance_grey[direction] = (np.array(values_exact)[direction]
                                    - np.array(values_approx)[direction])
        distance_grey[~direction] = (np.array(values_approx)[~direction]
                                     - np.array(values_exact)[~direction])

        # Waterfall for exact fairness
        pps0 = ax.bar(leg, distance_exact, color='w',
                      edgecolor=_set_colors(distance_exact),
                      bottom=values_exact, hatch='//')
        _add_legend(pps0, distance_exact, hatch=True)
        ax.bar(leg, distance_grey, color='w', edgecolor="grey",
               bottom=values_grey, hatch='//', label='Remains')

        # Waterfall for approximate fairness
        pps = ax.bar(leg, distance_approx, color=_set_colors(distance_approx),
                     edgecolor='k', bottom=values_approx, label='Baseline')
        _add_legend(pps, distance_approx)
    else:
        # Waterfall for exact fairness only
        pps = ax.bar(leg, distance_exact, color=_set_colors(distance_exact),
                     edgecolor='k', bottom=values_exact, label='Baseline')
        _add_legend(pps, distance_exact)

    fig.legend(loc='upper center', bbox_to_anchor=(0.5, 0), ncol=3, fancybox=True)
    _add_bar_labels(tuple(base_exact) if unfs_approx is None else tuple(base_approx), pps, ax)
    _add_doted_points(ax, tuple(base_exact) if unfs_approx is None else tuple(base_approx))
    ax.set_ylabel('Unfairness of the model')
    if unfs_approx is None:
        ymax = np.max(list(unfs_exact.values()))
    else:
        ymax = np.max(list(unfs_exact.values()) + list(unfs_approx.values()))
    ax.set_ylim(0, ymax + ymax / 10)
    ax.set_title(f'Sequential ({"exact" if unfs_approx is None else "approximate"}) fairness')
    plt.show()
    return ax
```
```python
fair_waterfall_plot(unfs_exact=unfs_list[0])
```
Waterfall plot representing sequential gains in fairness.
This graph illustrates the incremental gains in fairness achieved at each step of the process. Each successive step demonstrates how the applied adjustments contribute to reducing bias, showcasing the cumulative effect of the fairness interventions throughout the process.
Charpentier, Arthur, François Hu, and Philipp Ratz. 2023. "Mitigating Discrimination in Insurance with Wasserstein Barycenters." arXiv preprint arXiv:2306.12912.
Sauce, Marguerite, Antoine Chancel, and Antoine Ly. 2023. "AI and Ethics in Insurance: A New Solution to Mitigate Proxy Discrimination in Risk Modeling." arXiv preprint arXiv:2307.13616.
See also
For additional datasets with a similar occurrence structure, see credit, the German Credit dataset (import with data("credit")), or uslapseagent, a United States lapse dataset from the tied-agent channel (import with data("uslapseagent")).