Last Updated: 07/24/2020
Machine Learning, AI, and Data Science based predictive tools are increasingly used in problems that can have a drastic impact on people's lives, in policy areas such as criminal justice, education, public health, workforce development, and social services. Recent work has raised concerns about the risk of unintended bias in these models unfairly affecting individuals from certain groups. While many bias metrics and fairness definitions have been proposed, there is no consensus on which definitions and metrics should be used in practice to evaluate and audit these systems. Further, there has been very little empirical work on using and evaluating these measures on real-world problems, especially in public policy.
Aequitas is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive tools.
Before we talk about the datasets, it is important to know what can be considered a protected variable.
Protected variable: An attribute that partitions a population into groups whose outcomes should have parity. Examples include race, gender, caste, and religion. Protected attributes are not universal; they are application-specific.
We provide two sample datasets: the COMPAS dataset and the Lending Club dataset.
The COMPAS Recidivism Risk Score Data and Analysis: the data contains the variables used by the COMPAS algorithm in scoring defendants, along with their outcomes within two years of the decision, for over 10,000 criminal defendants in Broward County, Florida. Below you can see some (not all) of the columns within the dataset.
Looking at the Lending Club dataset, the files contain complete loan data for all loans issued from 2007 through 2015, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information. Additional features include credit scores, number of finance inquiries, address information (zip code and state), and collections, among others. There are 145 columns representing individual loan accounts. Each row corresponds to an individual loan id and member id; in the interest of privacy, the member ids have been removed from the dataset. Below you can see some (not all) of the columns within the dataset.
Note: The sample .csv files already have 'score' and 'label_value' columns. They just need to be downloaded and used for report generation.
The aequitas package can generate the report both through a web app and from the command line. It creates not only a PDF document but also a .csv file with all the values. In the simplest terms, the .csv/PDF can be thought of as a large pivot table over the data.
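For orientation, here is a minimal sketch of driving the same audit from Python rather than the web app. The filename is hypothetical, and `Group().get_crosstabs` is the documented Aequitas entry point, though exact signatures may vary slightly between versions.

```python
import pandas as pd
from aequitas.group import Group

# Load a prepared sample file (hypothetical filename) that already has
# 'score' and 'label_value' columns plus the protected attribute columns.
df = pd.read_csv("lending_club_for_aequitas.csv")

# Cross-tabulate confusion-matrix counts and group metrics for every attribute value.
group = Group()
crosstab, attribute_columns = group.get_crosstabs(df)

# Persist the raw numbers; this is essentially the large pivot table the report is built from.
crosstab.to_csv("aequitas_group_metrics.csv", index=False)
```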
The report has a fixed input format. It requires a 'score' column, a 'label_value' column, and one column for each protected attribute to be audited.
The following is a sample of this format:
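As a hedged illustration (the attribute columns below are hypothetical; only 'score' and 'label_value' are fixed names), an input table might look like this:

```python
import pandas as pd

# Hypothetical input rows: a binary 'score' (the model's decision), a binary
# 'label_value' (the observed outcome), and one column per protected attribute.
sample_input = pd.DataFrame({
    "score":          [1, 0, 1, 0],
    "label_value":    [1, 1, 0, 0],
    "home_ownership": ["MORTGAGE", "RENT", "OWN", "MORTGAGE"],
    "grade":          ["A", "C", "B", "A"],
})
```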
A sample report can be found here. The report can be divided into three main parts.
This tutorial depends heavily on metrics like accuracy, parity difference, equal odds difference, disparate impact, the Theil index, and others, so it is better to have a good understanding of them. Let's look at them now. Refer to this Wikipedia page for more information.
Ensure all protected groups have equal representation in the selected set. This criterion considers an attribute to have equal parity if every group is equally represented in the selected set. In simple terms, it tells you how the data is distributed among the categories.
For example, consider the column 'homeownership' from our Lending Club data. The available categories are ['Any', 'MORTGAGE', 'RENT', 'OTHER']. We see that they are unevenly distributed in the data.
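To see that distribution directly, a quick check along these lines can be run on the dataframe loaded earlier (the column name 'home_ownership' is an assumption; the sample file may spell it differently):

```python
# Share of each home-ownership category in the data as a whole, and within
# the selected set (score == 1). Equal parity asks the selected-set shares
# to be (roughly) equal across groups.
overall_share = df["home_ownership"].value_counts(normalize=True)
selected_share = df.loc[df["score"] == 1, "home_ownership"].value_counts(normalize=True)
print(overall_share)
print(selected_share)
```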
Ensure all protected groups are selected proportionally to their percentage of the population. This criterion considers an attribute to have proportional parity if every group is represented proportionally to its share of the population. If your desired outcome is to intervene proportionally on people from all categories, then you care about this criterion.
For example, the report shows the disparity for each 'Grade' category.
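Concretely, the disparity shown there is a ratio: each group's predicted-positive prevalence divided by that of a reference group. A hand computation might look like this (the 'grade' column name and the choice of 'A' as reference are assumptions):

```python
# Predicted-positive prevalence: the share of each grade that the model selects.
pprev = df.groupby("grade")["score"].mean()

# Disparity ratio against an assumed reference group; 1.0 means parity.
reference = "A"
print(pprev / pprev[reference])
```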
Ensure all protected groups have the same false positive rate (as the reference group). This criterion considers an attribute to have False Positive parity if every group has the same False Positive Error Rate.
For example, consider the FPR reported for the 'homeownership' column.
This means that the 'Mortgage' category has 0.7 times the false positive rate of 'Own', i.e., among true negatives the model predicts the positive class for 'Own' more often than for 'Mortgage'.
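A minimal sketch of that comparison, reusing the dataframe from above (the 'OWN' reference label and exact category spellings are assumptions):

```python
# False positive rate per group: among truly negative rows (label_value == 0),
# the share that the model nevertheless scored positive.
negatives = df[df["label_value"] == 0]
fpr = negatives.groupby("home_ownership")["score"].mean()

# Disparity relative to the reference group; a value near 0.7 for MORTGAGE
# would reproduce the reading described above.
print(fpr / fpr["OWN"])
```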
Ensure all protected groups have equally proportional false positives within the selected set (compared to the reference group). This criterion considers an attribute to have False Discovery Rate parity if every group has the same False Discovery Error Rate.
For example, again for 'homeownership', the report shows the FDR disparity.
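Following the same pattern, a sketch of the FDR computation (same assumed column name and reference group):

```python
# False discovery rate per group: among rows the model selected (score == 1),
# the share that were actually negative (label_value == 0).
selected = df[df["score"] == 1]
fdr = selected.groupby("home_ownership")["label_value"].apply(lambda y: (y == 0).mean())
print(fdr / fdr["OWN"])
```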
Ensure all protected groups have the same false negative rate (as the reference group). This criterion considers an attribute to have False Negative parity if every group has the same False Negative Error Rate.
For example, consider the FNR disparity reported for the 'term period' column.
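And the FNR counterpart (the column name 'term' is an assumption for the 'term period' attribute):

```python
# False negative rate per group: among truly positive rows (label_value == 1),
# the share that the model failed to select (score == 0).
positives = df[df["label_value"] == 1]
fnr = positives.groupby("term")["score"].apply(lambda s: (s == 0).mean())
print(fnr / fnr.iloc[0])  # disparity against an arbitrary reference group
```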
At the end of the report, both the relative values and the absolute values of the metrics are listed.
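Programmatically, those two tables correspond to the absolute group metrics from `get_crosstabs` and the disparity ratios layered on top of them. A sketch using the documented Aequitas classes (exact signatures may differ by version):

```python
from aequitas.bias import Bias
from aequitas.fairness import Fairness

# Relative values: disparity ratios computed against the largest group of each attribute.
bias = Bias()
disparity_df = bias.get_disparity_major_group(crosstab, original_df=df)

# Pass/fail parity determinations derived from those disparities.
fairness = Fairness()
fairness_df = fairness.get_group_value_fairness(disparity_df)

# The absolute values (fpr, fdr, fnr, etc.) are already present in 'crosstab'
# from get_crosstabs and are carried through into 'disparity_df' as well.
```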
In this module, we showcase how different people can read and analyze the report.
Let's start with a Data Scientist.
After reading the report, the data scientist would infer that the model is biased in several ways. Some of the possible scenarios are as follows:
Now, let's consider someone from finance. They likely do not have knowledge of the underlying model and cannot effectively change it. Some possible scenarios for them would be as follows: