Bias
What is bias?
In this context, bias refers to algorithmic bias. "Algorithmic bias" refers to unintended discrimination occurring as a result of an automated decision.
Legislation defines a series of protected features. For example, in the UK, the Equality Act 2010 protects citizens against discrimination on the basis of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, and sexual orientation.
Within a protected feature, the unprivileged group (for example, people over 65 when age is the protected feature) tends to be discriminated against, and as a result it is usually the group that legislation protects. The privileged group within the protected feature tends not to be discriminated against.
If you do not tackle this issue, your model is not only potentially unethical, discriminating unintentionally and exposing you to compliance risk; you may also be leaving some customer groups underserved, and thus leaving money on the table.
Bias Metrics Scan
The Etiq library provides several metrics commonly used in the algorithmic fairness literature:
Metrics | Description |
---|---|
Equal Opportunity | measures the difference in true positive rate between a privileged demographic group and an unprivileged demographic group |
Demographic Parity | measures the difference in the proportion of positive labels (positives out of total) between a privileged demographic group and an unprivileged demographic group |
Equal Odds TNR | measures the difference in true negative rate between a privileged and an unprivileged demographic group. The full Equal Odds measure in the literature looks for an optimal point where the differences between demographic groups in both true positive rate and true negative rate are minimized |
Individual Fairness | measures whether individuals with similar features receive the same model responses |
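To make the definitions in the table concrete, here is a minimal NumPy sketch of the four metrics. It is not Etiq's implementation: the function names are hypothetical, and it assumes binary labels and predictions (1 = positive outcome), a boolean mask marking the privileged group, and the sign convention "privileged minus unprivileged". For Individual Fairness it uses one common formulation from the literature, a k-nearest-neighbour consistency score; Etiq's own measure may be defined differently.

```python
import numpy as np

# Hypothetical helpers for illustration only; not Etiq's API.
# Convention (assumption): differences are privileged minus unprivileged.


def _as_arrays(y_true, y_pred, privileged):
    """Convert inputs to NumPy arrays; `privileged` becomes a boolean mask."""
    return np.asarray(y_true), np.asarray(y_pred), np.asarray(privileged, dtype=bool)


def _tpr(y_true, y_pred):
    """True positive rate: share of actual positives predicted positive."""
    return float(np.mean(y_pred[y_true == 1] == 1))


def _tnr(y_true, y_pred):
    """True negative rate: share of actual negatives predicted negative."""
    return float(np.mean(y_pred[y_true == 0] == 0))


def equal_opportunity(y_true, y_pred, privileged):
    """Difference in true positive rate between the two groups."""
    y_true, y_pred, priv = _as_arrays(y_true, y_pred, privileged)
    return _tpr(y_true[priv], y_pred[priv]) - _tpr(y_true[~priv], y_pred[~priv])


def equal_odds_tnr(y_true, y_pred, privileged):
    """Difference in true negative rate between the two groups."""
    y_true, y_pred, priv = _as_arrays(y_true, y_pred, privileged)
    return _tnr(y_true[priv], y_pred[priv]) - _tnr(y_true[~priv], y_pred[~priv])


def demographic_parity(y_pred, privileged):
    """Difference in positive-prediction rate; needs predictions only, no actuals."""
    y_pred = np.asarray(y_pred)
    priv = np.asarray(privileged, dtype=bool)
    return float(np.mean(y_pred[priv] == 1) - np.mean(y_pred[~priv] == 1))


def individual_fairness(X, y_pred, k=3):
    """Consistency score: 1 - mean |prediction - mean prediction of its k nearest
    neighbours| (Euclidean distance). 1.0 means similar rows get the same output."""
    X = np.asarray(X, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return float(1.0 - np.mean(np.abs(y_pred - y_pred[neighbours].mean(axis=1))))


# Toy data: the first four rows are the privileged group.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
privileged = np.array([True] * 4 + [False] * 4)
print(equal_opportunity(y_true, y_pred, privileged))  # 0.5
print(equal_odds_tnr(y_true, y_pred, privileged))  # 0.5
print(demographic_parity(y_pred, privileged))  # 0.0
```

The toy data shows why several metrics are reported: both groups receive positive predictions at the same rate (Demographic Parity is 0), yet the model is much less accurate for the unprivileged group, which Equal Opportunity and Equal Odds TNR pick up.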
The Bias Metrics scan compares each of the metrics above with a threshold to check whether the model meets that benchmark.
The syntax to run the scan after you’ve logged a snapshot is:
snapshot.scan_bias_metrics()
The thresholds are set by the user. Ideally, most metrics should be as close to 0 as possible, meaning that the model does not behave differently (with detrimental outcomes) for the protected groups.
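The pass/fail idea behind a threshold can be sketched as comparing the size of each metric with a user-chosen limit. This is a hypothetical illustration, not the scan's actual logic: the function name, the metric values and the thresholds are all made up, and it assumes every metric is a between-group difference whose ideal value is 0.

```python
def check_thresholds(metrics, thresholds):
    """Map each metric name to True if |value| <= its threshold (pass), else False.

    Hypothetical helper. Assumes every metric is a between-group difference whose
    ideal value is 0, so only the size of the gap is checked, not which group it favours.
    """
    return {name: abs(value) <= thresholds[name] for name, value in metrics.items()}


# Made-up metric values and thresholds, for illustration only (not Etiq defaults).
metrics = {"equal_opportunity": 0.12, "demographic_parity": -0.03, "equal_odds_tnr": 0.05}
thresholds = {"equal_opportunity": 0.1, "demographic_parity": 0.05, "equal_odds_tnr": 0.05}
results = check_thresholds(metrics, thresholds)
print(results)
# {'equal_opportunity': False, 'demographic_parity': True, 'equal_odds_tnr': True}
```

If only gaps that disadvantage the unprivileged group matter to you, you could compare the signed value instead; which sign counts as detrimental depends on how each metric is defined.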
The consensus in the literature (and our view) is that algorithmic bias can be mitigated but not removed entirely.
This is still a new area of research, and the available metrics can be misleading. For more resources, please see our research post on this topic.
Bias Sources Scan
Our Bias Sources scan identifies potential sources of bias based on a framework that includes:
Sources | Description |
---|---|
Proxies | features that act as proxies for demographic attributes |
Sample size disparity | difference in sample size, and in the number of positive/negative labels, between the protected demographic group and the majority demographic group |
Segment size | are some customer profiles poorly represented in your sample? |
Limited features / correlation issue | features are less reliable for a certain demographic group. This is often linked to sampling, but more fundamentally it could be that some groups' behaviour is less well encoded by the available features |
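Two of these sources, sample size disparity and segment size, can be approximated with simple counts. The sketch below is a hypothetical plain-Python illustration, not the scan's actual logic: for each value of a protected feature it reports the number of records, the share of the sample, the positive-label rate, and whether the share falls below a `min_share` cutoff that you would choose yourself. The group names and data are made up.

```python
from collections import defaultdict


def group_summary(groups, labels, min_share=0.1):
    """Per-group sample size, share of sample, positive-label rate and a small-segment flag.

    Hypothetical helper. groups: group value per record (e.g. an age band);
    labels: 0/1 label per record; min_share: user-chosen cutoff for 'underrepresented'.
    """
    counts = defaultdict(int)
    positives = defaultdict(int)
    for g, y in zip(groups, labels):
        counts[g] += 1
        positives[g] += int(y == 1)
    total = sum(counts.values())
    return {
        g: {
            "n": n,
            "share": n / total,
            "positive_rate": positives[g] / n,
            "underrepresented": n / total < min_share,
        }
        for g, n in counts.items()
    }


# Made-up data: the over_65 group is small and has no positive labels.
groups = ["under_65"] * 8 + ["over_65"] * 2
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
for group, stats in group_summary(groups, labels, min_share=0.25).items():
    print(group, stats)
```

A group that is both small and has very few positive labels gives the model little to learn from, which is exactly the combination the Sample size disparity and Segment size rows describe.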
It is useful to look at these metrics globally to uncover issues across your sample. However, many issues will only be visible for specific groups or specific records, and the Bias Sources scan aims to identify which groups have the issues above.
By default, the Bias Sources scan is run on the training dataset, as this is where your model learns any potentially harmful, unfairly discriminatory pattern. You will not run this scan in production. The Bias Metrics scan is run on the validation dataset.
The syntax to run the scan after you've logged the relevant config file and a snapshot is:
snapshot.scan_bias_sources()
There are two options for running the Bias Sources scan:
If you don't set anything in the config, the segments will be fuzzy rather than based on business rules.
If you set the option auto in the config (as in the config used here), the segments will be based on business rules.
If you use the auto option, you will need to specify the categorical and continuous features. You can do this either from the config as in this case:
Or you can run it from the notebook:
We provide multiple correlation measures, to be used according to the types of the features: Pearson, Cramer's V, rank-biserial and point-biserial. Remember to specify in the config or the snapshot which features are of which type, so that you can make full use of the multiple-measure functionality. You can customize the measures in the config, but the default and recommended settings are:
"continuous_continuous_measure" : "pearsons"
"categorical_categorical_measure": "cramersv"
"categorical_continuous_measure": "rankbiserial"
"binary_continuous_measure": "pointbiserial"
There are many additional sources of bias that require background or context knowledge beyond what can be observed in the data or the model:
'Tainted' examples: the target variable reflects past bias
e.g. a model predicting who might make a good hire, trained on data about who was hired in the past rather than on who was objectively the best candidate for the role
Skewed sample: the dataset is not representative of the population for which the model will be used
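A skewed sample can only be detected against outside knowledge of the population. If you do have reference shares for a protected feature (for example from census data), one simple check is to compare them with the shares in your sample. The sketch below is a hypothetical illustration; the function name, the group names and the reference figures are made up.

```python
from collections import Counter


def share_gaps(sample_groups, reference_shares):
    """Sample share minus reference (population) share for each group.

    Hypothetical helper. reference_shares must come from outside the dataset,
    e.g. census figures; the values used below are made up.
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    return {g: counts.get(g, 0) / total - ref for g, ref in reference_shares.items()}


sample = ["under_65"] * 90 + ["over_65"] * 10
reference = {"under_65": 0.8, "over_65": 0.2}  # made-up population shares
gaps = share_gaps(sample, reference)
print(gaps)  # over_65 is under-sampled by about 10 percentage points
```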
Production vs. pre-production
Stage | Scan | Snapshot set-up |
---|---|---|
Pre-production (Etiq-wrapped model) | Bias Sources | You can use the whole dataset and set the split percentages however you prefer (leaving at least % in the validation sample). The Etiq dataset loader will split the data for you when it creates the snapshot. By default, the scan is run on the training sample. The "label" parameter refers to the predicted label (because this is your training/test/validation data, it will also be your actuals). |
Pre-production (Etiq-wrapped model) | Bias Metrics | You can use the whole dataset and set the split percentages however you prefer. The Etiq dataset loader will split the data for you when it creates the snapshot. By default, the scan is run on the validation sample. The "label" parameter refers to the predicted label (because this is your training/test/validation data, it will also be your actuals). |
Pre-production (already trained user model) | Bias Sources | You should log your actual training dataset as training by setting the split in the config file like this: "train_valid_test_splits": [1.0, 0.0, 0.0]. By default, the scan is run on the training sample. You will have to run this scan separately from the bias metrics and bias accuracy scans (we are working on changing this). The "label" parameter refers to the predicted label (because this is your training/test/validation data, it will also be your actuals). |
Pre-production (already trained user model) | Bias Metrics | You should log your actual test/validation dataset (the sample you did not use to train the model) as validation by setting the split in the config file like this: "train_valid_test_splits": [0.0, 1.0, 0.0]. By default, the scan is run on the validation sample. The "label" parameter will be the predicted feature, not the actual. You won't have actuals by this stage of model deployment yet. |
Production | Bias Metrics: Individual_Fairness; Demographic_Parity | You should log your dataset as validation. By default, the scan is run on the validation sample. These metrics do not require actuals. The "label" parameter will be the predicted feature, not the actual, as you won't have actuals at this stage of model deployment yet. |
Production | Bias Metrics: scan_bias_metrics() Equal_Opportunity; Equal_Odds | You can only run this scan in production once you have actuals. You should log your dataset as validation by setting the split in the config file like this: "train_valid_test_splits": [0.0, 1.0, 0.0]. By default, the scan is run on the validation sample. The "label" parameter will be the actuals feature once you have it, and you will need to set up your dataset in advance (e.g. using Airflow). |
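To make the train_valid_test_splits setting concrete, here is a hypothetical sketch of how split fractions map onto record counts. `split_counts` is a made-up helper, not the Etiq dataset loader, which may shuffle or round differently. It shows why [1.0, 0.0, 0.0] places every record in the training sample (used by the Bias Sources scan by default) and [0.0, 1.0, 0.0] places every record in the validation sample (used by the Bias Metrics scan by default).

```python
def split_counts(n_rows, splits):
    """Turn [train, valid, test] fractions into row counts.

    Hypothetical helper: the fractions must sum to 1, and any rounding
    remainder goes to the training sample.
    """
    if len(splits) != 3 or abs(sum(splits) - 1.0) > 1e-9:
        raise ValueError("expected three fractions summing to 1")
    valid = int(n_rows * splits[1])
    test = int(n_rows * splits[2])
    return n_rows - valid - test, valid, test


print(split_counts(1000, [1.0, 0.0, 0.0]))  # (1000, 0, 0): all rows in training
print(split_counts(1000, [0.0, 1.0, 0.0]))  # (0, 1000, 0): all rows in validation
```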
Bias Scans Limitations
Bias is one of the most complex topics today. We started Etiq to help teams tackle this problem.
We don’t believe that having a few scans in place is enough to tackle this problem, and we don’t consider our bias sources scans exhaustive by any means. Additionally, the metrics themselves are often misleading; we have published some research on this topic here. However, if these scans lead data science and engineering teams to at least start treating algorithmic bias and fairness as a problem they should tackle, as important as, if not more important than, accuracy-based performance, drift or data issues, then we feel that at least part of our mission is accomplished.
If you are interested in this problem in more depth, we’d be very happy to hear from you. We have done research in this space and have additional pipelines built as part of the lower-level API, which we’re happy to share and walk you through (email us at info@etiq.ai).
Example notebooks
For example notebooks, code and config files for accuracy scans, please see the repo link.