Etiq Docs
Bias RCA Scan
There are two RCA-type tests for bias:
  • scan_bias_sources
  • scan_bias_metrics_rca
scan_bias_sources is described in more detail in a separate section. With the auto option it essentially acts as an RCA scan, although it searches for several types of issue.
scan_bias_metrics_rca is a typical RCA scan.
An example config file is shown below:
{
  "dataset": {
    "label": "income",
    "bias_params": {
      "protected": "gender",
      "privileged": 1,
      "unprivileged": 0,
      "positive_outcome_label": 1,
      "negative_outcome_label": 0
    },
    "train_valid_test_splits": [0.0, 1.0, 0.0],
    "remove_protected_from_features": true
  },

  "scan_bias_metrics": {
    "thresholds": {
      "equal_opportunity": [0.0, 0.2],
      "demographic_parity": [0.0, 0.2],
      "equal_odds_tnr": [0.0, 0.2],
      "individual_fairness": [0.0, 0.2],
      "equal_odds_tpr": [0.0, 0.2]
    }
  },
  "scan_bias_metrics_rca": {
    "thresholds": {
      "demographic_parity": [0.0, 0.3]
    },
    "metric_filter": ["demographic_parity"],
    "ignore_lower_threshold": true,
    "ignore_upper_threshold": false,
    "minimum_segment_size": 1000
  }
}
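Before running a scan, it can help to sanity-check the RCA section of the config. The following is a minimal sketch in plain Python; `check_rca_section` is a hypothetical helper and not part of the Etiq API. It assumes only what the example above shows: each threshold is a `[lower, upper]` pair and `minimum_segment_size` is a positive integer.

```python
import json

# Hypothetical helper (not part of Etiq): basic sanity checks on the
# "scan_bias_metrics_rca" section of a config dictionary.
def check_rca_section(section):
    problems = []
    for metric, bounds in section.get("thresholds", {}).items():
        # Assumption: thresholds are [lower, upper] pairs, as in the example config.
        if len(bounds) != 2 or bounds[0] > bounds[1]:
            problems.append(f"{metric}: expected [lower, upper], got {bounds}")
    if section.get("minimum_segment_size", 1) < 1:
        problems.append("minimum_segment_size must be at least 1")
    return problems

rca_section = {
    "thresholds": {"demographic_parity": [0.0, 0.3]},
    "metric_filter": ["demographic_parity"],
    "ignore_lower_threshold": True,
    "ignore_upper_threshold": False,
    "minimum_segment_size": 1000,
}

problems = check_rca_section(rca_section)

# Serialise the section in the same JSON layout as the example config.
config_text = json.dumps({"scan_bias_metrics_rca": rca_section}, indent=2)
print(problems)
print(config_text)
```

An empty `problems` list means the section passed these checks; it does not guarantee that the library accepts the config.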
The syntax to run the scan after logging the model and dataset is the following:
snapshot.scan_bias_metrics_rca()
As with all RCA scans, the scan searches through different combinations of records and finds those combinations (segments) for which the metric falls outside the thresholds. As with the usual scans, you can set the thresholds that define an issue for your use case. You can also restrict the RCA to the metrics you are interested in, using e.g. "metric_filter": ["demographic_parity"]
To keep the per-group metrics meaningful, the scan requires a minimum number of records for a group to count as a segment. You can change this minimum with the parameter "minimum_segment_size": 1000, as in the config example above. You can also skip checking for issues below the lower threshold or above the upper threshold with "ignore_lower_threshold": true or "ignore_upper_threshold": true respectively.
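The interaction between thresholds, the ignore flags, and the minimum segment size can be illustrated with a small sketch. This is not Etiq's implementation: all function names are hypothetical, segments are simplified to the values of a single feature, and the demographic parity gap is assumed to be the privileged positive rate minus the unprivileged positive rate.

```python
from collections import defaultdict

def positive_rate(rows):
    return sum(r["prediction"] for r in rows) / len(rows)

def demographic_parity_gap(rows, protected="gender", privileged=1, unprivileged=0):
    priv = [r for r in rows if r[protected] == privileged]
    unpriv = [r for r in rows if r[protected] == unprivileged]
    if not priv or not unpriv:
        return None  # the gap is undefined when one group is missing
    # Assumed sign convention: privileged rate minus unprivileged rate.
    return positive_rate(priv) - positive_rate(unpriv)

def is_issue(value, lower, upper, ignore_lower=False, ignore_upper=False):
    below = value < lower and not ignore_lower
    above = value > upper and not ignore_upper
    return below or above

def find_issue_segments(rows, feature, thresholds, minimum_segment_size,
                        ignore_lower=False, ignore_upper=False):
    # Simplification: one segment per distinct value of a single feature.
    segments = defaultdict(list)
    for r in rows:
        segments[r[feature]].append(r)
    lower, upper = thresholds
    flagged = {}
    for value, seg_rows in segments.items():
        if len(seg_rows) < minimum_segment_size:
            continue  # too few records for a meaningful metric
        gap = demographic_parity_gap(seg_rows)
        if gap is not None and is_issue(gap, lower, upper, ignore_lower, ignore_upper):
            flagged[f"{feature} == {value!r}"] = gap
    return flagged

def make_rows(sector, gender, predictions):
    return [{"sector": sector, "gender": gender, "prediction": p} for p in predictions]

# Toy data: three sectors, the last one too small to be considered.
rows = (
    make_rows("private", 1, [1, 1, 1, 0]) + make_rows("private", 0, [0, 0, 0, 1])
    + make_rows("public", 1, [1, 1, 0, 0]) + make_rows("public", 0, [1, 0, 0, 1])
    + make_rows("self-employed", 1, [1]) + make_rows("self-employed", 0, [0])
)

flagged = find_issue_segments(rows, "sector", thresholds=(0.0, 0.3),
                              minimum_segment_size=8, ignore_lower=True)
print(flagged)  # {"sector == 'private'": 0.5}
```

Only the "private" segment is flagged: its gap of 0.5 exceeds the upper threshold, the "public" segment has no gap, and the "self-employed" segment is skipped because it has fewer records than the minimum segment size.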
At the moment, results can be retrieved only through the IDE or from the snapshot. To retrieve them from the snapshot, unpack the return value of the scan as follows and then inspect each element:
(segments_bias, issues_bias, issue_summary_bias) = snapshot.scan_bias_metrics_rca()
The results attach business rules to each segment, which helps you understand which records are affected by an issue.
We are working to add more retrieval methods.
Out of the box you can scan for the following metrics:
  1. Equal Opportunity: measures the difference in true positive rate between a privileged demographic group and an unprivileged demographic group.
  2. Demographic Parity: measures the difference in the proportion of positive labels (positive labels out of total records) between a privileged demographic group and an unprivileged demographic group.
  3. Equal Odds TNR: measures the difference in true negative rate between the privileged and unprivileged groups. The full measure in the literature seeks an optimal point at which both the difference in true positive rate and the difference in true negative rate between demographic groups are minimized.
  4. Individual Fairness: measures whether individuals with similar features receive the same model responses.
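To make the first three definitions concrete, here is a minimal sketch in plain Python. It is an illustration, not the library's code: the function names are hypothetical, each difference is assumed to be the privileged value minus the unprivileged value, and demographic parity is assumed to use the model's predicted labels. Individual fairness is omitted because it needs a similarity measure between records.

```python
def rate(numerator, denominator):
    return numerator / denominator if denominator else float("nan")

def group_rates(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "tpr": rate(tp, tp + fn),                  # true positive rate
        "tnr": rate(tn, tn + fp),                  # true negative rate
        "positive_rate": rate(tp + fp, len(pairs)),  # share of positive predictions
    }

def bias_metrics(y_true, y_pred, protected, privileged=1, unprivileged=0):
    def subset(group):
        idx = [i for i, g in enumerate(protected) if g == group]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    priv = group_rates(*subset(privileged))
    unpriv = group_rates(*subset(unprivileged))
    # Assumed convention: privileged value minus unprivileged value.
    return {
        "equal_opportunity": priv["tpr"] - unpriv["tpr"],
        "demographic_parity": priv["positive_rate"] - unpriv["positive_rate"],
        "equal_odds_tnr": priv["tnr"] - unpriv["tnr"],
    }

# Toy data: four privileged records followed by four unprivileged records.
protected = [1, 1, 1, 1, 0, 0, 0, 0]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = bias_metrics(y_true, y_pred, protected)
print(metrics)
# {'equal_opportunity': 0.5, 'demographic_parity': 0.5, 'equal_odds_tnr': -0.5}
```

With thresholds of [0.0, 0.2], as in the example config, all three values would fall outside the accepted range: two above the upper threshold and one below the lower threshold.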
This test pipeline is experimental.