A scan is a test pipeline applied to a snapshot. It tests whether the model/snapshot has a specific issue. The function to call a scan is a one-liner.
Many scans test for a single issue only, e.g. whether accuracy is above a certain threshold. Other scans, e.g. scan_bias_sources, test for several different issues at the same time because that is more efficient. This is why we added ISSUE as a sub-element of the scan.
An issue is found when the METRIC associated with that issue falls outside the acceptable THRESHOLDS.
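The relationship between scans, issues, metrics and thresholds can be sketched in a few lines of Python. This is only an illustration of the concept, not the library's actual API: the `Issue` class, the `run_scan` function, the issue names and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """Hypothetical container: one issue, its metric value and its thresholds."""
    name: str
    metric_value: float
    lower: float = float("-inf")  # lowest acceptable metric value
    upper: float = float("inf")   # highest acceptable metric value

    @property
    def found(self) -> bool:
        # The issue is flagged when the metric lies outside [lower, upper].
        return not (self.lower <= self.metric_value <= self.upper)


def run_scan(issues):
    """Hypothetical scan: return the names of all issues that were found."""
    return [issue.name for issue in issues if issue.found]


# A scan may carry one issue or several; the values below are made up.
issues = [
    Issue("low_accuracy", metric_value=0.72, lower=0.80),
    Issue("class_imbalance", metric_value=0.45, lower=0.30, upper=0.70),
]
flagged = run_scan(issues)
print(flagged)  # ['low_accuracy']
```

Here only the accuracy metric falls outside its thresholds, so only that issue is reported.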
You can set thresholds based on your use case, although we provide config files with suggested thresholds to get you started. You can also add custom metrics as per this section.
MEASURES are a subset of metrics. By our convention, measures are used to uncover the causes of issues rather than to flag high-level issues with your snapshot/model. A correlation coefficient, for example, would be a measure, although the line is blurry.
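As an illustration of a measure, the sketch below computes a Pearson correlation coefficient between one feature and the label. The `pearson` helper and the sample values are assumptions made for this example; the scans may compute their measures differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: a continuous feature and a binary label.
feature = [1, 2, 3, 4, 5]
label = [0, 0, 1, 1, 1]

r = pearson(feature, label)
print(round(r, 3))  # 0.866
```

A strong correlation like this does not flag an issue by itself, but it can help explain why an issue was found.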
To use the scans, you will also need to provide parameters about the dataset. As the scans primarily handle classification problems at this stage, the parameters are as follows:
- For all scans:
  - ‘label’ - the feature you are predicting
  - ‘train_valid_test_splits’ (if your model is already trained and you’re providing only the test dataset for scans, please set the percentages accordingly)
  - Optional: ‘cat_col’ - list of categorical features
  - Optional: ‘cont_col’ - list of continuous features
- For bias scans:
  - ‘protected’ - the demographic feature or features that you are checking for bias (protected characteristics); for more information please see the Bias Scans section
  - ‘privileged’ - usually the majority class or the class not protected by legislation
  - ‘unprivileged’ - the minority class or the class protected by legislation
  - ‘positive_outcome_label’ - for bias tests it’s important to know which outcome label for the predicted feature is a positive outcome for the individual (e.g. low likelihood of default on a loan, or high likelihood of performing well in a role). This allows you to set up the test to understand whether the group that needs to be ‘protected’ is more likely to be treated negatively by the model. (For more details please see the Bias tests section)
  - ‘negative_outcome_label’ - the outcome label that is a negative outcome for the individual (e.g. high likelihood of default on a loan)
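To make the parameters above concrete, here is a minimal sketch of how they might be written down and used. The parameter names come from the list above; the value formats (for example, splits as a list of fractions), the sample rows and the `positive_rate` helper are assumptions for illustration only. The ratio computed at the end is the common disparate-impact ratio, shown as one example of a bias metric rather than as the metric the bias scans necessarily use.

```python
# Parameter names follow the list above; all values are hypothetical.
params = {
    "label": "loan_status",
    "train_valid_test_splits": [0.0, 0.0, 1.0],  # assumed format: test data only
    "cat_col": ["gender"],
    "cont_col": ["income"],
    "protected": "gender",
    "privileged": "male",
    "unprivileged": "female",
    "positive_outcome_label": "repaid",
    "negative_outcome_label": "default",
}

# Made-up dataset rows.
rows = [
    {"gender": "male", "income": 52, "loan_status": "repaid"},
    {"gender": "male", "income": 48, "loan_status": "repaid"},
    {"gender": "male", "income": 61, "loan_status": "repaid"},
    {"gender": "male", "income": 39, "loan_status": "default"},
    {"gender": "female", "income": 55, "loan_status": "repaid"},
    {"gender": "female", "income": 47, "loan_status": "repaid"},
    {"gender": "female", "income": 58, "loan_status": "default"},
    {"gender": "female", "income": 41, "loan_status": "default"},
]


def positive_rate(rows, params, group):
    """Share of rows in `group` whose label is the positive outcome."""
    members = [r for r in rows if r[params["protected"]] == group]
    positives = [
        r for r in members
        if r[params["label"]] == params["positive_outcome_label"]
    ]
    return len(positives) / len(members)


privileged_rate = positive_rate(rows, params, params["privileged"])
unprivileged_rate = positive_rate(rows, params, params["unprivileged"])

# Disparate-impact ratio: values well below 1 suggest the unprivileged
# group receives the positive outcome less often.
ratio = unprivileged_rate / privileged_rate
print(privileged_rate, unprivileged_rate, round(ratio, 3))
```

In this made-up data the privileged group has a positive outcome rate of 0.75 and the unprivileged group 0.5, which is the kind of gap a bias scan is designed to surface.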
Metrics, thresholds and parameters are customized in the config file (see the next Key concept).