Etiq Docs
Custom Tests
How to set up your own metrics to run regular scans and RCA type scans
You have multiple ways to customize your tests: you can choose the scan types, the scan metrics, and the thresholds.
Additionally, you can add your own custom metrics and measures, include them in your config, and your scans will then check for these metrics as well, in both the regular scans and the RCA type scans. For a notebook and config example, check our github repo.

Custom Metrics for Accuracy and Bias Scans

The decorators you can use to build your custom accuracy and bias metrics are as follows:
  • prediction_values (refers to what the model scores; should be a list)
  • actual_values (refers to the actuals; if your custom metric is for production and no actuals or model are available, the score is used as the actual if it is provided; should be a list)
  • protected_values (refers to the demographic variable you want to check for bias; if you have multiple demographics, please create a feature with their intersection)
  • positive_outcome (directional; refers to what is considered a positive prediction or outcome, e.g. in the case of a lending model a low risk score, or the customer being accepted for the loan; should be a single value)
  • negative_outcome (directional; refers to what is considered a negative prediction or outcome, e.g. in the case of a lending model a high risk score, or the customer being rejected for the loan; should be a single value)
  • privileged_class (refers to the class in the demographic which is privileged, i.e. not protected by the legislation; should be a single value)
  • unprivileged_class (refers to the class in the demographic which is not privileged, and which is protected by the legislation; should be a single value; in future releases we will add functionality for multiple values here)
They follow the parameters available in the config file.
@etiq.metrics.accuracy_metric - refers to logging your metric as an accuracy metric
@etiq.metrics.bias_metric - refers to logging your metric as a bias metric
@etiq.custom_metric - specifies that this is a custom metric
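To see how the bias-related parameters fit together, below is a sketch of a custom demographic parity style metric written as plain Python. The parameter names mirror the decorators above, but the @etiq decorators themselves are omitted so the function can be read and run standalone; this is an illustrative assumption, not the library's built-in implementation.

```python
import numpy as np

def demographic_parity_custom(predictions=None, protected=None,
                              positive_outcome=1,
                              privileged_class=1, unprivileged_class=0):
    """Difference in positive-outcome rates between the unprivileged
    and privileged classes of the protected attribute."""
    preds = np.asarray(predictions)
    prot = np.asarray(protected)
    rate_priv = (preds[prot == privileged_class] == positive_outcome).mean()
    rate_unpriv = (preds[prot == unprivileged_class] == positive_outcome).mean()
    return rate_unpriv - rate_priv

# Hypothetical data: the privileged group is approved 2/2 times,
# the unprivileged group 1/2 times, giving a gap of -0.5
gap = demographic_parity_custom(predictions=[1, 1, 1, 0],
                                protected=[1, 1, 0, 0])
```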
Below is an example of how to add a custom metric to the accuracy metrics scan suite:
import numpy as np

@etiq.metrics.accuracy_metric
@etiq.custom_metric
@etiq.actual_values('actual')
@etiq.prediction_values('predictions')
def accuracy_custom(predictions=None, actual=None):
    """Accuracy = nr of correct predictions / nr of predictions"""
    apred = np.asarray(predictions)
    alabel = np.asarray(actual)
    return (apred == alabel).mean()
Below is an example of how to add a custom metric to the bias metrics scan suite:
from collections import Counter

@etiq.metrics.bias_metric
@etiq.custom_metric
@etiq.prediction_values('predictions')
def gini_index(predictions):
    class_counts = Counter(predictions)
    num_values = len(predictions)
    sum_probs = 0.0
    for aclass in class_counts:
        sum_probs += (class_counts[aclass] / num_values) ** 2
    return 1.0 - sum_probs
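Because a custom metric is just a Python function, you can sanity-check its logic on hypothetical data before registering it; here is the gini index above without the decorators:

```python
from collections import Counter

def gini_index(predictions):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    class_counts = Counter(predictions)
    num_values = len(predictions)
    sum_probs = sum((count / num_values) ** 2 for count in class_counts.values())
    return 1.0 - sum_probs

# An even 50/50 split of hypothetical predictions gives 1 - (0.25 + 0.25) = 0.5
impurity = gini_index([1, 1, 0, 0])
```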
Afterwards, don't forget to update your config file with the metric name and the thresholds you want before you run your scan.
{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0]
    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.7, 0.9],
            "true_pos_rate": [0.75, 1.0],
            "true_neg_rate": [0.7, 1.0],
            "accuracy_custom": [0.9, 1.0]
        }
    },
    "scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr": [0.0, 0.2],
            "equal_odds_tpr": [0.0, 0.2],
            "individual_fairness": [0.0, 0.2],
            "gini_index": [0.3, 0.4]
        }
    }
}

Custom Metrics for Drift Scans

You can now add custom metrics for drift scans as well. Examples are in this notebook.
Below is an example of how to add a custom metric for feature or target drift scans:
import numpy as np
from scipy.stats import wasserstein_distance

from etiq.drift_measures import drift_measure

@drift_measure
def earth_mover_drift_measure(expected_dist, new_dist, number_of_bins=10, bucket_type='bins', **kwargs) -> float:
    def scale_range(values, vmin, vmax):
        values += -(np.min(values))
        values *= (vmax - vmin) / np.max(values)
        values += vmin
        return values

    breakpoints = np.arange(0, number_of_bins + 1) / number_of_bins * 100
    if bucket_type == 'bins':
        breakpoints = scale_range(breakpoints, np.min(expected_dist), np.max(expected_dist))
    elif bucket_type == 'quantiles':
        breakpoints = np.stack([np.percentile(expected_dist, b) for b in breakpoints])

    expected_percents = np.histogram(expected_dist, breakpoints)[0] / len(expected_dist)
    actual_percents = np.histogram(new_dist, breakpoints)[0] / len(new_dist)

    return wasserstein_distance(expected_percents, actual_percents)
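During a scan, etiq passes the two distributions to the measure for you, but the function can also be sanity-checked standalone on hypothetical data (the @drift_measure decorator is omitted here so the sketch runs outside etiq): identical distributions should score 0, and a clearly shifted one should score higher.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def earth_mover_drift_measure(expected_dist, new_dist, number_of_bins=10,
                              bucket_type='bins', **kwargs) -> float:
    """Same logic as the custom measure above, minus the decorator."""
    def scale_range(values, vmin, vmax):
        values += -(np.min(values))
        values *= (vmax - vmin) / np.max(values)
        values += vmin
        return values

    breakpoints = np.arange(0, number_of_bins + 1) / number_of_bins * 100
    if bucket_type == 'bins':
        breakpoints = scale_range(breakpoints, np.min(expected_dist), np.max(expected_dist))
    elif bucket_type == 'quantiles':
        breakpoints = np.stack([np.percentile(expected_dist, b) for b in breakpoints])

    expected_percents = np.histogram(expected_dist, breakpoints)[0] / len(expected_dist)
    actual_percents = np.histogram(new_dist, breakpoints)[0] / len(new_dist)
    return wasserstein_distance(expected_percents, actual_percents)

# Hypothetical baseline feature vs. the same feature shifted upwards
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 1000)
shifted = baseline + 2.0
same = earth_mover_drift_measure(baseline, baseline.copy())
drift = earth_mover_drift_measure(baseline, shifted)
```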
Below is an example of how to add a custom metric for a concept drift scan:
from etiq.drift_measures import concept_drift_measure


@concept_drift_measure
def total_variational_distance(expected_dist, new_dist):
    return sum(0.5 * abs(x - y) for (x, y) in zip(expected_dist, new_dist))
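As a quick standalone check (using hypothetical conditional-probability vectors, and omitting the decorator so the sketch runs outside etiq): identical probability vectors score 0, and for [0.5, 0.5] vs. [0.9, 0.1] the distance is 0.5 * (0.4 + 0.4) = 0.4.

```python
def total_variational_distance(expected_dist, new_dist):
    """Half the L1 distance between two probability vectors."""
    return sum(0.5 * abs(x - y) for (x, y) in zip(expected_dist, new_dist))

# Hypothetical conditional probabilities from baseline vs. new dataset
tvd = total_variational_distance([0.5, 0.5], [0.9, 0.1])
```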
Don't forget to add the new metrics to the config file:
{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0],
        "remove_protected_from_features": true
    },
    "scan_drift_metrics": {
        "thresholds": {
            "psi": [0.0, 0.2],
            "kolmogorov_smirnov": [0.05, 1.0],
            "earth_mover_drift_measure": [0.0, 0.2]
        },
        "drift_measures": ["kolmogorov_smirnov", "psi", "earth_mover_drift_measure"]
    },
    "scan_concept_drift_metrics": {
        "thresholds": {
            "earth_mover_distance": [0.0, 0.05],
            "kl_divergence": [0.0, 0.2],
            "jensen_shannon_distance": [0.0, 0.2],
            "total_variational_distance": [0.0, 0.03]
        },
        "drift_measures": ["earth_mover_distance", "total_variational_distance"]
    }
}
To build your own drift type measures, consider the logic of feature/target drift vs. concept drift.
Feature and target drift look at whether the distribution of a certain feature has changed. For an explanation of how this is calculated using the out-of-the-box metrics provided, check out the drift section. When building your own feature/target drift measure, the parameters stand for the following:
  • first argument (e.g. expected_dist in the example above): observations of a given feature in the baseline dataset
  • second argument (e.g. new_dist in the example above): observations of a given feature in the new dataset that we're assessing for feature drift
Concept drift looks at whether the relationship between the input features and the target has changed over time. The out-of-the-box measures for concept drift look at the change between two datasets in, for instance, the probability that feature A has value 1 given that the target has value 0. The measure does not look at just one probability value: conditional probabilities are calculated for the different potential combinations of target and feature values, and the measure then compares the distribution of all these probabilities across the two datasets. Custom measures follow the same logic. This means that the parameters you use to build concept drift type measures stand for slightly different things than those you use for feature/target drift:
  • first argument (e.g. expected_dist in the example above): probabilities of the target values given a feature value in the baseline dataset
  • second argument (e.g. new_dist in the example above): probabilities of the target values given a feature value in the new dataset that we're assessing for concept drift
Note that for continuous features and/or target, the values will have to be binned.
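As a sketch of what that binning might look like, the snippet below buckets a hypothetical continuous feature into three equal-width bins with np.digitize; this is an illustrative choice, not necessarily how etiq bins internally.

```python
import numpy as np

# Hypothetical continuous feature values
values = np.array([0.1, 0.4, 0.35, 0.8, 0.95, 0.5])

# Three equal-width buckets over the observed range
edges = np.linspace(values.min(), values.max(), 4)

# Interior edges only, so labels run 0, 1, 2
binned = np.digitize(values, edges[1:-1])
```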

Custom Metrics for RCA Type Scans

You can also use your own custom metric in an RCA type scan.
If we continue the drift measure example above, we can run the feature and target drift metrics RCA scans on the snapshot using the same config:
snapshot = project.snapshots.create(name="Test Snapshot", dataset=dataset1, comparison_dataset=dataset2, model=None)

# Scan for different drift types
(segments_f, issues_f, issue_summary_f) = snapshot.scan_drift_metrics()
This would yield the following results:
Example results of drift RCA scan with custom metric - earth mover drift measure
This means you can now use Etiq to fully customize your tests to your use case, as well as to experiment with the best metrics and measures.

Custom Correlation/Association Measures

For bias sources and leakage scans, we use correlation and association measures. We provide multiple correlation measures out of the box, to be chosen based on the types of features you have: Pearson, Cramer's V, Rank-Biserial, and Point-Biserial. For more info, see the Bias Sources Scan section.
To add your own custom correlation or association metric, use the @correlation_measure decorator, as in the example below:
from typing import Any, List

import numpy as np
import pandas as pd
from scipy.stats.contingency import association

@correlation_measure
def tschuprowsT(x: List[Any], y: List[Any]) -> float:
    """Calculates Tschuprow's T between two (categorical) variables.

    Args:
        x (List[Any]): List-like values representing the first variable.
        y (List[Any]): List-like values representing the second variable.

    Returns:
        float: Tschuprow's T for the two variables (assuming they are both categorical)
    """
    if len(x) < 2:
        return np.nan
    df = pd.DataFrame({'x': x, 'y': y})
    contingency_table = pd.crosstab(index=df['x'], columns=df['y'])
    if contingency_table.shape[1] < 2 or contingency_table.shape[0] < 2:
        return np.nan
    val = association(contingency_table, method='tschuprow')
    return val
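As a quick standalone check on hypothetical data (the @correlation_measure decorator is omitted so the sketch runs outside etiq): two perfectly associated categorical variables should give a T close to 1, and degenerate inputs return NaN.

```python
from typing import Any, List

import numpy as np
import pandas as pd
from scipy.stats.contingency import association

def tschuprowsT(x: List[Any], y: List[Any]) -> float:
    """Tschuprow's T between two categorical variables; same logic as above."""
    if len(x) < 2:
        return np.nan
    df = pd.DataFrame({'x': x, 'y': y})
    contingency_table = pd.crosstab(index=df['x'], columns=df['y'])
    if contingency_table.shape[1] < 2 or contingency_table.shape[0] < 2:
        return np.nan
    return association(contingency_table, method='tschuprow')

# 'a' always pairs with 'p' and 'b' with 'q' -> perfect association
t = tschuprowsT(['a', 'a', 'b', 'b'], ['p', 'p', 'q', 'q'])
```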
You can then use this measure in the relevant scans; we recommend using custom measures with the scan types exemplified above.