Custom Tests

How to set up your own metrics to run regular scans and RCA type scans

You can customize your tests in multiple ways: you can choose the scan types, the scan metrics and their thresholds.

You can also add your own custom metrics and measures and include them in your config, so that your scans, both regular scans and RCA type scans, check for these metrics (or measures) as well 🎉🎉🎉. For a notebook and config example, check our GitHub repo.

Custom Metrics for Accuracy and Bias Scans

The decorators you can use to build your custom accuracy and bias metrics are as follows:

| Decorator | Description |
| --- | --- |
| prediction_values | Refers to what the model scores. Should be a list. |
| actual_values | Refers to the actuals. If your custom metric is for production and no actuals or model are available, the score will be used as the actual (if it is provided). Should be a list. |
| protected_values | Refers to the demographic variable you want to check for bias. If you have multiple demographic variables, create a single feature that represents their intersection. |
| positive_outcome | Directional; refers to what is considered a positive prediction or outcome, e.g. for a lending model, a low risk score or the customer being accepted for the loan. Should be a value. |
| negative_outcome | Directional; refers to what is considered a negative prediction or outcome, e.g. for a lending model, a high risk score or the customer being rejected for the loan. Should be a value. |
| privileged_class | Refers to the class in the demographic variable which is privileged, i.e. not protected by the legislation. Should be a value. |
| unprivileged_class | Refers to the class in the demographic variable which is not privileged, i.e. protected by the legislation. Should be a value; support for multiple values is planned for future releases. |

These decorators follow the parameters available in the config file.

| Decorator | Description |
| --- | --- |
| @etiq.metrics.accuracy_metric | Logs your metric as an accuracy metric. |
| @etiq.metrics.bias_metric | Logs your metric as a bias metric. |
| @etiq.custom_metric | Specifies that this is a custom metric. |

Below is an example of how to add a custom metric to the accuracy metrics scan suite:

import numpy as np
import etiq

@etiq.metrics.accuracy_metric
@etiq.custom_metric
@etiq.actual_values('actual')
@etiq.prediction_values('predictions')
def accuracy_custom(predictions=None, actual=None):
    """ Accuracy = number of correct predictions / number of predictions
    """
    apred = np.asarray(predictions)
    alabel = np.asarray(actual)
    # Fraction of positions where the prediction equals the actual
    return (apred == alabel).mean()
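Before registering a new metric, it can help to test the plain function on its own. The sketch below is a hypothetical custom F1 metric; `f1_custom` and its `positive_label` keyword are illustrative names, not part of Etiq. To register it you would add the same four decorators used for `accuracy_custom` above. We assume those decorators only supply `predictions` and `actual`, which is why `positive_label` has a default.

```python
import numpy as np

# Hypothetical example metric (not part of Etiq); register it with the same
# decorators as accuracy_custom to use it in an accuracy scan.
def f1_custom(predictions=None, actual=None, positive_label=1):
    """F1 = 2 * precision * recall / (precision + recall) for one positive label."""
    apred = np.asarray(predictions)
    alabel = np.asarray(actual)
    # Count true positives, false positives and false negatives for the positive label
    tp = np.sum((apred == positive_label) & (alabel == positive_label))
    fp = np.sum((apred == positive_label) & (alabel != positive_label))
    fn = np.sum((apred != positive_label) & (alabel == positive_label))
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined, so report 0
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

print(f1_custom(predictions=[1, 0, 1, 1], actual=[1, 0, 0, 1]))  # ≈ 0.8
```

If you register it, remember to add an `f1_custom` entry with the thresholds you want to the accuracy section of your config.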

Below is an example of how to add a custom metric to the bias metrics scan suite:

from collections import Counter
import etiq

@etiq.metrics.bias_metric
@etiq.custom_metric
@etiq.prediction_values('predictions')
def gini_index(predictions):
    """ Gini index = 1 - sum of the squared class proportions in the predictions
    """
    class_counts = Counter(predictions)
    num_values = len(predictions)
    sum_probs = 0.0
    for aclass in class_counts:
        sum_probs += (class_counts[aclass]/num_values) ** 2
    return 1.0 - sum_probs
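Bias metrics that compare groups also need the protected attribute and the class definitions from the decorator table. The following is a minimal sketch of a hypothetical disparate impact ratio (the rate of positive predictions for the unprivileged class divided by the rate for the privileged class), written as a plain function so it runs without Etiq. Its defaults mirror the example config (privileged 1, unprivileged 0, positive outcome 1). To register it, we assume the remaining decorators are applied with the same `@etiq.<name>('<parameter>')` pattern as `prediction_values`, for example `@etiq.protected_values('protected')`; this page does not confirm that usage, so check it against your Etiq version.

```python
import numpy as np

# Hypothetical example bias metric (not part of Etiq). Registering it would ASSUME
# decorators like the following, mirroring the prediction_values pattern:
# @etiq.metrics.bias_metric
# @etiq.custom_metric
# @etiq.prediction_values('predictions')
# @etiq.protected_values('protected')  # and similarly for the class/outcome decorators
def disparate_impact(predictions, protected, positive_outcome=1, privileged=1, unprivileged=0):
    """Positive-prediction rate of the unprivileged class divided by that of the privileged class."""
    apred = np.asarray(predictions)
    aprot = np.asarray(protected)
    priv_mask = aprot == privileged
    unpriv_mask = aprot == unprivileged
    if not priv_mask.any() or not unpriv_mask.any():
        return float("nan")  # one of the two groups is missing from the data
    priv_rate = np.mean(apred[priv_mask] == positive_outcome)
    unpriv_rate = np.mean(apred[unpriv_mask] == positive_outcome)
    if priv_rate == 0:
        return float("nan")  # undefined when the privileged class never gets a positive outcome
    return float(unpriv_rate / priv_rate)

print(disparate_impact([1, 1, 0, 1, 0, 0], [1, 1, 1, 0, 0, 0]))  # ≈ 0.5
```

A value of 1.0 means both classes receive positive predictions at the same rate; values further below 1.0 mean the unprivileged class receives them less often.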

Afterwards, don't forget to add the metric name and the thresholds you want to your config file before you run your scan.

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0]
    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.7, 0.9],
            "true_pos_rate": [0.75, 1.0],
            "true_neg_rate": [0.7, 1.0],
            "accuracy_custom": [0.9, 1.0]
        }
    },
    "scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr": [0.0, 0.2],
            "equal_odds_tpr": [0.0, 0.2],
            "individual_fairness": [0.0, 0.2],
            "gini_index": [0.3, 0.4]
        }
    }
}
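The threshold keys above match the function names of the custom metrics (`accuracy_custom`, `gini_index`). As a quick sanity check before running a scan, the sketch below loads a trimmed copy of the config and tests a metric value against its range. The helper `within_threshold` is hypothetical, and it assumes a metric passes when its value lies inside the inclusive `[lower, upper]` range; Etiq's exact boundary handling may differ.

```python
import json

# Trimmed copy of the config above (only the scan sections are needed here)
CONFIG = """
{
    "scan_accuracy_metrics": {"thresholds": {"accuracy": [0.7, 0.9], "accuracy_custom": [0.9, 1.0]}},
    "scan_bias_metrics": {"thresholds": {"demographic_parity": [0.0, 0.2], "gini_index": [0.3, 0.4]}}
}
"""

# Hypothetical helper; ASSUMES a value passes when lower <= value <= upper
def within_threshold(config, scan, metric, value):
    """Return True if value lies inside the [lower, upper] range configured for metric."""
    lower, upper = config[scan]["thresholds"][metric]
    return lower <= value <= upper

config = json.loads(CONFIG)
print(within_threshold(config, "scan_bias_metrics", "gini_index", 0.35))  # True
print(within_threshold(config, "scan_accuracy_metrics", "accuracy_custom", 0.85))  # False
```

A `KeyError` from this helper usually means the metric name in the config does not match the function name.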

Custom Metrics for Drift Scans

You can also add custom metrics for drift scans. Examples are available in this notebook.

Below is an example of how to add a custom metric for feature or target drift scans:

import numpy as np
from etiq.drift_measures import drift_measure
from scipy.stats import wasserstein_distance

@drift_measure
def earth_mover_drift_measure(expected_dist, new_dist, number_of_bins=10, bucket_type='bins', **kwargs) -> float:
    def scale_range(values, new_min, new_max):
        # Linearly rescale values so they span [new_min, new_max]
        values = values - np.min(values)
        values = values * (new_max - new_min) / np.max(values)
        return values + new_min

    # Breakpoints as percentages: 0, 100/number_of_bins, ..., 100
    breakpoints = np.arange(0, number_of_bins + 1) / (number_of_bins) * 100
    if bucket_type == 'bins':
        # Equal-width bins over the range of the baseline data
        breakpoints = scale_range(breakpoints, np.min(expected_dist), np.max(expected_dist))
    elif bucket_type == 'quantiles':
        # Bin edges at the percentiles of the baseline data
        breakpoints = np.stack([np.percentile(expected_dist, b) for b in breakpoints])

    # Share of observations in each bin (values outside the bin edges are not counted)
    expected_percents = np.histogram(expected_dist, breakpoints)[0] / len(expected_dist)
    actual_percents = np.histogram(new_dist, breakpoints)[0] / len(new_dist)

    # Wasserstein distance between the two vectors of bin proportions
    return wasserstein_distance(expected_percents, actual_percents)
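The same pattern works for other distances. As a hypothetical second example, the sketch below computes the Hellinger distance between the binned distributions of a feature in the two datasets (0 means identical, 1 means no overlap). Unlike the example above, the bins span the range of both datasets, so no new observations are dropped. The `@drift_measure` decorator is left out so the sketch runs without Etiq; add it, plus a `hellinger_drift_measure` entry in the config, to use it in a scan.

```python
import numpy as np

# Hypothetical feature/target drift measure (not part of Etiq);
# decorate with @drift_measure to register it.
def hellinger_drift_measure(expected_dist, new_dist, number_of_bins=10, **kwargs) -> float:
    """Hellinger distance between the binned distributions of two samples of a feature."""
    expected = np.asarray(expected_dist, dtype=float)
    new = np.asarray(new_dist, dtype=float)
    lo = min(expected.min(), new.min())
    hi = max(expected.max(), new.max())
    if lo == hi:
        return 0.0  # every observation has the same value in both datasets
    # Equal-width bins over the combined range, so every observation is counted
    edges = np.linspace(lo, hi, number_of_bins + 1)
    p = np.histogram(expected, edges)[0] / len(expected)
    q = np.histogram(new, edges)[0] / len(new)
    # H(p, q) = sqrt(0.5 * sum((sqrt(p) - sqrt(q))^2)), which lies in [0, 1]
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=1000)
shifted = rng.normal(1.0, 1.0, size=1000)
print(hellinger_drift_measure(baseline, baseline))  # 0.0
print(hellinger_drift_measure(baseline, shifted))   # greater than 0
```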

Below is an example of how to add a custom metric for a concept drift scan:

from etiq.drift_measures import concept_drift_measure


@concept_drift_measure
def total_variational_distance(expected_dist, new_dist):
    # Half the sum of the absolute differences between the two probability vectors
    return sum(0.5 * abs(x-y) for (x,y) in zip(expected_dist, new_dist))

Don't forget to add the new metrics to the config file:

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0],
        "remove_protected_from_features": true
    },
    "scan_drift_metrics": {
        "thresholds": {
            "psi": [0.0, 0.2],
            "kolmogorov_smirnov": [0.05, 1.0],
            "earth_mover_drift_measure": [0.0, 0.2]
        },
        "drift_measures": ["kolmogorov_smirnov", "psi", "earth_mover_drift_measure"]
    },
    "scan_concept_drift_metrics": {
        "thresholds": {
            "earth_mover_distance": [0.0, 0.05],
            "kl_divergence": [0.0, 0.2],
            "jensen_shannon_distance": [0.0, 0.2],
            "total_variational_distance": [0.0, 0.03]
        },
        "drift_measures": ["earth_mover_distance", "total_variational_distance"]
    }
}
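Each custom measure has to be listed by name in the relevant section, as `earth_mover_drift_measure` and `total_variational_distance` are above. The hypothetical helper below flags measures listed in `drift_measures` that have no entry in `thresholds`; it assumes every listed measure needs a threshold, as in the example config.

```python
import json

# Hypothetical config check (not part of Etiq)
def missing_thresholds(config: dict) -> dict:
    """For each scan section, list measures named in "drift_measures" that have no entry in "thresholds"."""
    missing = {}
    for scan, section in config.items():
        if not isinstance(section, dict) or "drift_measures" not in section:
            continue  # e.g. the "dataset" section
        thresholds = section.get("thresholds", {})
        gaps = [m for m in section["drift_measures"] if m not in thresholds]
        if gaps:
            missing[scan] = gaps
    return missing

# A config where the threshold for the custom measure was forgotten
config = json.loads("""
{
    "scan_drift_metrics": {
        "thresholds": {"psi": [0.0, 0.2], "kolmogorov_smirnov": [0.05, 1.0]},
        "drift_measures": ["kolmogorov_smirnov", "psi", "earth_mover_drift_measure"]
    }
}
""")
print(missing_thresholds(config))  # {'scan_drift_metrics': ['earth_mover_drift_measure']}
```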

To build your own drift type measures, consider the logic of feature/target drift vs. concept drift.

Feature and target drift look at whether the distribution of a certain feature has changed. For an explanation of how this is calculated with the out-of-the-box metrics, check out the drift section. When building your own feature/target drift measure, the parameters stand for the following:

  • first argument (e.g. expected_dist in the example above): observations of a given feature in the baseline dataset

  • second argument (e.g. new_dist in the example above): observations of a given feature in the new dataset that we're assessing for feature drift
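Because both arguments are raw observations rather than histograms, a measure does not have to bin them. A minimal, hypothetical example: the absolute shift in the mean, measured in standard deviations of the baseline. It would be registered with `@drift_measure` like the example above; the decorator is omitted here so the sketch runs on its own.

```python
import numpy as np

# Hypothetical feature/target drift measure (not part of Etiq);
# decorate with @drift_measure to register it.
def mean_shift_drift_measure(expected_dist, new_dist, **kwargs) -> float:
    """Absolute shift of the mean, in units of the baseline standard deviation."""
    expected = np.asarray(expected_dist, dtype=float)  # baseline observations of the feature
    new = np.asarray(new_dist, dtype=float)  # observations of the same feature in the new dataset
    std = expected.std()
    if std == 0:
        # Constant baseline: no shift if the means match, otherwise an unbounded one
        return 0.0 if new.mean() == expected.mean() else float("inf")
    return float(abs(new.mean() - expected.mean()) / std)

print(mean_shift_drift_measure([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # ≈ 0.707
```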

Concept drift looks at whether the relationship between the input features and the target has changed over time. The out-of-the-box concept drift measures look at the change between 2 datasets in, for instance, the probability that feature A has value 1 if the target has value 0. Rather than looking at a single probability, the measure calculates conditional probabilities for the different combinations of target and feature values, and then compares the distribution of all these probabilities across the 2 datasets. The custom measures follow the same logic. This means that the parameters you use to build concept drift measures stand for slightly different things than those you use for feature/target drift:

  • first argument (e.g. expected_dist in the example above): probabilities of the target values given a feature value in the baseline dataset

  • second argument (e.g. new_dist in the example above): probabilities of the target values given a feature value in the new dataset that we're assessing for concept drift

Note that for continuous features and/or target, the values will have to be binned.
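To make the concept drift arguments concrete, the sketch below computes P(target | feature) for one categorical feature in a baseline and a new dataset with `pandas.crosstab`, aligns the two tables, and compares the flattened probabilities with the total variational distance from the concept drift example (redefined without the decorator). How Etiq builds and orders these vectors internally is not shown on this page, so treat this as an illustration of the idea rather than Etiq's implementation. A continuous feature would first need binning, e.g. with `pd.cut`.

```python
import pandas as pd

def conditional_probs(df: pd.DataFrame, feature: str, target: str) -> pd.DataFrame:
    """Table of P(target = t | feature = f): one row per feature value, one column per target value."""
    return pd.crosstab(df[feature], df[target], normalize="index")

def total_variational_distance(expected_dist, new_dist):
    return sum(0.5 * abs(x - y) for (x, y) in zip(expected_dist, new_dist))

# Illustrative toy data (not from Etiq)
baseline = pd.DataFrame({"feature_a": [0, 0, 1, 1], "target": [0, 1, 1, 1]})
new = pd.DataFrame({"feature_a": [0, 0, 1, 1], "target": [0, 0, 1, 0]})

p_base = conditional_probs(baseline, "feature_a", "target")
p_new = conditional_probs(new, "feature_a", "target")

# Align both tables on the same feature and target values; unseen combinations get probability 0
rows = p_base.index.union(p_new.index)
cols = p_base.columns.union(p_new.columns)
p_base = p_base.reindex(index=rows, columns=cols, fill_value=0)
p_new = p_new.reindex(index=rows, columns=cols, fill_value=0)

tvd = total_variational_distance(p_base.to_numpy().ravel(), p_new.to_numpy().ravel())
print(tvd)  # 1.0
```

Because the probabilities for every feature value are compared in a single vector, the result adds up the differences across feature values and can therefore exceed 1.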

Custom metrics for RCA type scans

You can also use your own custom metric in an RCA type scan.

If we continue the drift measure example above, we can run the feature and target drift metrics RCA scans on the snapshot with the config above, as follows:

# Assumes project, dataset1 and dataset2 have already been created
snapshot = project.snapshots.create(name="Test Snapshot", dataset=dataset1, comparison_dataset=dataset2, model=None)

# Scan for different drift types
(segments_f, issues_f, issue_summary_f) = snapshot.scan_drift_metrics()

would yield the following results:

This means you can use Etiq to fully customize your tests to your use case, and to experiment to find the metrics and measures that work best.

Custom Correlation/Association Measures

For bias sources and leakage scans, we use correlation and association measures. We provide multiple correlation measures out of the box, to be chosen based on the types of features you have: Pearson, Cramer's V, Rank-Biserial and Point-Biserial. For more info, see Bias Sources Scan.

To add your own custom correlation or association measure, use the @correlation_measure decorator, as in the example below:

from typing import Any, List

import numpy as np
import pandas as pd
from scipy.stats.contingency import association

# @correlation_measure is the decorator provided by the etiq package
@correlation_measure
def tschuprowsT(x: List[Any], y: List[Any]) -> float:
    """ Calculates Tschuprow's T between two (categorical) variables.
    Args:
        x (List[Any]): List like values representing the first variable.
        y (List[Any]): List like values representing the second variable.

    Returns:
        float:  Tschuprow's T for the two variables (assuming they are both categorical)
    """
    if len(x) < 2:
        return np.nan
    df = pd.DataFrame({'x': x, 'y': y})
    # Counts for every combination of x and y values
    contingency_table = pd.crosstab(index=df['x'], columns=df['y'])
    # Tschuprow's T is undefined when either variable has fewer than 2 categories
    if contingency_table.shape[1] < 2 or contingency_table.shape[0] < 2:
        return np.nan
    val = association(contingency_table, method='tschuprow')
    return val
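Another custom measure can follow the same shape. The sketch below is a hypothetical Spearman rank correlation measure built on `scipy.stats.spearmanr`, with the same guard for too-short inputs as the example above. Decorate it with `@correlation_measure` to register it; the decorator is omitted so the sketch runs without Etiq.

```python
from typing import Any, List

import numpy as np
from scipy.stats import spearmanr

# Hypothetical correlation measure (not part of Etiq);
# decorate with @correlation_measure to register it.
def spearman_rho(x: List[Any], y: List[Any]) -> float:
    """Spearman's rank correlation between two ordinal or continuous variables."""
    if len(x) < 2:
        return np.nan  # not enough observations to rank
    rho, _ = spearmanr(x, y)  # second value is the p-value, unused here
    return float(rho)

print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 45]))  # ≈ 1.0 (same ranking)
```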

You can then use this measure in the relevant scans; we recommend using it in the scan types mentioned above (bias sources and leakage scans).
