Custom Tests

How to set up your own metrics to run regular scans and RCA type scans

You can customize your tests in several ways: you can choose the scan types, the scan metrics, and the thresholds.

Custom Metrics for Accuracy and Bias Scans

The decorators you can use to build your custom accuracy and bias metrics are @etiq.metrics.accuracy_metric (or @etiq.metrics.bias_metric for bias metrics), @etiq.custom_metric, @etiq.actual_values and @etiq.prediction_values, as shown in the examples below. They follow the parameters available in the config file.

Below is an example of how to add a custom metric to the accuracy metrics scan suite:

import numpy as np

import etiq

@etiq.metrics.accuracy_metric
@etiq.custom_metric
@etiq.actual_values('actual')
@etiq.prediction_values('predictions')
def accuracy_custom(predictions=None, actual=None):
    """ Accuracy = number of correct predictions / total number of predictions """
    apred = np.asarray(predictions)
    alabel = np.asarray(actual)
    # Element-wise comparison; the mean of the boolean array is the accuracy
    return (apred == alabel).mean()
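
Because the decorated metric remains a plain Python function (assuming the decorators return it unchanged, as the examples here suggest), you can sanity-check it on toy data before wiring it into a scan:

accuracy_custom(predictions=[1, 0, 1, 1], actual=[1, 1, 1, 1])  # 3 of 4 correct -> 0.75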

Below is an example of how to add a custom metric to the bias metrics scan suite:

from collections import Counter

@etiq.metrics.bias_metric
@etiq.custom_metric
@etiq.prediction_values('predictions')
def gini_index(predictions):
    # Gini impurity of the predicted class distribution: 1 - sum(p_i^2)
    class_counts = Counter(predictions)
    num_values = len(predictions)
    sum_probs = 0.0
    for aclass in class_counts:
        sum_probs += (class_counts[aclass] / num_values) ** 2
    return 1.0 - sum_probs
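
As a quick check, an even 50/50 split of predicted classes should give 1 - (0.5^2 + 0.5^2) = 0.5:

gini_index(predictions=[1, 1, 0, 0])  # 1.0 - (0.25 + 0.25) = 0.5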

Afterwards, don’t forget to update your config file with the metric names and the thresholds you want before you run your scan.

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0]
    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.7, 0.9],
            "true_pos_rate": [0.75, 1.0],
            "true_neg_rate":  [0.7, 1.0], 
            "accuracy_custom": [0.9, 1.0]			
        }
	},
	"scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr":  [0.0, 0.2], 
	    "equal_odds_tpr": [0.0, 0.2],
	    "individual_fairness": [0.0, 0.2],
            "gini_index": [0.3, 0.4]
        }
    }
}
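
With the config saved, you can load it and run the scans. The sketch below is illustrative rather than definitive: it assumes a config file named config.json, an existing project, dataset and model, and accuracy/bias scan methods named after the config sections above (mirroring the snapshot API used in the RCA section later on):

import etiq

etiq.load_config("config.json")  # hypothetical file name - point this at your own config

# Create a snapshot and run the accuracy and bias scan suites, which now
# include the accuracy_custom and gini_index thresholds defined above
snapshot = project.snapshots.create(name="Custom Metrics Snapshot", dataset=dataset1, model=model)
(segments_a, issues_a, issue_summary_a) = snapshot.scan_accuracy_metrics()
(segments_b, issues_b, issue_summary_b) = snapshot.scan_bias_metrics()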

Custom Metrics for Drift Scans

You can now add custom metrics for drift scans as well; examples are available in the accompanying notebook.

Below is an example of how to add a custom metric for feature or target drift scans:

import numpy as np
from scipy.stats import wasserstein_distance

from etiq.drift_measures import drift_measure

@drift_measure
def earth_mover_drift_measure(expected_dist, new_dist, number_of_bins=10, bucket_type='bins', **kwargs) -> float:
    def scale_range(values, lower, upper):
        # Linearly rescale the values to the [lower, upper] range
        values += -(np.min(values))
        values *= (upper - lower) / np.max(values)
        values += lower
        return values

    # Build bin edges: equal-width bins over the baseline range, or its quantiles
    breakpoints = np.arange(0, number_of_bins + 1) / number_of_bins * 100
    if bucket_type == 'bins':
        breakpoints = scale_range(breakpoints, np.min(expected_dist), np.max(expected_dist))
    elif bucket_type == 'quantiles':
        breakpoints = np.stack([np.percentile(expected_dist, b) for b in breakpoints])

    # Bin both samples and compare the binned distributions
    expected_percents = np.histogram(expected_dist, breakpoints)[0] / len(expected_dist)
    actual_percents = np.histogram(new_dist, breakpoints)[0] / len(new_dist)

    return wasserstein_distance(expected_percents, actual_percents)

Below is an example of how to add a custom metric for a concept drift scan:

from etiq.drift_measures import concept_drift_measure

@concept_drift_measure
def total_variational_distance(expected_dist, new_dist):
    # Total variation distance: half the L1 distance between two distributions
    return sum(0.5 * abs(x - y) for (x, y) in zip(expected_dist, new_dist))

Don't forget to add the new metrics to the config file:

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0],
        "remove_protected_from_features": true

    },
    "scan_drift_metrics": {
        "thresholds": {
            "psi": [0.0, 0.2],
            "kolmogorov_smirnov": [0.05, 1.0],
            "earth_mover_drift_measure": [0.0, 0.2]
        },
        "drift_measures": ["kolmogorov_smirnov", "psi", "earth_mover_drift_measure"]
    },
    "scan_concept_drift_metrics": {
        "thresholds": {
            "earth_mover_distance": [0.0, 0.05],
            "kl_divergence": [0.0, 0.2],
            "jensen_shannon_distance": [0.0, 0.2],
            "total_variational_distance": [0.0, 0.03]
        },
        "drift_measures": ["earth_mover_distance", "total_variational_distance"]
    }
}

To build your own drift-type measures, consider the difference in logic between feature/target drift and concept drift.

Feature and target drift look at whether the distribution of a certain feature (or of the target) has changed. For an explanation of how this is calculated using the out-of-the-box metrics provided, check out the drift section. When building your own feature/target drift measure, the parameters stand for the following concepts:

  • first argument (e.g. expected_dist in the example above): observations of a given feature in the baseline dataset

  • second argument (e.g. new_dist in the example above): observations of a given feature in the new dataset that we're assessing for feature drift

Concept drift looks at whether the relationship between the input features and the target has changed over time. The out-of-the-box concept drift measures compare two datasets on conditional probabilities, for instance the probability that feature A has value 1 given that the target has value 0. A measure does not look at just one probability value: conditional probabilities are calculated for the different possible combinations of target and feature values, and the measure then compares the distributions of these probabilities across the two datasets. Custom measures follow the same logic, which means the parameters you use to build concept drift measures stand for slightly different things than those for feature/target drift:

  • first argument (e.g. expected_dist in the example above): probabilities of the target values given a feature value in the baseline dataset

  • second argument (e.g. new_dist in the example above): probabilities of the target values given a feature value in the new dataset that we're assessing for concept drift

Note that for continuous features and/or a continuous target, the values will have to be binned first. The sketch below illustrates the two conventions.
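
To make the two conventions concrete, here is a minimal sketch calling the measures defined above directly (the inputs are made-up toy data, and we assume the decorators leave the functions directly callable):

import numpy as np

# Feature/target drift: the arguments are raw observations of one feature
baseline_obs = np.random.normal(0.0, 1.0, 1000)  # feature values in the baseline dataset
new_obs = np.random.normal(0.5, 1.0, 1000)       # feature values in the new dataset
earth_mover_drift_measure(baseline_obs, new_obs)

# Concept drift: the arguments are conditional probabilities of target values
# given (binned) feature values, one entry per target/feature combination
baseline_probs = [0.2, 0.8, 0.6, 0.4]
new_probs = [0.3, 0.7, 0.5, 0.5]
total_variational_distance(baseline_probs, new_probs)  # 0.5 * (0.1 * 4) = 0.2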

Custom Metrics for RCA Type Scans

You can also use your own custom metric in an RCA type scan.

Continuing the drift measure example above, we can run the feature and target drift RCA scans on a snapshot using the same config:

snapshot = project.snapshots.create(name="Test Snapshot", dataset=dataset1, comparison_dataset=dataset2, model=None)

# Scan for different drift types
(segments_f, issues_f, issue_summary_f) = snapshot.scan_drift_metrics()

This returns the segments affected, the issues found, and an issue summary.

This means you can use Etiq to fully customize your tests to your use case, as well as to experiment to find the metrics and measures that work best.

Custom Correlation/Association Measures

To add your own custom correlation or association metric, use the @correlation_measure decorator, as in the example below:

from typing import Any, List

import numpy as np
import pandas as pd
from scipy.stats.contingency import association

# correlation_measure is etiq's decorator for registering association metrics
@correlation_measure
def tschuprowsT(x: List[Any], y: List[Any]) -> float:
    """ Calculates Tschuprow's T between two (categorical) variables.

    Args:
        x (List[Any]): List-like values representing the first variable.
        y (List[Any]): List-like values representing the second variable.

    Returns:
        float: Tschuprow's T for the two variables (assuming they are both categorical).
    """
    if len(x) < 2:
        return np.nan
    df = pd.DataFrame({'x': x, 'y': y})
    # Cross-tabulate the two variables; the statistic needs at least a 2x2 table
    contingency_table = pd.crosstab(index=df['x'], columns=df['y'])
    if contingency_table.shape[0] < 2 or contingency_table.shape[1] < 2:
        return np.nan
    return association(contingency_table, method='tschuprow')

You can then use this measure in the relevant scans - we recommend using it with the scan types exemplified above.
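
As with the other custom metrics, you can also call it directly to check the output on a small sample (assuming the decorator leaves the function callable):

tschuprowsT(x=['a', 'a', 'b', 'b'], y=['u', 'u', 'v', 'v'])  # perfectly associated -> 1.0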
