Etiq Test Library

This is our initial limited release, intended to give you a flavour of our library. It is not for commercial or production use!

Use cases & limitations

A typical use case for the etiq library: say you are building a predictive model using tabular customer data. You have wrangled your data and tried a few model classes. Now you want to see whether your model is unintentionally discriminating against certain demographic groups, e.g. based on gender, ethnicity or age, and you have access to the demographic label. This is where you can use the etiq library.

The etiq library provides different kinds of pipelines that are intended to plug into your existing pipelines and test them for a specific purpose. The pipelines currently available focus on identifying and mitigating unintended discrimination. Etiq pipelines provide identify methods, repair methods, and metrics to evaluate outcomes, including fairness metrics.

The current free release provides only one etiq pipeline, and usage is limited to models with at most 15 features. This is a teaser library and is not intended for production or commercial settings. As we develop etiq further, we want to understand how you would interact with it, whether it is a useful library, and how to shape it to meet your needs. While we have tested the library, we expect issues and bugs in this iteration stemming from usage we haven't predicted. For more details on the theoretical underpinnings of our methods, see the Definitions section. We'd like to stress that the fairness literature and methodology is a very wide field with a lot of divergent opinions. Where applicable we will refer to the framework we are using, but some of our approaches are experimental.

If you want to access our full solution, including support from us, or to submit any comments, feature requests or issues, please join our Slack channel or email us: [email protected]

Quickstart

The Etiq library supports Python versions 3.6, 3.7, 3.8 and 3.9 on Windows, Mac and Linux.

We do not support Mac M1 at the moment.

We recommend using pip to install the Etiq library and its dependencies. To download the library, go to this link.
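
Assuming the download is a Python wheel file, installing it with pip looks something like this (the filename below is illustrative, not the actual artifact name):

pip install etiq_core-0.1.0-py3-none-any.whl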

To import etiq_core:

from etiq_core import *

DataPipeline

To follow the example analysis below, download the Adult dataset from https://archive.ics.uci.edu/ml/datasets/adult or load it in the notebook as a Pandas dataframe from the samples included in the library. A demo notebook is available at https://github.com/ETIQ-AI/demo/blob/main/DemoAdultLibrary01.ipynb

data = load_sample('adultdata')
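
If you download the raw adult.data file from the UCI repository instead, you can read it with pandas. A minimal sketch, using the column names from the UCI documentation (note the raw file calls the protected column 'sex'; we rename it to 'gender' to match the rest of this example):

import pandas as pd

columns = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
           'marital-status', 'occupation', 'relationship', 'race', 'sex',
           'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
           'income']
# the raw file has no header row and uses '?' for missing values
data = pd.read_csv('adult.data', names=columns, na_values='?',
                   skipinitialspace=True)
data = data.rename(columns={'sex': 'gender'})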

The DataPipeline object brings together the model we'd like to evaluate, the dataset used to train it, and the fairness metrics that are most relevant to our project.

Below, we define the parameters for the debiasing process. What is the protected category (often a demographic feature you'd like to mitigate bias for)? Who is in the privileged and unprivileged groups? What is a positive outcome in this dataset?

debias_param = BiasParams(protected='gender',
                          privileged='Male',
                          unprivileged='Female',
                          positive_outcome_label='>50K',
                          negative_outcome_label='<=50K')

Even if your model does not use the specific demographic feature you want to identify bias for, you should include it in the dataset (etiq will automatically exclude it later during any model refitting).

Specify transforms such as Dropna or EncodeLabels to make sure the data is numeric and has no missing values.

transforms = [Dropna, EncodeLabels]
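
Conceptually, these transforms do something similar to the following plain pandas/scikit-learn steps (a sketch of the idea, not the library's internals):

from sklearn.preprocessing import LabelEncoder

# Dropna: remove rows with missing values
clean = data.dropna().copy()

# EncodeLabels: map string categories to integer codes so every column is numeric
for col in clean.select_dtypes(include='object').columns:
    clean[col] = LabelEncoder().fit_transform(clean[col])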

The DatasetLoader reads in the data, applies any transformations, splits the data into training, validation and test datasets and sets aside the test dataset to avoid data leakage in your analysis. The training and validation datasets are loaded into the Dataset class.

dl = DatasetLoader(data=data,
                   label='income',
                   transforms=transforms,
                   bias_params=debias_param,
                   train_valid_test_splits=[0.8, 0.1, 0.1],
                   names_col=data.columns.values)

Choose the metrics you want computed for this project.

metrics_initial = [accuracy, equal_opportunity]
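
For intuition, equal opportunity compares true positive rates between the privileged and unprivileged groups. A plain numpy sketch of that definition (the library computes this for you):

import numpy as np

def true_positive_rate(y_true, y_pred):
    # TPR = correctly predicted positives / all actual positives
    positives = (y_true == 1)
    return (y_pred[positives] == 1).mean()

def equal_opportunity_gap(y_true, y_pred, group):
    # difference in TPR between the privileged and unprivileged rows
    priv = (group == 'Male')
    return (true_positive_rate(y_true[priv], y_pred[priv])
            - true_positive_rate(y_true[~priv], y_pred[~priv]))

y_true = np.array([1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
group = np.array(['Male', 'Male', 'Male', 'Female', 'Female', 'Female'])
print(equal_opportunity_gap(y_true, y_pred, group))  # -0.5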

Load the model you'd like to evaluate against the dataset, or choose one of the architectures that are already available.

xgb = DefaultXGBoostClassifier()
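
DefaultXGBoostClassifier is one of the preconfigured architectures shipped with the library. For comparison, an ordinary gradient-boosted classifier in plain xgboost looks like this (a sketch only; whether a raw xgboost model can be passed to the pipeline directly depends on the library's model interface):

from xgboost import XGBClassifier

# a plain XGBoost classifier, outside the etiq wrapper
raw_model = XGBClassifier(n_estimators=100, max_depth=4)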

Now you can create the DataPipeline. It takes the DatasetLoader and the model you provided, and computes your metrics of interest on the resulting Dataset.

pipeline_initial = DataPipeline(dataset_loader=dl, model=xgb, metrics=metrics_initial)
pipeline_initial.run()

Remember, your dataset can have as many features as you like, but in this limited release the DataPipeline will only pick up the first 15 features.
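
If your dataset has more than 15 columns, it is safer to subset it explicitly before loading, so that you control which features are used rather than relying on column order. A plain pandas sketch (the column list here is illustrative):

# keep the label, the protected attribute, and the features you care about
selected = ['income', 'gender', 'age', 'education', 'occupation',
            'hours-per-week', 'capital-gain', 'capital-loss']
data = data[selected]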

DebiasPipeline

The DebiasPipeline takes as inputs a data pipeline, an identify and/or repair method, and the metrics you want to use to evaluate your model. Identify methods are, as the name suggests, intended to help you identify bias issues. Repair methods are designed to fix or mitigate the issues identified, and include implementations of algorithms from the fairness literature. The repair pipeline we currently provide works at the pre-processing level, i.e. it changes the dataset so that some of the sources of bias in it are mitigated. Methods at the in-processing or post-processing stages can be more effective from an optimization point of view, but they may not address some of the issues in the data, which is why pre-processing is a good starting point. Our full solution includes additional pipelines.
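
For intuition about what a pre-processing repair does, a toy version of resampling would oversample under-represented (protected group, outcome) combinations so that outcome rates are less skewed across groups. A rough pandas sketch of that idea (not the library's algorithm, which works on the identified segments; the actual library usage follows below):

import pandas as pd

# size of each (protected group, outcome) cell
cell_sizes = data.groupby(['gender', 'income']).size()

# oversample every cell with replacement up to the size of the largest cell
largest = cell_sizes.max()
balanced = pd.concat([grp.sample(largest, replace=True, random_state=4)
                      for _, grp in data.groupby(['gender', 'income'])])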

# IdentifyBiasSources segments the data; nr_groups is the number of segments,
# formed by using unsupervised learning to group similar rows
identify_pipeline = IdentifyBiasSources(nr_groups=20,
                                        train_model_segment=True,
                                        group_def=['unsupervised'],
                                        fit_metrics=[accuracy, equal_opportunity])

# the DebiasPipeline aims to mitigate sources of bias by applying different
# types of repair algorithms; the library offers implementations of repair
# algorithms described in the academic fairness literature
repair_pipeline = RepairResamplePipeline(steps=[ResampleUnbiasedSegmentsStep(ratio_resample=1)],
                                         random_seed=4)
debias_pipeline = DebiasPipeline(data_pipeline=pipeline_initial,
                                 model=xgb,
                                 metrics=metrics_initial,
                                 identify_pipeline=identify_pipeline,
                                 repair_pipeline=repair_pipeline)
debias_pipeline.run()

In the fairness literature, 'mitigation' is the preferred term, as these types of issues are hard to remove entirely. Our use of the terms 'repair' and 'debias' refers primarily to mitigation rather than removal.

As with the data pipeline, when running the pipeline, we get the logs of how the pipeline has run:

INFO:etiq_core.pipeline.DebiasPipeline36:Starting pipeline
INFO:etiq_core.pipeline.DebiasPipeline36:Start Phase IdentifyPipeline844
INFO:etiq_core.pipeline.IdentifyPipeline844:Starting pipeline
INFO:etiq_core.pipeline.IdentifyPipeline844:Completed pipeline
INFO:etiq_core.pipeline.DebiasPipeline36:Completed Phase IdentifyPipeline844
INFO:etiq_core.pipeline.DebiasPipeline36:Start Phase RepairPipeline558
INFO:etiq_core.pipeline.RepairPipeline558:Starting pipeline
INFO:etiq_core.pipeline.RepairPipeline558:Completed pipeline
INFO:etiq_core.pipeline.DebiasPipeline36:Completed Phase RepairPipeline558
INFO:etiq_core.pipeline.DebiasPipeline36:Refitting model
INFO:etiq_core.pipeline.DebiasPipeline36:Computed metrics for the repaired dataset
INFO:etiq_core.pipeline.DebiasPipeline36:Completed pipeline

Output methods

Now that you've checked the logs and the etiq pipeline has run, retrieve the outputs using the following methods:

Metrics

debias_pipeline.get_protected_metrics()

Example output:

{'DataPipeline502':
    [{'accuracy': ('privileged', 0.84, 'unprivileged', 0.93)},
     {'equal_opportunity': ('privileged', 0.6901408450704225, 'unprivileged', 0.55)}],
 'DebiasPipeline426':
    [{'accuracy': ('privileged', 0.82, 'unprivileged', 0.91)},
     {'equal_opportunity': ('privileged', 0.6539235412474849, 'unprivileged', 0.65)}]}
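
Since each metric is reported as a ('privileged', value, 'unprivileged', value) tuple, you can compute the gap between groups directly from the returned dictionary. A small sketch, assuming the structure shown above:

results = debias_pipeline.get_protected_metrics()
for pipeline_name, metric_list in results.items():
    for entry in metric_list:
        for metric, values in entry.items():
            # values is ('privileged', x, 'unprivileged', y)
            gap = values[1] - values[3]
            print(f'{pipeline_name} {metric}: privileged-unprivileged gap = {gap:.3f}')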

Issues found by the pipeline

Our library is intended for you to test your models and see if there are any issues. The pipeline surfaces potential issues; it is then up to you whether you consider them to be issues for your specific model. For more details on definitions, please see the Definitions tab.

debias_pipeline.get_issues_summary()

Example output:

To help make sense of the segments, we also provide a profiler method, which gives you an idea of which rows were found to have specific issues.

debias_pipeline.get_profiler()

Evaluate method

If you've just built a pipeline using a repair method and want to see whether the issues you identified before persist after the repair, use the evaluate method:

evaluate_debias = EvaluateDebiasPipeline(debias_pipeline=debias_pipeline,
                                         identify_pipeline=identify_pipeline)
evaluate_debias.run()
evaluate_debias.get_issues_summary_before_repair()
evaluate_debias.get_issues_summary_after_repair()