Quickstart

Sign-up and install

The Etiq library supports Python versions 3.6, 3.7, 3.8, 3.9 and 3.10 on Windows, Mac and Linux.

We do not currently support Apple M1 Macs.

To start, go to the dashboard site, sign up and log in. If you want to deploy directly on your AWS instance, just go to our AWS Marketplace listing and deploy from there (note that using Etiq via AWS Marketplace incurs a cost).

If you have purchased version 1.2 via AWS Marketplace, please go to this section of the docs.

Once you've signed up to the dashboard, check out this interactive demo:

Below are detailed instructions:

To start logging tests from your notebook or other IDE to your dashboard, you will need a token to associate your session with your account. To create this token, go to the Token Management window in your account and click Add New Access Token. Then copy the token and paste it into your notebook.

Download and install Etiq:

pip install etiq

For install considerations please go to this section.
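If you prefer to keep Etiq in an isolated environment, a typical virtual-environment setup (a generic Python workflow, not specific to Etiq) looks like:

python -m venv etiq-env
source etiq-env/bin/activate  # on Windows: etiq-env\Scripts\activate
pip install etiq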

Then import it in your IDE and log in to the dashboard:

import etiq

from etiq import login as etiq_login
etiq_login("https://dashboard.etiq.ai/", "<token>")
To use Etiq with Spark, also install and import the Spark extension:

pip install etiq-spark

import etiq.spark

Go to an example notebook, or keep reading to get an understanding of the key concepts used in the tool. Treat your token like a password: don't leave it lying around, as anyone who finds it can use it to retrieve the information stored about your pipelines.
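One way to avoid hard-coding the token in a notebook is to read it from an environment variable. A minimal sketch, assuming you have exported a variable named ETIQ_TOKEN beforehand (the variable name is our choice, not an Etiq convention):

import os
from etiq import login as etiq_login

# Read the access token from the environment rather than pasting it inline
token = os.environ["ETIQ_TOKEN"]  # ETIQ_TOKEN is an assumed variable name
etiq_login("https://dashboard.etiq.ai/", token)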

Data about your test results is stored on Etiq's AWS instance. However, your datasets and models themselves are never stored anywhere.

If your security set-up requires a deployment entirely on your own cloud instance or on-prem, just get in touch with us at info@etiq.ai.

Projects

A project is a collection of snapshots. To start using the versioning and dashboard functionality, create a project with a name. You only have to do this once per session, and all the details logged as part of your data or debias pipelines will be stored under it. When you go to your dashboard you will be able to see each of your projects and dig deeper into each of them.

# Create or open a project
project = etiq.projects.open(name="<Project Name>")

# Retrieve all projects
all_projects = etiq.projects.get_all_projects()
print(all_projects)

Log a snapshot to Etiq - key principles

This step is just about logging the relevant information so you can run your tests/scans afterwards. You will need to log your model, your training and test datasets, and the config file which defines key parameters, such as which feature is the prediction target and which features are categorical or continuous.

Depending on which stage of the model build/production process you are at and what type of scans you are running, you will want to log differently:

  1. If you are using Etiq's wrapper around model classes, then essentially you log as you train. You can input your entire dataset (in an appropriate format, e.g. already encoded or already transformed), and in the config you can assign percentages to the train/validation/test split, e.g. "train_valid_test_splits": [0.8, 0.1, 0.1]

  2. If you have already built your model, then you will need to log a hold-out sample to Etiq as your dataset, and this sample will need to be in a format appropriate for scoring by your scoring function. When you log the split in the config, reflect that your hold-out sample is a validation sample: "train_valid_test_splits": [0.0, 1.0, 0.0]. You can also use this set-up for production-type use cases.

Log a snapshot to Etiq - example for an already-trained model

First you will need to load your config file. This file contains parameters which are used when logging the other elements, so make sure you load it before you create your snapshot.

with etiq.etiq_config("./config_demo.json"):
    # load your dataset
    # log your already-trained model
    # create a snapshot
    # conduct a metrics scan

You can also load your config file as shown below. However, we prefer the context-manager approach above: the config is then only used within the with block, whereas a config loaded as below applies globally and persists until it is overridden.

etiq.load_config("./config_demo.json")

Example configs are provided here and also below. For details on what to log to config check the Config Key Concept. For details on how to adjust the config for different scan types, check Accuracy, Leakage, Drift, Bias or relevant notebooks by scan type here.

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0],
        "remove_protected_from_features": false
    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.8, 1.0],
            "true_pos_rate": [0.6, 1.0],
            "true_neg_rate": [0.6, 1.0]
        }
    },
    "scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr": [0.0, 0.2],
            "individual_fairness": [0.0, 0.8],
            "equal_odds_tpr": [0.0, 0.2]
        }
    },
    "scan_leakage": {
        "leakage_threshold": 0.85
    }
}

For example notebooks and config files, just go to our demo repository.

Next, you will log your dataset and your model. To log your dataset, please log the test dataset that you used to assess your model. (There are two scans for which your training dataset will be needed: scan_bias_sources and scan_leakage - for more details see the Accuracy section.)

If your dataset is not in a format your model can score, the scan will not run!

If you have a use case where you can use a demographic feature in your training dataset, you have the option to leave it in by using this clause in the config:

"remove_protected_from_features": false

By default, the demographic feature is removed before scoring. This is because in regulated use cases you shouldn't train your model on the demographic/protected feature, but the scan still needs the demographic information if you want to run bias scans.

from etiq import Model

# Log your dataset (you can also use SimpleDatasetBuilder)
dataset = etiq.BiasDatasetBuilder.dataset(test, label="<target-feature-name>")

# Log your already-trained model
model = Model(model_architecture=standard_model, model_fitted=model_fit)

The 'model_architecture' parameter refers to the model architecture and is optional, e.g.:

standard_model = XGBClassifier(use_label_encoder=False, eval_metric='logloss', random_state=4)    

The 'model_fitted' parameter refers to the fitted model, however you store it, e.g.:

model_fit = standard_model.fit(x_train, y_train)

You can also specify only the 'model_fitted' parameter.
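For example:

model = Model(model_fitted=model_fit)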

And create your snapshot:

snapshot = project.snapshots.create(name="<snapshot-name>", dataset=dataset, model=model, bias_params=etiq.BiasDatasetBuilder.bias_params())

For drift-type scans you will not need a model. Instead you can have a dataset, e.g. this month's dataset, and a benchmark dataset that you're comparing against, e.g. last month's dataset. For more details on how to set up drift scans, go here, or for example notebooks with drift go here.

Run scans on your snapshot

Now you are ready to run scans on your snapshot:

snapshot.scan_accuracy_metrics()

snapshot.scan_bias_metrics()

The above is an example using an already trained model in pre-production. For a full notebook on this go here.
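Putting these steps together, a minimal end-to-end sketch for an already-trained model, assuming a pandas DataFrame named test and the fitted XGBoost model from the snippets above (the project and snapshot names are placeholders of our choosing):

import etiq
from etiq import Model

with etiq.etiq_config("./config_demo.json"):
    # Log the hold-out dataset (already encoded/transformed)
    dataset = etiq.BiasDatasetBuilder.dataset(test, label="income")

    # Log the already-trained model
    model = Model(model_architecture=standard_model, model_fitted=model_fit)

    # Create a snapshot and run scans against it
    project = etiq.projects.open(name="Quickstart Demo")  # placeholder name
    snapshot = project.snapshots.create(
        name="pretrained-model-snapshot",  # placeholder name
        dataset=dataset,
        model=model,
        bias_params=etiq.BiasDatasetBuilder.bias_params(),
    )
    snapshot.scan_accuracy_metrics()
    snapshot.scan_bias_metrics()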

If you want to use one of Etiq's pre-configured model classes see an example here.

If you want to use the scans in production, just email us at info@etiq.ai. A demo integration with Airflow will be available shortly.

Threshold values in the example config files are for illustration only. Different use cases will require different thresholds. As the AI regulation sector matures we will add corresponding standards and suggested thresholds, but these will always be suggestions rather than hard and fast rules: what works for one use case will not work for another.

You have the option to add the categorical and continuous features to your config, as in the example below. This is useful for certain types of scans which translate the findings into business rules, but remember to update your config if you remove or add features.

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0],
        "remove_protected_from_features": false,
        "cat_col": ["workclass", "relationship", "occupation", "gender", "race", "native-country", "marital-status", "income", "education"],
        "cont_col": ["age", "educational-num", "fnlwgt", "capital-gain", "capital-loss", "hours-per-week"]
    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.8, 1.0],
            "true_pos_rate": [0.6, 1.0],
            "true_neg_rate": [0.6, 1.0]
        }
    },
    "scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr": [0.0, 0.2],
            "individual_fairness": [0.0, 0.8],
            "equal_odds_tpr": [0.0, 0.2]
        }
    },
    "scan_leakage": {
        "leakage_threshold": 0.85
    }
}

If you do not want to scan for bias, or your dataset contains no information about protected features, you can simply omit that information from your config. An example config for a data drift use case is shown below. For more details on this example, check the GitHub repo and/or the section about Drift.

{
    "dataset": {
        "label": "income",
        "train_valid_test_splits": [0.0, 1.0, 0.0]
    },
    "scan_drift_metrics": {
        "thresholds": {
            "psi": [0.0, 0.15],
            "kolmogorov_smirnov": [0.05, 1.0]
        },
        "drift_measures": ["kolmogorov_smirnov", "psi"]
    }
}
