Quickstart

Sign-up and install

The Etiq library supports Python versions 3.8, 3.9, 3.10, 3.11 and 3.12 on Windows, Mac and Linux.

With the release of Etiq 1.6.0 the package is now compatible with Apple Silicon processors.

Due to dependencies in the package, you may need to install libomp via Homebrew.

If you haven't already, install Homebrew on your computer: https://brew.sh/

Then run the following in your terminal: brew install libomp

If you're looking to use our Great Expectations integration, please ensure you install great-expectations <= 0.18.19, due to breaking changes introduced with v1.0.0.
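For example, one way to pin the version with pip (the quotes stop the shell from interpreting the <= operator):

pip install "great-expectations<=0.18.19"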

If you have any questions please contact us at [email protected]

To start with, go to the dashboard site, sign up and log in. If you want to deploy directly on your AWS instance, just go to our AWS Marketplace listing and deploy from there (note that using Etiq via AWS Marketplace incurs a cost).

Once you've signed up to the dashboard, check out this interactive demo:


Below are detailed instructions:

To start logging tests from your notebook or other IDE to your dashboard, you will need a token to associate your session with your account. To create this token, go to the Token Management window in your account and click Add New Access Token. Then copy and paste it into your notebook.

Log tests to the centralized dashboard

Download and install Etiq:

pip install etiq

For install considerations please go to this section.

Then import it in your IDE and log in to the dashboard:

import etiq

from etiq import login as etiq_login
etiq_login("https://dashboard.etiq.ai/", "<token>")

Exciting news 🎉🎉🎉 Etiq for Spark is now also available: the data and drift tests you know and love, applied to more data than ever before. To install and import it, just run the below:

pip install etiq-spark 

import etiq.spark

Go to an example notebook or keep reading to get an understanding of the key concepts used in the tool. Please don't leave your token lying around: treat it like a username/password combination, as anyone who finds it can use it to retrieve the information stored about your pipelines.

Projects

A project is a collection of snapshots. To start using the versioning and dashboard functionality, please set a project and a project name. You only need to do this once per session, and all the details logged as part of your data pipelines or debias pipelines will be stored. When you go to your dashboard you will be able to see each of your projects and dig deeper into them.

# Create or open a project
project = etiq.projects.open(name="<Project Name>")

# Retrieve all projects
all_projects = etiq.projects.get_all_projects()
print(all_projects)

Log a snapshot to etiq - key principles

This step is just about logging the relevant information so you can run your tests/scans afterwards. You will need to log your model, your training and test datasets, and the config file, which defines key parameters such as the predicted feature and the categorical/continuous features.

Depending on the stage of the model build/production process you are at and the type of scans you are running, you will want to log differently:

  1. If you are using Etiq's wrapper around model classes, then essentially you log as you train. You can input your entire dataset (in an appropriate format, e.g. already encoded or already transformed), and in the config you can assign different percentages to the train/validation/test split, e.g. "train_valid_test_splits": [0.8, 0.1, 0.1].

  2. If you have already built your model, then you will need to log a hold-out sample to Etiq as your dataset, and this sample will need to be in a format appropriate for scoring by your scoring function. When you log the split in the config, reflect the fact that your hold-out sample is a validation sample: "train_valid_test_splits": [0.0, 1.0, 0.0] (both split settings are shown side by side just after this list). You can also use this set-up for production-type use cases.
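For reference, here is a minimal sketch of how the split looks in the "dataset" section of the config for each case; the full example configs appear later in this section. For the wrapper set-up (log as you train):

{
    "dataset": {
        "train_valid_test_splits": [0.8, 0.1, 0.1]
    }
}

And for an already trained model, where the hold-out sample acts as the validation set:

{
    "dataset": {
        "train_valid_test_splits": [0.0, 1.0, 0.0]
    }
}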

Scans such as bias sources and leakage run tests on the training dataset. For more details on how to run these scans, go to their corresponding sections: Leakage and Bias Sources Scan.

Log a snapshot to etiq - example for already trained model

First you will need to load your config file. This file contains relevant parameters which will be useful in logging the rest of the elements so make sure you log this before you create your snapshot.

with etiq.etiq_config("./config_demo.json"):
    # load your dataset
    # log your already trained model
    # create a snapshot
    # conduct metrics scan
    ...

You can also load your config file globally, as shown below. However, we prefer the context-manager style shown above, because the config is then only applied within the with block, rather than persisting until it is overridden as it does with the global load.

etiq.load_config("./config_demo.json")

Example configs are provided here and also below. For details on what to log to config check the Config Key Concept. For details on how to adjust the config for different scan types, check Accuracy, Leakage, Drift, Bias or relevant notebooks by scan type here.

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0],
        "remove_protected_from_features": false
    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.8, 1.0],
            "true_pos_rate": [0.6, 1.0],
            "true_neg_rate":  [0.6, 1.0]           
        }
	},
	"scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr":  [0.0, 0.2], 
			"individual_fairness": [0.0, 0.8], 
			"equal_odds_tpr": [0.0, 0.2]			
        }
    }, 
	"scan_leakage": {
        "leakage_threshold": 0.85
     }
}

Next, you will log your dataset and your model. To log your dataset, please log the test dataset that you used to assess your model. (There are two scans for which your training dataset will be needed: scan_bias_sources and scan_leakage; for more details, look at Scan Types.)

import etiq
from etiq import Model

# Log your dataset (you can also use SimpleDatasetBuilder)
dataset = etiq.BiasDatasetBuilder.dataset(test, label="<target-feature-name>")

# Log your already trained model
model = Model(model_architecture=standard_model, model_fitted=model_fit)

The 'model_architecture' parameter refers to the model architecture and is optional, e.g.:

from xgboost import XGBClassifier
standard_model = XGBClassifier(use_label_encoder=False, eval_metric='logloss', random_state=4)

The 'model_fitted' parameter refers to the fitted model, however you store it, e.g.:

model_fit = standard_model.fit(x_train, y_train)

You can also specify the 'model_fitted' parameter only.
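For example, a minimal sketch passing only the fitted model:

model = Model(model_fitted=model_fit)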

And create your snapshot:

snapshot = project.snapshots.create(name="<snapshot-name>", dataset=dataset, model=model, bias_params=etiq.BiasDatasetBuilder.bias_params())

For drift-type scans you will not need a model; instead you can have a dataset, e.g. this month's dataset, and a benchmark dataset that you're comparing against, e.g. last month's dataset. For more details on how to set up drift scans, go here, or for example notebooks with drift go here.
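As a rough sketch of what a drift set-up might look like (the parameter name for the benchmark dataset, comparison_dataset, is an assumption here; see the Drift section for the exact signature):

# Sketch only: 'comparison_dataset' is an assumed parameter name for the
# benchmark dataset - check the Drift section for the exact API.
snapshot = project.snapshots.create(name="<drift-snapshot-name>",
                                    dataset=current_dataset,
                                    comparison_dataset=benchmark_dataset)
snapshot.scan_drift_metrics()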

Run scans on your snapshot

Now you are ready to run scans on your snapshot:

snapshot.scan_accuracy_metrics()

snapshot.scan_bias_metrics()

The above is an example using an already trained model in pre-production. For a full notebook on this go here.

If you want to use one of Etiq's pre-configured model classes see an example here.

If you want to use the scans in production, just email us at [email protected]. A demo integration with Airflow will be available shortly.

You have the option to add the categorical and continuous features in your config, as per the example below. This is useful for certain types of scans which translate the findings into business rules, but you have to remember to update your config if you remove features or add new ones.

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0],
        "remove_protected_from_features": false, 
        "cat_col": ["workclass", "relationship", "occupation", "gender", "race", "native-country", "marital-status", "income", "education"],
        "cont_col": ["age", "educational-num", "fnlwgt", "capital-gain", "capital-loss", "hours-per-week"]

    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.8, 1.0],
            "true_pos_rate": [0.6, 1.0],
            "true_neg_rate":  [0.6, 1.0]           
        }
	},
	"scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr":  [0.0, 0.2], 
			"individual_fairness": [0.0, 0.8], 
			"equal_odds_tpr": [0.0, 0.2]			
        }
    }, 
	"scan_leakage": {
        "leakage_threshold": 0.85
     }
}

If you do not want to scan for bias, or your dataset does not contain information about protected features, you can simply omit that information from your config. An example config for a data drift use case is shown below. For more details on this example, check the GitHub repo and/or the Drift section.

{
    "dataset": {
        "label": "income",
        "train_valid_test_splits": [0.0, 1.0, 0.0]
    },
    "scan_drift_metrics": {
        "thresholds": {
            "psi": [0.0, 0.15],
            "kolmogorov_smirnov": [0.05, 1.0]
        },
        "drift_measures": ["kolmogorov_smirnov", "psi"]
    }
}
