Quickstart


Sign-up and install

The Etiq library supports Python versions 3.8, 3.9, 3.10, 3.11 and 3.12 on Windows, macOS and Linux.

With the release of Etiq 1.6.0, the package is compatible with Apple Silicon processors.

Due to dependencies in the package, you may need to install libomp via Homebrew.

If you haven't already, install Homebrew on your computer (https://brew.sh/), then run the following in your terminal:

brew install libomp

If you're looking to use our Great Expectations integration, please ensure you install great-expectations <= 0.18.19, due to breaking changes introduced with v1.0.0.

If you have any questions, please contact us at info@etiq.ai.

To start with, go to the dashboard site (https://dashboard.etiq.ai/), sign up and log in. If you want to deploy directly on your AWS instance, just go to our AWS Marketplace listing and deploy from there (note, however, that using Etiq via AWS Marketplace incurs a cost).

If you have purchased version 1.2 via AWS Marketplace, please go to the corresponding section of the docs.

Once you've signed up to the dashboard, check out this interactive demo:

[Interactive demo: click to navigate through the demo]

Below are detailed instructions:

To start logging tests from your notebook or other IDE to your dashboard, you will need a token to associate your session with your account. To create this token, go to the Token Management window in your account and click Add New Access Token, then copy and paste the token into your notebook. Please don't leave your token lying around: anyone who finds it can use it to retrieve the information stored about your pipelines, just as with username/password authentication.

Download and install Etiq:

pip install etiq

Then import it in your IDE and log in to the dashboard:

import etiq

from etiq import login as etiq_login
etiq_login("https://dashboard.etiq.ai/", "<token>")
Exciting news 🎉🎉🎉 Etiq for Spark is now also available: the data and drift tests you know and love, applied to more data than ever before. To install and import, just run:

pip install etiq-spark

import etiq.spark

Data about your test results is stored on Etiq's AWS instance. However, your datasets and models themselves are never stored anywhere, so you can rest assured. If your security set-up requires a deployment entirely on your own cloud instance or on-prem, just get in touch with us at info@etiq.ai.

Go to an example notebook or keep reading to get an understanding of the key concepts used in the tool.

Projects

A project is a collection of snapshots. To start using the versioning and dashboard functionality, set a project with a project name. You only have to do this once per session, and all the details logged as part of your data or debias pipelines will be stored under it. When you go to your dashboard, you will be able to see each of your projects and dig deeper into them.

# Create or open a project
project = etiq.projects.open(name="<Project Name>")

# Retrieve all projects
all_projects = etiq.projects.get_all_projects()
print(all_projects)
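Because open() creates the project if it doesn't exist and opens it otherwise, it is safe to re-run at the top of every session; the name below is just a placeholder:

project = etiq.projects.open(name="Adult Income Demo")  # placeholder project name; re-running returns the same project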

Log a snapshot to etiq - key principles

This step is just about logging the relevant information so that you can run your tests/scans afterwards. You will need to log your model, your training and test datasets, and the config file, which defines key parameters such as which feature is predicted and which features are categorical or continuous.

Depending on the stage of model build/production you are at and the type of scans you are running, you will want to log differently (see the sketch after this list):

  1. If you are using Etiq's wrapper around model classes, then essentially you log as you train. You can input your entire dataset (in an appropriate format, e.g. already encoded or already transformed), and in the config you can assign different percentages to the train/validation/test split, e.g. "train_valid_test_splits": [0.8, 0.1, 0.1].

  2. If you have already built your model, then you will need to log a hold-out sample to Etiq as your dataset, and this sample will need to be in a format appropriate for scoring by your scoring function. When you log the split in the config, you should reflect that the hold-out sample is a validation sample: "train_valid_test_splits": [0.0, 1.0, 0.0]. You can also use this set-up for production-type use cases.
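To make the two set-ups concrete, here is a minimal sketch of the corresponding config fragments, shown as Python dicts (the full JSON schema appears below; only the split differs):

# Case 1: Etiq's wrapper trains the model, so log the full dataset and
# let the config carve out the train/validation/test split:
config_wrapper = {"dataset": {"train_valid_test_splits": [0.8, 0.1, 0.1]}}

# Case 2: the model is already trained, so log a hold-out sample and
# mark all of it as a validation sample:
config_pretrained = {"dataset": {"train_valid_test_splits": [0.0, 1.0, 0.0]}}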

Log a snapshot to etiq - example for already trained model

First you will need to load your config file. This file contains parameters that are used when logging the other elements, so make sure you load it before you create your snapshot.

with etiq.etiq_config("./config_demo.json"):
    # load your dataset
    # log your already trained model
    # create a snapshot
    # run a metrics scan

You can also load your config file as shown below. However, we prefer the "modern" context-manager style shown above, because the config it loads applies only within the with block, whereas a config loaded as below is global and persists until overridden.

etiq.load_config("./config_demo.json")
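A minimal sketch of the scoping difference between the two styles:

# Context-manager style: the config applies only inside the block
with etiq.etiq_config("./config_demo.json"):
    pass  # datasets, snapshots and scans created here use config_demo.json

# Global style: the config persists for the rest of the session,
# until overridden by another load_config() call
etiq.load_config("./config_demo.json")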
Example configs are provided in our github repo and below. For details on what to log to the config, check the Config Key Concept section. For details on how to adjust the config for different scan types, check the Accuracy, Bias, Drift and Leakage sections, or the relevant example notebooks by scan type.

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0],
        "remove_protected_from_features": false
    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.8, 1.0],
            "true_pos_rate": [0.6, 1.0],
            "true_neg_rate": [0.6, 1.0]
        }
    },
    "scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr": [0.0, 0.2],
            "individual_fairness": [0.0, 0.8],
            "equal_odds_tpr": [0.0, 0.2]
        }
    },
    "scan_leakage": {
        "leakage_threshold": 0.85
    }
}

Next, you will log your dataset and your model. For the dataset, please log the test dataset that you used to assess your model. (There are two scans for which your training dataset is needed instead: scan_bias_sources and scan_leakage, which test the training dataset itself; for more details on how to run them, see the Leakage and Bias Sources Scan sections under Scan Types.)

If your dataset is not in a format your model can score, the scan will not run!

If you have a use case where you may use the demographic feature in your training dataset, you have the option to leave it in using this clause in the config:

"remove_protected_from_features": false

By default, the demographic feature is removed before scoring. This is because in regulated use cases you shouldn't train your model on the demographic/protected feature, but the scan still needs the demographic information if you want to run bias scans.

from etiq import Model

# log your dataset (you can also use SimpleDatasetBuilder)
dataset = etiq.BiasDatasetBuilder.dataset(test, label="<target-feature-name>")

# log your already trained model
model = Model(model_architecture=standard_model, model_fitted=model_fit)

The 'model_architecture' parameter refers to the model architecture and is optional, e.g.:

standard_model = XGBClassifier(use_label_encoder=False, eval_metric='logloss', random_state=4)

The 'model_fitted' parameter refers to the fitted model, however you store it, e.g.:

model_fit = standard_model.fit(x_train, y_train)

You can also specify the 'model_fitted' parameter only.
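For example, if you only have the fitted object to hand:

model = Model(model_fitted=model_fit)  # architecture omitted, since it is optional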

And create your snapshot:

snapshot = project.snapshots.create(
    name="<snapshot-name>",
    dataset=dataset,
    model=model,
    bias_params=etiq.BiasDatasetBuilder.bias_params()
)

Run scans on your snapshot

Now you are ready to run scans on your snapshot:

snapshot.scan_accuracy_metrics()

snapshot.scan_bias_metrics()
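Putting the pieces together, here is a minimal end-to-end sketch for the already-trained-model case, assembled from the calls above. The hold-out dataframe test and the fitted XGBoost objects standard_model / model_fit are the placeholders from this page's example; the project and snapshot names are illustrative:

import etiq
from etiq import Model, login as etiq_login

etiq_login("https://dashboard.etiq.ai/", "<token>")
project = etiq.projects.open(name="Quickstart Demo")

with etiq.etiq_config("./config_demo.json"):
    # hold-out sample, already encoded so the model can score it
    dataset = etiq.BiasDatasetBuilder.dataset(test, label="income")
    model = Model(model_architecture=standard_model, model_fitted=model_fit)
    snapshot = project.snapshots.create(
        name="baseline",
        dataset=dataset,
        model=model,
        bias_params=etiq.BiasDatasetBuilder.bias_params(),
    )
    snapshot.scan_accuracy_metrics()
    snapshot.scan_bias_metrics()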

The above is an example using an already trained model in pre-production. For a full notebook on this, and for example notebooks and config files generally, go to our github repo. If you want to use one of Etiq's pre-configured model classes, see the example notebooks there too.

If you want to use the scans in production, just email us at info@etiq.ai. A demo integration with Airflow will be available shortly.

Threshold values in the example config files are for illustration only; different use cases will require different thresholds. As the AI regulation sector matures we will add corresponding standards and suggested thresholds, but these will never be hard and fast rules, only suggestions: what works for one use case will not work for another.

You have the option to add the categorical and continuous features to your config, as per the example below. This is useful for certain types of scans which translate findings into business rules, but remember to update your config whenever you remove or add features.

{
    "dataset": {
        "label": "income",
        "bias_params": {
            "protected": "gender",
            "privileged": 1,
            "unprivileged": 0,
            "positive_outcome_label": 1,
            "negative_outcome_label": 0
        },
        "train_valid_test_splits": [0.0, 1.0, 0.0],
        "remove_protected_from_features": false,
        "cat_col": ["workclass", "relationship", "occupation", "gender", "race", "native-country", "marital-status", "income", "education"],
        "cont_col": ["age", "educational-num", "fnlwgt", "capital-gain", "capital-loss", "hours-per-week"]
    },
    "scan_accuracy_metrics": {
        "thresholds": {
            "accuracy": [0.8, 1.0],
            "true_pos_rate": [0.6, 1.0],
            "true_neg_rate": [0.6, 1.0]
        }
    },
    "scan_bias_metrics": {
        "thresholds": {
            "equal_opportunity": [0.0, 0.2],
            "demographic_parity": [0.0, 0.2],
            "equal_odds_tnr": [0.0, 0.2],
            "individual_fairness": [0.0, 0.8],
            "equal_odds_tpr": [0.0, 0.2]
        }
    },
    "scan_leakage": {
        "leakage_threshold": 0.85
    }
}
If you do not want to scan for bias, or your dataset does not contain information about protected features, you can simply leave that information out of your config. For drift-type scans you will not need a model either: instead you log a dataset, e.g. this month's data, and a benchmark dataset to compare it against, e.g. last month's. An example config for a data drift use case is below; for more details, check the Drift section and the example drift notebooks.

{
    "dataset": {
        "label": "income",
        "train_valid_test_splits": [0.0, 1.0, 0.0]
    },
    "scan_drift_metrics": {
        "thresholds": {
            "psi": [0.0, 0.15],
            "kolmogorov_smirnov": [0.05, 1.0]
        },
        "drift_measures": ["kolmogorov_smirnov", "psi"]
    }
}
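Assuming drift scans follow the same snapshot.scan_* naming convention as the accuracy and bias scans above (the config key is scan_drift_metrics), running one would look like the sketch below; check the Drift section for the exact set-up:

snapshot.scan_drift_metrics()  # assumed to mirror scan_accuracy_metrics / scan_bias_metrics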
