Etiq Docs
Snapshot

Logging a snapshot

Etiq works via a lightweight logging mechanism. You log your data and your model together as a snapshot, then run a scan on it; the scan is the testing functionality itself. As you experiment with more and more snapshots, you keep scanning your model versions, and all the test results and issues found are sent to a centralised dashboard.
A snapshot is a combination of a dataset and a model; this is how it usually looks in pre-production. To start testing your system, log your snapshot to Etiq by logging the dataset and the model. For an end-to-end notebook example, go here.
Before you log a snapshot, you will need to load your config file; otherwise you will get an error.
```python
# Log your dataset
dataset_loader = etiq.dataset(data_encoded)

# Log your already trained model
model = Model(model_architecture=standard_model, model_fitted=model_fit)

# Create a snapshot
snapshot = project.snapshots.create(name="Test Snapshot",
                                    dataset=dataset_loader.initial_dataset,
                                    model=model,
                                    bias_params=dataset_loader.bias_params)
```
In the validation and production stages, snapshots are not produced in the course of experimentation; they are produced as a model is deployed and runs in production. From the point of view of Etiq's logging mechanism, however, they are logged the same way: each time your model scores a new batch of data, it records a new snapshot. The information needed for testing differs slightly between production and pre-production, and the tests themselves differ a bit too. For drift-type tests, or in production generally, you might not have the model available; more importantly, you need the dataset you are checking for drift and the benchmark dataset you are comparing it against.
This is how you’d log it to Etiq:
```python
# Log a dataset with the comparison data
dataset_s = etiq.SimpleDatasetBuilder.from_dataframe(data_encoded, target_feature='income').build()

# Log a dataset with the data from your current view
todays_dataset_s = etiq.SimpleDatasetBuilder.from_dataframe(todays_dataset_df, target_feature='income').build()

# Create the snapshot
snapshot = project.snapshots.create(name="Test Snapshot",
                                    dataset=todays_dataset_s,
                                    comparison_dataset=dataset_s,
                                    model=None)
```
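To make the comparison concrete: a drift test compares the distribution of the current batch against the benchmark dataset. Below is a minimal, generic sketch of one common drift metric, the Population Stability Index (PSI), using only the Python standard library. This illustrates the idea only; it is not Etiq's implementation, and the bucket count and thresholds are arbitrary choices for the example.

```python
import math
import random

def psi(benchmark, current, buckets=10):
    """Population Stability Index between two numeric samples.

    Buckets are defined by the benchmark's quantiles; a small epsilon
    avoids division by zero for empty buckets.
    """
    eps = 1e-6
    cuts = sorted(benchmark)
    # Bucket edges placed at the benchmark's quantiles
    edges = [cuts[int(len(cuts) * i / buckets)] for i in range(1, buckets)]

    def frac(sample):
        counts = [0] * buckets
        for x in sample:
            idx = sum(x > e for e in edges)  # which bucket x falls into
            counts[idx] += 1
        return [c / len(sample) + eps for c in counts]

    b, c = frac(benchmark), frac(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

random.seed(0)
benchmark = [random.gauss(0, 1) for _ in range(5000)]
same = [random.gauss(0, 1) for _ in range(5000)]      # same distribution
shifted = [random.gauss(1, 1) for _ in range(5000)]   # mean has drifted

print(psi(benchmark, same) < 0.1)      # True: no meaningful drift
print(psi(benchmark, shifted) > 0.25)  # True: significant drift
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, which is why the benchmark dataset must be logged alongside the current one.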

Dataset

At the moment we support uploading pandas dataframes to the Etiq dataset object, but we are adding new formats all the time. The dataset you use should already be transformed so that it can be input to a model class from any of the libraries mentioned. While Etiq contains some transformations, we recommend using your own. With certain types of transformations (such as normalization) in particular, please do NOT apply the transformation to your whole dataset before splitting it into train/test/validation sets, as this can contribute to leakage (we will be adding scans to check for this as well in the future).
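As a minimal illustration of the leakage-free pattern, here is a sketch using only the standard library: the scaling parameters are computed on the training split alone and then applied to the held-out split. The function and variable names here are ours, for illustration only.

```python
import statistics

def fit_standardizer(train_values):
    """Learn scaling parameters from the training split ONLY."""
    mean = statistics.fmean(train_values)
    std = statistics.stdev(train_values)
    return lambda xs: [(x - mean) / std for x in xs]

# Toy data: first 6 rows are "train", last 2 are "test"
data = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 50.0, 55.0]
train, test = data[:6], data[6:]

scale = fit_standardizer(train)   # fitted on train only...
train_scaled = scale(train)
test_scaled = scale(test)         # ...then applied to test

# Fitting the scaler on the full dataset instead would fold the test
# rows' extreme values into the mean and spread, leaking information
# about held-out data into training.
print(round(statistics.fmean(train_scaled), 6))  # 0.0: centred on train only
```

The same discipline applies to any fitted transformation (encoders, imputers, and so on), not just normalization.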

Model

You can use any already trained model from the supported libraries: XGBoost, LightGBM, PyTorch, TensorFlow, Keras and scikit-learn.
For example purposes, we also provide out-of-the-box model architectures for some model types: DefaultXGBoostClassifier (a wrapper around the XGBoost classifier), DefaultRandomForestClassifier (a wrapper around the random forest classifier from sklearn) and DefaultLogisticRegression (a wrapper around the logistic regression classifier from sklearn).
Call these using the following syntax; for a notebook example, go here.
```python
# Log your dataset
dataset_loader = etiq.dataset(data_encoded)

# Load our model
from etiq.model import DefaultXGBoostClassifier
model = DefaultXGBoostClassifier()

# Create a snapshot
snapshot = project.snapshots.create(name="Test Snapshot",
                                    dataset=dataset_loader.initial_dataset,
                                    model=model,
                                    bias_params=dataset_loader.bias_params)
```