Snapshot
Etiq works via a lightweight logging mechanism. You log your data and your model (together, a snapshot) and then run a scan on it; the scan is the testing functionality itself. As you experiment with more and more snapshots, you keep scanning your model versions, and all the test results and issues found are sent to a centralised dashboard.
A snapshot is a combination of a dataset and a model, particularly in pre-production testing. To start testing your system you need to log your snapshot to Etiq, which means logging the dataset and the model. For an end-to-end notebook example, go here.
import etiq
from etiq import Model  # assumption: adjust the import path if Model lives elsewhere in your Etiq version (e.g. etiq.model)

# Log your dataset
dataset = etiq.BiasDatasetBuilder.dataset(data_encoded, label="<target-feature-name>")
# You can also use SimpleDatasetBuilder here

# Log your already-trained model
model = Model(model_architecture=standard_model, model_fitted=model_fit)

# Create a snapshot
snapshot = project.snapshots.create(name="<snapshot-name>", dataset=dataset, model=model, bias_params=etiq.BiasDatasetBuilder.bias_params())
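For context, here is a minimal sketch of where the standard_model and model_fit objects above might come from, using a scikit-learn logistic regression; the estimator choice and the X_train/y_train variables are illustrative assumptions, not part of the Etiq API:
# Illustrative only: any supported library works; scikit-learn is used here.
from sklearn.linear_model import LogisticRegression

X_train = data_encoded.drop(columns=["<target-feature-name>"])
y_train = data_encoded["<target-feature-name>"]

standard_model = LogisticRegression(max_iter=1000)  # the model architecture
model_fit = standard_model.fit(X_train, y_train)    # the fitted model
Note that scikit-learn's fit() returns the estimator itself, so standard_model and model_fit refer to the same fitted object here; what matters is that model_fitted receives a trained model.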
In the validation and production stages, snapshots are not produced in the course of experimentation; they are produced as a model is deployed and runs in production. From the point of view of Etiq's logging mechanism, however, they are logged the same way: each time your model scores a new batch of data, it records a new snapshot. The information needed for testing is slightly different in production vs. pre-production, though, and the tests themselves differ a bit too. For drift-type tests, or in production generally, you might not have the model available; more importantly, you need the dataset you are checking for drift and the benchmark dataset you are comparing it against.
This is how you’d log it to Etiq:
# Log a dataset with the comparison data
dataset_s = etiq.SimpleDatasetBuilder.dataset(data_encoded, label="<target-feature-name>")
# Log a dataset with the data from your current view
todays_dataset_s = etiq.SimpleDatasetBuilder.dataset(todays_dataset_df, label="<target-feature-name>")
# Create the snapshot
snapshot = project.snapshots.create(name="<snapshot-name>", dataset=todays_dataset_s, comparison_dataset=dataset_s, model=None)
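Once the production snapshot is created, you scan it for drift like any other snapshot. A minimal sketch follows, assuming the scan_drift_metrics() method and the (segments, issues, issue_summary) return pattern from Etiq's scan examples; check the scan documentation for the exact names in your version:
# Assumption: scan method names and return shape follow Etiq's scan examples
# and may vary by version.
(segments, issues, issue_summary) = snapshot.scan_drift_metrics()

# Inspect the summary locally; the results also appear on the dashboard.
print(issue_summary)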
At the moment we support uploading pandas DataFrames to the Etiq dataset object, but we are adding new formats all the time. The dataset you use should already be transformed so that it can be input to a model class from any of the libraries mentioned. While Etiq contains some transformations, we recommend using your own. With certain types of transformation (such as normalization) in particular, please do NOT apply the transformation to your whole dataset before splitting it into train/test/validation sets, as this can contribute to leakage. (We will be adding scans to check for this in the future.)
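To illustrate the split-then-transform order with plain scikit-learn (not Etiq functionality), assuming a feature matrix X and target y:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split first...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ...then fit the scaler on the training split only and reuse it on the test split.
# Fitting on the full dataset would leak test-set statistics into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)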
You can use any already-trained model from the supported libraries: XGBoost, LightGBM, PyTorch, TensorFlow, Keras, and scikit-learn.
For example purposes, we also provide out-of-the-box model architectures for some model types: DefaultXGBoostClassifier (a wrapper around the XGBoost classifier), DefaultRandomForestClassifier (a wrapper around the random forest classifier from sklearn), and DefaultLogisticRegression (a wrapper around the logistic regression classifier from sklearn).
# Log your dataset
dataset = etiq.BiasDatasetBuilder.dataset(data_encoded, label="<target-feature-name>")
# Load our model
from etiq.model import DefaultXGBoostClassifier
model = DefaultXGBoostClassifier()
# Create a snapshot
snapshot = project.snapshots.create(name="<snapshot-name>", dataset=dataset, model=model, bias_params=etiq.BiasDatasetBuilder.bias_params())
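With the snapshot created, you run scans on it as usual. A minimal sketch, assuming the scan method names and the (segments, issues, issue_summary) return pattern from Etiq's scan examples:
# Assumption: method names may vary by Etiq version; see the scan docs.
(segments, issues, issue_summary) = snapshot.scan_bias_metrics()
print(issue_summary)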