Data collection and validation form an essential part of any machine learning pipeline. A number of issues can arise during the data collection phase, and the Etiq library provides a way of detecting them. Instead of requiring the user to define explicit rules for what constitutes valid data, Etiq generates the rules automatically from an exemplar dataset.
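To make the idea of rules generated from an exemplar dataset concrete, here is a minimal sketch in plain Python. It is not Etiq's implementation and does not use the Etiq API: the function names `infer_rules` and `find_issues`, the rule format, and the issue labels are all hypothetical, chosen only to illustrate how validity rules could be derived from example data rather than written by hand.

```python
from numbers import Number


def infer_rules(exemplar_rows):
    """Derive simple per-feature rules from exemplar rows (hypothetical format).

    Numeric features get a [min, max] range; all others get a set of
    allowed values. Assumes every exemplar row has the same keys.
    """
    rules = {}
    for name in exemplar_rows[0].keys():
        values = [row[name] for row in exemplar_rows if row.get(name) is not None]
        if values and all(isinstance(v, Number) for v in values):
            rules[name] = {"kind": "numeric", "min": min(values), "max": max(values)}
        else:
            rules[name] = {"kind": "categorical", "allowed": set(values)}
    return rules


def find_issues(rows, rules):
    """Return (row_index, feature, issue) tuples for rows that break the rules."""
    issues = []
    for i, row in enumerate(rows):
        # Features that never appeared in the exemplar dataset.
        for name in sorted(row.keys() - rules.keys()):
            issues.append((i, name, "unknown feature"))
        for name, rule in rules.items():
            value = row.get(name)
            if value is None:
                issues.append((i, name, "missing value"))
            elif rule["kind"] == "numeric":
                if not isinstance(value, Number):
                    issues.append((i, name, "type mismatch"))
                elif not rule["min"] <= value <= rule["max"]:
                    issues.append((i, name, "out of range"))
            elif value not in rule["allowed"]:
                issues.append((i, name, "unseen category"))
    return issues


# Illustrative data only: the exemplar defines what "valid" looks like.
exemplar = [{"age": 25, "country": "UK"}, {"age": 60, "country": "FR"}]
new_rows = [
    {"age": 30, "country": "UK"},
    {"age": 130, "country": "DE"},
    {"age": None, "country": "FR", "zip": "E1"},
]

rules = infer_rules(exemplar)
issues = find_issues(new_rows, rules)
for row_index, feature, issue in issues:
    print(f"row {row_index}: {feature}: {issue}")
```

The point of the design is that the exemplar dataset is the only input needed to define validity: swapping in a different exemplar changes the rules without any hand-written configuration.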
The different kinds of data issues detected are:
As with drift, you first create a snapshot from the dataset you are assessing for issues and a comparison dataset. For example:
snapshot = project.snapshots.create(name="Data Issues Snapshot",
For now, you don't need to specify any particular config parameters. The syntax to run the scan is: