How to use Etiq with Airflow
Etiq can be used as a library in Airflow DAGs. Because Etiq is distributed as a Python module, it works with Airflow without any changes. There are two scenarios in which you might want to use Etiq with Airflow.
The first is when you already use Airflow as an orchestration tool for your pipelines. In that case you can integrate Etiq into your existing DAGs as a few additional steps, which gives you ongoing monitoring and testing. An example DAG that uses Etiq to detect dataset drift is available here.
Example DAG for feature & target drift detection using Etiq
We also recommend setting up multiple tests at different points in your DAGs. The benefit is that all test results are centralized in your dashboard instance, which gives you a view of how your pipelines are performing at every step. We are adding tagging functionality to make it easy to group tests and see at a glance which test failures happen at which point in your DAG.
You can also use Etiq tests as triggers. For instance, you can set up a DAG so that a failed drift test triggers an automated model retrain as the next step.
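The trigger pattern can be sketched as a small branching function. This is a minimal illustration, not Etiq's API: the task ids and the shape of `drift_results` (a mapping from test name to a pass/fail flag) are assumptions made for the example, and you would need to build that mapping from whatever your Etiq step actually returns.

```python
from typing import Mapping

# Hypothetical task ids; rename them to match the tasks in your own DAG.
RETRAIN_TASK_ID = "retrain_model"
CONTINUE_TASK_ID = "skip_retrain"


def choose_next_task(drift_results: Mapping[str, bool]) -> str:
    """Return the id of the task that should run next.

    `drift_results` maps a test name to True when that test passed.
    This shape is assumed for illustration; it is not Etiq's output format.
    """
    failed = sorted(name for name, passed in drift_results.items() if not passed)
    if failed:
        # At least one drift test failed, so route the DAG to retraining.
        print(f"Drift tests failed: {', '.join(failed)}")
        return RETRAIN_TASK_ID
    # Every test passed (or no tests ran), so carry on without retraining.
    return CONTINUE_TASK_ID
```

In an Airflow DAG, a function like this could serve as the `python_callable` of a `BranchPythonOperator` (from `airflow.operators.python`), which runs the downstream task whose id the callable returns. The test results would typically reach the callable through XCom or a small wrapper function.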
The second scenario is when you want to automate testing and monitoring with Etiq and Airflow, irrespective of your existing orchestration or deployment set-up. For this case we provide an out-of-the-box docker-compose script with appropriate settings.
The only requirements for using this container are docker-compose and setting any environment variables whose values differ from the defaults provided.
The DAG can be used in your own Airflow environment. The following environment variables can be defined:
- AIRFLOW_VAR_ETIQ_DATA - The data directory, i.e. the location that contains the base and latest sub-directories.
- AIRFLOW_CONN_ETIQ_FS - Defines the Airflow connection (etiq_fs) to be used for the config file and datasets.
- AIRFLOW_VAR_ETIQ_PROJECT - The Etiq project name to use.
- AIRFLOW_VAR_ETIQ_TOKEN - (Required if the dashboard variable is set) The token used to log in to the dashboard specified in AIRFLOW_VAR_ETIQ_DASHBOARD.
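As a rough sketch of how these settings fit together, the helper below reads them from the environment and enforces the one dependency stated above: a token is required whenever a dashboard is configured. The function name and the returned dictionary keys are hypothetical and are not part of Etiq or Airflow.

```python
import os
from typing import Dict, Mapping, Optional


def read_etiq_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Collect the Etiq-related settings from environment variables.

    Hypothetical helper for illustration only. Pass a mapping to check a
    configuration without touching the real process environment.
    """
    env = os.environ if env is None else env
    settings = {
        "data": env.get("AIRFLOW_VAR_ETIQ_DATA"),
        "project": env.get("AIRFLOW_VAR_ETIQ_PROJECT"),
        "dashboard": env.get("AIRFLOW_VAR_ETIQ_DASHBOARD"),
        "token": env.get("AIRFLOW_VAR_ETIQ_TOKEN"),
        "fs_connection": env.get("AIRFLOW_CONN_ETIQ_FS"),
    }
    # The token is only required when a dashboard has been configured.
    if settings["dashboard"] and not settings["token"]:
        raise ValueError(
            "AIRFLOW_VAR_ETIQ_TOKEN must be set when "
            "AIRFLOW_VAR_ETIQ_DASHBOARD is set"
        )
    return settings
```

Airflow itself resolves an environment variable named `AIRFLOW_VAR_<KEY>` as the Variable `<key>`, so inside a task a value such as the data directory would usually be read with `Variable.get("etiq_data")` rather than from `os.environ` directly. Which keys the provided DAG looks up is an assumption here.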