How to use Etiq with Airflow

Etiq is available as a Python module, so it can be used as a library in Airflow DAGs with no changes. There are two scenarios in which you might want to use Etiq with Airflow.

Using Etiq within a pre-existing Airflow set-up

If you already use Airflow to orchestrate your pipelines, you can integrate Etiq into your existing DAGs as a few additional steps. This gives you ongoing monitoring and testing. An example DAG using Etiq to determine dataset drift is available here.

We also recommend setting up multiple tests at different points in your DAGs. All test results are then centralized in your dashboard instance, giving you a view of how your pipelines perform at every step. We are adding tagging functionality to make it easy to group tests and see immediately which test failures happen at which point in your DAG.

You can also use Etiq tests as triggers. For instance, you can set up a DAG so that if a drift test fails, the next step is an automated model retrain.
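One way to wire up such a trigger in Airflow is a branching callable, for example the `python_callable` given to a `BranchPythonOperator`, which returns the `task_id` of the step to run next. The sketch below is plain Python so it can be read without an Airflow installation. The shape of the drift summary (a dict with a `failed` count) and the task ids `retrain_model` and `skip_retrain` are hypothetical illustrations, not part of Etiq's API.

```python
from typing import Mapping

# Hypothetical task ids; replace with the ids used in your own DAG.
RETRAIN_TASK_ID = "retrain_model"
SKIP_TASK_ID = "skip_retrain"


def choose_next_task(drift_result: Mapping[str, int]) -> str:
    """Return the task_id to run after a drift test.

    Assumes drift_result is a summary such as {"passed": 3, "failed": 1},
    built by an upstream task from the Etiq test output. This shape is an
    illustration, not Etiq's actual return format.
    """
    if drift_result.get("failed", 0) > 0:
        # At least one drift test failed: route the DAG to retraining.
        return RETRAIN_TASK_ID
    # No failures recorded: continue without retraining.
    return SKIP_TASK_ID
```

In a DAG, a function like this would typically receive the summary from an upstream task (for instance via XCom) and hand the returned id to the branching operator.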

Separate container - Etiq + Airflow

The second way to use Etiq and Airflow together applies when, irrespective of your orchestration or deployment set-up, you want to automate testing and monitoring using Etiq and Airflow. We provide an out-of-the-box docker-compose script with appropriate settings.

The only requirements for using this container are docker-compose and, if your values differ from the defaults provided, setting the environment variables.

Environment variables

The DAG can be used in your own Airflow environment. The following environment variables can be defined:

  • AIRFLOW_VAR_ETIQ_CONFIG - The location of the config file to be used with Etiq. An example Etiq config file is available here.

  • AIRFLOW_VAR_ETIQ_DATA - The data directory location, i.e. the directory that contains the base and latest sub-directories.

  • AIRFLOW_CONN_ETIQ_FS - Defines the Airflow connection (etiq_fs) to be used for the config file and datasets.

  • AIRFLOW_VAR_ETIQ_PROJECT - The Etiq project name to use.

  • AIRFLOW_VAR_ETIQ_DASHBOARD - (Optional) The location of the Etiq dashboard to log results to (for example, wherever your dashboard is deployed on your cloud instance).

  • AIRFLOW_VAR_ETIQ_TOKEN - (Required if the dashboard variable is set) The token used to log in to the dashboard specified in AIRFLOW_VAR_ETIQ_DASHBOARD.
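Airflow exposes environment variables prefixed with `AIRFLOW_VAR_` as Airflow Variables and those prefixed with `AIRFLOW_CONN_` as connections, so `AIRFLOW_VAR_ETIQ_CONFIG` becomes the variable `etiq_config`. A minimal sketch of a pre-flight check for the settings listed above follows. The helper name `missing_etiq_settings` is hypothetical, and treating the first four variables as mandatory is an assumption suited to your own Airflow environment, since the provided container ships with defaults.

```python
from typing import List, Mapping

# Assumed mandatory outside the provided container, which supplies defaults.
REQUIRED = (
    "AIRFLOW_VAR_ETIQ_CONFIG",
    "AIRFLOW_VAR_ETIQ_DATA",
    "AIRFLOW_CONN_ETIQ_FS",
    "AIRFLOW_VAR_ETIQ_PROJECT",
)


def missing_etiq_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of settings that are unset or empty in env."""
    missing = [name for name in REQUIRED if not env.get(name)]
    # The token is only needed when a dashboard location is configured.
    if env.get("AIRFLOW_VAR_ETIQ_DASHBOARD") and not env.get("AIRFLOW_VAR_ETIQ_TOKEN"):
        missing.append("AIRFLOW_VAR_ETIQ_TOKEN")
    return missing
```

Calling `missing_etiq_settings(os.environ)` before the scheduler starts gives an early list of anything left unset.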

Note that datasets currently have to be CSV-based. For specific database integrations or other orchestration tools, reach out to us directly.
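Because only CSV datasets are supported, it can help to confirm what the data directory actually contains before a run. The sketch below lists the CSV files in the base and latest sub-directories of the directory named by AIRFLOW_VAR_ETIQ_DATA. The helper name `list_csv_datasets` is hypothetical, and the assumption that the CSV files sit directly inside those two sub-directories is ours rather than something Etiq documents here.

```python
from pathlib import Path
from typing import Dict, List


def list_csv_datasets(data_dir: str) -> Dict[str, List[str]]:
    """Map each expected sub-directory to the sorted CSV file names it holds.

    A missing sub-directory maps to an empty list rather than raising.
    """
    root = Path(data_dir)
    found: Dict[str, List[str]] = {}
    for sub in ("base", "latest"):
        folder = root / sub
        if folder.is_dir():
            found[sub] = sorted(p.name for p in folder.glob("*.csv"))
        else:
            found[sub] = []
    return found
```

An empty list for either key suggests the directory layout or file format needs attention before the DAG runs.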
