
Add new dbt example DAG #9

Merged · 11 commits · Nov 30, 2021
Conversation

denimalpaca (Contributor) commented:

Add a new dbt example DAG and the necessary dbt updates to showcase how to permanently store the store_failures tables generated by dbt test. The DAG comes from a question asked during the data quality webinar:

Any advice about using Airflow and dbt? We are currently using dbt for ELT together with our storage in Google Cloud Platform. dbt has some test functionality: simple built-in tests as well as custom macros with Jinja. The problem is that dbt overwrites the "store failures" tables every time, so we can't use them for any historical analysis. I am wondering if Airflow can work together with dbt to capture the test failures and store them in a specified table, together with sending custom Slack alerts.

From dbt PR #2593:

How can I preserve history of past test failures?

The same way you preserve historical data from sources, models, or anything else: snapshots on top of tables in the audit schema, or your own custom run-operation macros to copy or unload that data to external storage.
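For context, the store_failures behavior referenced above is a dbt config that can be set per test or project-wide; a minimal project-wide sketch (the schema name here is an assumption, not something this PR specifies):

```yaml
# dbt_project.yml (sketch): persist the failing rows of each test
# instead of only reporting a failure count.
tests:
  +store_failures: true
  +schema: audit   # failures land in <target_schema>_audit by default
```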

… to permanently store the store_failures tables generated by dbt test.
@denimalpaca self-assigned this on Nov 19, 2021
Issues with the BigQuery project ID and the dbt schema led to dropping the custom test that was designed to fail and designing the month test to fail instead. The custom test was, for whatever reason, trying to use schema.project_id as the project ID, and BigQuery did not like that. Not sure what the fix is.

Additionally, the DAG is completed by adding a BigQuery copy-table operator. This copies a single table written by dbt test on failures to a permanent path. This is a not-so-great, but workable, solution until AIP-42 gets merged and we can do real dynamic task generation based on XCom return values. In preparation for that time, BigQueryGetDatasetTablesOperator is left in, commented out.
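For illustration, a copy step like the one described might look as follows, using BigQueryToBigQueryOperator from the Google provider; the project, dataset, and table names are assumptions, not the values used in this PR:

```python
from airflow.providers.google.cloud.transfers.bigquery_to_bigquery import (
    BigQueryToBigQueryOperator,
)

# Copy the single store_failures table written by `dbt test` to a
# permanent, run-stamped destination (all names below are assumed).
copy_store_failures = BigQueryToBigQueryOperator(
    task_id="copy_store_failures",
    source_project_dataset_tables="my-project.dbt_audit.month_test_failures",
    destination_project_dataset_table=(
        "my-project.audit_history.month_test_failures_{{ ds_nodash }}"
    ),
    write_disposition="WRITE_TRUNCATE",
)
```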
Renamed the DAGs and cleaned up the BigQuery version to not use certain imports and operators until AIP-42 is released. Added a Snowflake version of the DAG, which can also use the FFMC test. It runs dbt run then dbt test, with two tests designed to fail; the outputs are then loaded to a permanent table in the original schema. The directory structure was modified to support these two DAGs as their own separate dbt projects, since they work slightly differently.
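The overall task shape of that Snowflake DAG is roughly the following sketch; the project path, schema names, connection ID, and SQL are assumptions standing in for the real ones:

```python
from airflow.operators.bash import BashOperator
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

DBT_DIR = "/usr/local/airflow/include/dbt/snowflake_project"  # assumed path

dbt_run = BashOperator(
    task_id="dbt_run",
    bash_command=f"dbt run --project-dir {DBT_DIR}",
)

# With store_failures on, `dbt test` writes failing rows to audit tables.
dbt_test = BashOperator(
    task_id="dbt_test",
    bash_command=f"dbt test --project-dir {DBT_DIR}",
)

# The deliberately failing tests make `dbt test` exit non-zero, so this
# copy task runs regardless of upstream state; table names are assumed.
store_failure_history = SnowflakeOperator(
    task_id="store_failure_history",
    snowflake_conn_id="snowflake_default",
    sql="""
        INSERT INTO analytics.audit_history
        SELECT CURRENT_TIMESTAMP() AS captured_at, *
        FROM analytics_dbt_test__audit.month_test_failures;
    """,
    trigger_rule="all_done",
)

dbt_run >> dbt_test >> store_failure_history
```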
A new DAG and dbt project were created and modified to allow the use of Redshift as a backend to permanently store the store_failures tables generated by dbt on test failures.
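On Redshift the permanent copy can be a plain SQL insert; a minimal sketch via PostgresOperator, which also works against Redshift over a Postgres-style connection (all names assumed):

```python
from airflow.providers.postgres.operators.postgres import PostgresOperator

# Append this run's failing rows to a permanent history table
# in the original schema (connection and table names are assumed).
copy_store_failures = PostgresOperator(
    task_id="copy_store_failures",
    postgres_conn_id="redshift_default",
    sql="""
        INSERT INTO analytics.month_test_failures_history
        SELECT GETDATE() AS captured_at, *
        FROM analytics_dbt_test__audit.month_test_failures;
    """,
)
```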
josh-fell (Collaborator) commented on the hunk containing the "Run dbt test suite" task docstring:

Only feedback here is that, from the perspective of a new user (me), it's not clear when reading this DAG what exactly the dbt test command is doing and what it's testing for. I don't know if that would be obvious to someone with more dbt experience, but I just wanted to call it out.

denimalpaca (Contributor, Author) replied:

Gotcha. So the dbt test command runs the test suite which is specified under include/dbt/[project]/models/[model]/schema.yml. I can add this to the docstring.
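For illustration, a minimal sketch of what such a schema.yml can declare; the model name, column, and accepted values below are assumptions (the PR itself only mentions a month test and an FFMC test):

```yaml
# include/dbt/[project]/models/[model]/schema.yml (sketch, names assumed)
version: 2

models:
  - name: forest_fires
    columns:
      - name: month
        tests:
          - accepted_values:
              values: ["jan", "feb", "mar"]  # deliberately incomplete, so the test fails
              config:
                store_failures: true         # keep failing rows for the copy task
```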

@josh-fell (Collaborator) left a comment:

Killer stuff! Really great to see another set of dbt examples.

Resolved (outdated) review threads on:
- dags/dbt_examples/copy_store_failures_bigquery.py
- dags/dbt_examples/copy_store_failures_redshift.py
- dags/dbt_examples/copy_store_failures_snowflake.py
josh-fell (Collaborator) commented on the in-file note ("…limitations in dynamic task mapping, where needed values like 'source_table' cannot be retrieved from Variables or other backend sources. One is given as an example."):

Is this comment needed here and in copy_store_failures_snowflake.py?

@denimalpaca merged commit cc411cb into main on Nov 30, 2021.