[CT-1506] [Bug] Nested {{ this }} not working as expected within macro #6249

Closed
2 tasks done
adamcunnington-mlg opened this issue Nov 15, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@adamcunnington-mlg

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I should call out at the outset that I doubt this is actually a bug; it is more likely the interplay of subtle behaviours that I don't fully understand.

I have a very similar use case to the one that is described by this exception in the docs about the need to nest {{ this }} when you want delayed evaluation. Indeed, it was also pointed out to the submitter of another bug which was a false negative here.

I am trying to grant a BigQuery row-level access policy on a table as a post-hook.
I understand that I need to nest {{ this }} inside a jinja expression to stop a compilation failure due to dbt-jinja not knowing about the just-created object.

The only difference between my use case and the docs is:

  • I am specifying this config globally from within dbt_project.yml
  • I am actually calling a custom macro which then makes a call to dbt_utils.get_column_values

dbt_project.yml:

models:
  my_project:
    my_sub_folder:
      +post-hook: "{{ grant_row_level_access({{ this }}) }}"

macros/grant_row_level_access.sql:

{% macro grant_row_level_access(table) -%}
  {% set column = "`APClient`" %}
  {% for client in dbt_utils.get_column_values(table=table, column=column) %}
    CREATE ROW ACCESS POLICY IF NOT EXISTS {{ client }}_filter
    ON {{ table }}
    GRANT TO ("group:ap-{{ client|lower }}[email protected]")
    FILTER USING ({{ column }} = '{{ client }}');
  {% endfor %}
{%- endmacro %}

However, at run-time, IF this is the first time this model is being created, I get this output:

Completed with 1 error and 0 warnings:
10:37:42
10:37:42  Compilation Error in model mrt_all_clients__tv__taxonomy (models/marts/cross_client/channel/tv/all/mrt_all_clients__tv__taxonomy.sql)
10:37:42    In get_column_values(): relation `mlg-apollo-data-prod`.`DBT_ADAM`.`TV_TAXONOMY` does not exist and no default value was provided.

So dbt_utils is still doing some "check" against dbt graph and semi-correctly identifying that it doesn't exist in the updated graph yet - despite the fact that the database relation DOES exist!
Maybe it needs double nesting?!

Expected Behavior

No error. The database relation DOES exist.

Steps To Reproduce

In a model's post-hook, call a custom macro that utilises dbt_utils.get_column_values

Relevant log output

No response

Environment

- OS: Ubuntu 20.04
- Python: python 3.9.10
- dbt: 1.3.0

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@adamcunnington-mlg adamcunnington-mlg added bug Something isn't working triage labels Nov 15, 2022
@github-actions github-actions bot changed the title [Bug] Nnested {{ this }} not working as expected within macro [CT-1506] [Bug] Nnested {{ this }} not working as expected within macro Nov 15, 2022
@adamcunnington-mlg adamcunnington-mlg changed the title [CT-1506] [Bug] Nnested {{ this }} not working as expected within macro [CT-1506] [Bug] Nested {{ this }} not working as expected within macro Nov 15, 2022
@adamcunnington-mlg
Author

@jtcohen6 please excuse the highlight but given your response to the linked issue, I have a feeling you'll intuitively know what is going on here!

@jtcohen6
Contributor

@adamcunnington-mlg Thanks for providing the reproduction case. I share your suspicion on this being a bug, versus the intersection of a few tricky moving pieces. If I'm understanding right, it's really about:

  • passing {{ this }} as an argument to a late-rendered macro within post-hook
  • calling dbt_utils.get_column_values on a model right after it exists

To confirm, are you seeing this error when you dbt run -s mrt_all_clients__tv__taxonomy for the first time? Or when you run dbt compile, before mrt_all_clients__tv__taxonomy has been successfully built?

I tried putting together an even simpler reproduction case (on Postgres for convenience), and I wasn't able to reproduce the error:

-- models/model_a.sql
select 1 as id
union all
select 2 as id

-- macros/some_macro.sql
{% macro select_from_table(table) -%}
  -- this is my post hook
  {% for value in dbt_utils.get_column_values(table=table, column="id") %}
    select '{{ value }}' from {{ table }}
    {{ "union all" if not loop.last }}
  {% endfor %}
{%- endmacro %}

# dbt_project.yml
models:
  +post-hook: "{{ select_from_table( this ) }}"

$ psql
jerco=# drop view dbt_jcohen.model_a;
DROP VIEW
Everything runs successfully (SQL copied from logs/dbt.log):

  create view "jerco"."dbt_jcohen"."model_a__dbt_tmp" as (
    select 1 as id
union all
select 2 as id
  );
select
                id as value

            from "jerco"."dbt_jcohen"."model_a"

            

            group by id
            order by count(*) desc
        -- this is my post hook
  
    select '1' from "jerco"."dbt_jcohen"."model_a"
    union all
  
    select '2' from "jerco"."dbt_jcohen"."model_a"

@jtcohen6
Contributor

jtcohen6 commented Nov 17, 2022

Update: I was able to reproduce this after switching to dbt-bigquery.

The error is the one being returned by the dbt_utils.get_column_values macro here, where it's actually doing a cache lookup of the relation to see if it exists. dbt updates the cache at the end of the materialization macro, before post-hooks run. But then, why does this work on Postgres?

I'll share more details as I figure them out.

@jtcohen6
Contributor

I was also able to reproduce this issue on dbt-snowflake.

Postgres might be the special case here. Its adapter cache does work slightly differently, because of the need to link dependent relations (bound views).

Summary

  • dbt_utils.get_column_values makes a cache check to confirm that the relation actually exists. This is in order to raise a more helpful error, before actually submitting the query.
  • dbt updates the cache after each model has finished materializing. Post hooks are part of the model materialization.
  • Because it's being run inside a post-hook, the first time a model builds, the cache has not yet been updated, and it raises the error.

There are two potential resolutions here:

  1. Enable dbt-core to update the cache before running post-hooks. I'm not sure exactly how we'd accomplish this, given the interplay of Jinja (materialization code) and Python (model runner / task code).
  2. Remove the cache check + exception handling from dbt_utils.get_column_values. (In the meantime, you can work around this issue by extracting just the bits you want from that macro, minus the cache check + exception.)
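
For anyone landing here, the workaround in option 2 might look roughly like the sketch below. This is not the dbt_utils implementation: the macro name `get_column_values_unchecked` is made up for illustration, and the query simply mirrors the SQL that shows up in the logs earlier in this thread. It assumes `table` is a relation such as `this` and that the relation already exists in the database when the post-hook runs.

```sql
{# macros/get_column_values_unchecked.sql (hypothetical file and macro name) #}
{% macro get_column_values_unchecked(table, column) -%}

  {# At parse time dbt renders with execute = false, so return early without querying #}
  {%- if not execute -%}
    {{ return([]) }}
  {%- endif -%}

  {# No relation-cache lookup here: the query goes straight to the warehouse #}
  {%- set query -%}
    select {{ column }} as value
    from {{ table }}
    group by {{ column }}
    order by count(*) desc
  {%- endset -%}

  {%- set results = run_query(query) -%}

  {# run_query returns a table; take the first column's values as a list #}
  {{ return(results.columns[0].values() | list) }}

{%- endmacro %}
```

In `grant_row_level_access`, the loop would then iterate over `get_column_values_unchecked(table, column)` instead of `dbt_utils.get_column_values(table=table, column=column)`. The trade-off is that a genuinely missing relation now surfaces as a database error rather than the friendlier compilation error.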

@jtcohen6 jtcohen6 removed the triage label Nov 17, 2022
@adamcunnington-mlg
Author

@jtcohen6 thank you so much for the thorough investigation and detail. I was in some back-to-back meetings so missed your progress in between.

Thank you for the report - we'll use the workaround for now whilst we await a conclusion on which resolution is best.

Edge cases eh?!

@jtcohen6
Contributor

jtcohen6 commented Feb 3, 2023

Closing (for now) in favor of an update to documentation: dbt-labs/docs.getdbt.com#2818

@jtcohen6 jtcohen6 closed this as not planned Feb 3, 2023

2 participants