Inconsistent URN casing in DBT ingestion #7377

Closed
alex-magno opened this issue Feb 20, 2023 · 1 comment · Fixed by #7418
Labels
bug Bug report

Comments

@alex-magno
Contributor

alex-magno commented Feb 20, 2023

Describe the bug
DBT ingestion deals with three main identifiers when building URNs: the model name, the table name in the source platform (BigQuery, Snowflake, etc.), and the column name. During ingestion, the table identifier is lowercased, while the other two are not.

I'm using DBT with Snowflake, and this causes two bug scenarios:

  1. When I run Snowflake ingestion with convert_urns_to_lowercase=False:
    The table identifier URN coming from Snowflake (not lowercased) does not match the table identifier from DBT (db_fqn), which is lowercased. As a result, the nodes do not match.

  2. When I run Snowflake ingestion with convert_urns_to_lowercase=True:
    The table identifiers now match (both URNs are lowercased), but the column identifiers do not: Snowflake ingestion converts the column URNs to lowercase, while DBT preserves column casing. The result is a schema view with duplicated columns (lowercase and uppercase).
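
To make the two scenarios concrete, here is a minimal sketch. It is not DataHub's actual code: `dataset_urn` and `snowflake_view` are hypothetical helpers that only mimic the general dataset URN layout, the table and column names are made up, and the casing behavior is modeled on what is described above.

```python
def dataset_urn(platform: str, name: str, lowercase: bool) -> str:
    """Hypothetical helper: build a dataset URN in DataHub's general layout."""
    if lowercase:
        name = name.lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},PROD)"


# Made-up table and columns, for illustration only.
table = "ANALYTICS.PUBLIC.ORDERS"
columns = ["ORDER_ID", "AMOUNT"]

# Behavior described in this issue: DBT lowercases the table identifier
# but preserves column casing.
dbt_table_urn = dataset_urn("snowflake", table, lowercase=True)
dbt_columns = list(columns)


def snowflake_view(convert_urns_to_lowercase: bool):
    """Hypothetical model of what Snowflake ingestion emits for the same table."""
    urn = dataset_urn("snowflake", table, convert_urns_to_lowercase)
    cols = [c.lower() if convert_urns_to_lowercase else c for c in columns]
    return urn, cols


# Scenario 1: convert_urns_to_lowercase=False, so the table URNs differ.
sf_urn, sf_cols = snowflake_view(False)
scenario_1_tables_match = sf_urn == dbt_table_urn  # False

# Scenario 2: convert_urns_to_lowercase=True, so the tables match,
# but merging the column sets yields duplicates differing only by case.
sf_urn, sf_cols = snowflake_view(True)
scenario_2_tables_match = sf_urn == dbt_table_urn  # True
merged_columns = sorted(set(sf_cols) | set(dbt_columns))
# merged_columns == ['AMOUNT', 'ORDER_ID', 'amount', 'order_id']
```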

To Reproduce
Steps to reproduce the behavior:

  1. Ingest DBT using snowflake as the target_platform
  2. Ingest snowflake metadata with convert_urns_to_lowercase=False. Observe the first bug described.
  3. Ingest snowflake metadata with convert_urns_to_lowercase=True. Observe the second bug described.

Expected behavior
Column and table identifiers should match between DBT and the source platform. Ideally, casing in DBT should be consistent: either lowercase everything or preserve case in every identifier.

As a suggestion, I can submit a PR that adds a convert_urns_to_lowercase flag to the DBT recipe as well, so users can decide whether or not to lowercase every identifier. At the very least, this would make the behavior consistent.
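
A rough sketch of what that flag could look like, assuming a single setting is applied uniformly to table and column identifiers. `CasingConfig`, `normalize`, and `normalize_node` are hypothetical names for illustration, not existing DataHub code, and the real change may look quite different.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CasingConfig:
    # Hypothetical option mirroring the proposed flag; not an existing DBT source setting.
    convert_urns_to_lowercase: bool = False


def normalize(identifier: str, config: CasingConfig) -> str:
    """Apply the same casing rule to any identifier."""
    return identifier.lower() if config.convert_urns_to_lowercase else identifier


def normalize_node(
    table: str, columns: List[str], config: CasingConfig
) -> Tuple[str, List[str]]:
    """Normalize a table identifier and its columns with one shared rule."""
    return normalize(table, config), [normalize(c, config) for c in columns]


# With the flag on, everything is lowercased; with it off, everything keeps its case.
lowered = normalize_node("DB.SCHEMA.ORDERS", ["ORDER_ID"], CasingConfig(True))
preserved = normalize_node("DB.SCHEMA.ORDERS", ["ORDER_ID"], CasingConfig(False))
```

The point of routing both kinds of identifier through one function is that the table and column casing can no longer drift apart, whichever setting the user picks.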

Screenshots
image
Column names being duplicated when using convert_urns_to_lowercase=True in Snowflake ingestion.

@remisalmon
Contributor

I am seeing the same issue starting with v0.10.0: dbt and Snowflake columns show up twice, in upper and lower case (with convert_urns_to_lowercase=true in the Snowflake config). This looks like a regression, possibly introduced by #7063? I do not have this issue with v0.9.5.
