Describe the bug
DBT ingestion deals with three main URNs: the model name, the table name in the source platform (bigquery, snowflake, etc.) and the column name. During ingestion, the table identifier is lowercased and the others are not.
model_name, for example, preserves whatever casing is used in the manifest.
I'm using DBT with Snowflake, and this causes two bug scenarios:
When I run Snowflake ingestion with convert_urns_to_lowercase=False:
The table identifier URN coming from Snowflake (not lowercased) does not match the table identifier from DBT (db_fqn), which is lowercased, so the nodes do not match.
When I run Snowflake ingestion with convert_urns_to_lowercase=True:
The table identifiers now match (both URNs are lowercased), but the column identifiers do not, because Snowflake ingestion converts the column URNs to lowercase while DBT preserves column casing. The result is a schema view with duplicated columns (lowercase and uppercase).
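To make the two scenarios concrete, here is a minimal sketch that models the behavior described above in plain Python. It does not call DataHub code: the helper functions, the sample table ANALYTICS.PUBLIC.ORDERS and the column ORDER_ID are all hypothetical, and the URN strings only imitate the usual DataHub layout for illustration.

```python
# Illustrative model of the casing behavior described in this issue.
# All helper names and sample identifiers are hypothetical; no DataHub code is used.

def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string in the usual DataHub layout."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def field_urn(dataset: str, column: str) -> str:
    """Build a schema field URN string for a column of the given dataset."""
    return f"urn:li:schemaField:({dataset},{column})"


def snowflake_side(table: str, column: str, convert_urns_to_lowercase: bool):
    """Snowflake ingestion as described: the flag affects tables and columns alike."""
    if convert_urns_to_lowercase:
        table, column = table.lower(), column.lower()
    ds = dataset_urn("snowflake", table)
    return ds, field_urn(ds, column)


def dbt_side(table: str, column: str):
    """DBT ingestion as described: table always lowercased, column casing preserved."""
    ds = dataset_urn("snowflake", table.lower())
    return ds, field_urn(ds, column)


TABLE, COLUMN = "ANALYTICS.PUBLIC.ORDERS", "ORDER_ID"


def compare(convert_urns_to_lowercase: bool):
    """Return (tables_match, columns_match) between the two ingestion sources."""
    sf_ds, sf_col = snowflake_side(TABLE, COLUMN, convert_urns_to_lowercase)
    dbt_ds, dbt_col = dbt_side(TABLE, COLUMN)
    return sf_ds == dbt_ds, sf_col == dbt_col


print("flag=False:", compare(False))  # first scenario: table URNs differ
print("flag=True: ", compare(True))   # second scenario: tables match, columns differ
```

With the flag off, the table URNs already differ, so the nodes never merge. With the flag on, the tables merge but each column appears under two different casings.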
To Reproduce
Steps to reproduce the behavior:
Ingest DBT using snowflake as the target_platform
Ingest snowflake metadata with convert_urns_to_lowercase=False. Observe the first bug described.
Ingest snowflake metadata with convert_urns_to_lowercase=True. Observe the second bug described.
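The steps above can be sketched as recipe fragments, written here as Python dictionaries. Only the settings named in this issue are shown (target_platform and convert_urns_to_lowercase). File paths, credentials, the sink and every other required field are deliberately omitted, so these fragments are not complete recipes, and the function names are hypothetical.

```python
# Recipe "source" fragments for the reproduction steps.
# Only options mentioned in the issue are included; connection details,
# manifest/catalog paths and the sink are intentionally left out.

def dbt_source() -> dict:
    """Step 1: DBT ingestion pointing at Snowflake as the target platform."""
    return {
        "type": "dbt",
        "config": {
            "target_platform": "snowflake",
        },
    }


def snowflake_source(convert_urns_to_lowercase: bool) -> dict:
    """Steps 2 and 3: Snowflake ingestion with the flag off or on."""
    return {
        "type": "snowflake",
        "config": {
            "convert_urns_to_lowercase": convert_urns_to_lowercase,
        },
    }


# The three runs from the reproduction steps, in order.
runs = [dbt_source(), snowflake_source(False), snowflake_source(True)]
for source in runs:
    print(source)
```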
Expected behavior
Column and table identifiers should match between DBT and the source platform. Ideally, casing in DBT should be consistent: either lowercase everything or preserve case in every identifier.
As a suggestion, I can submit a PR that introduces a convert_urns_to_lowercase flag in the DBT recipe as well, so users can decide whether or not to lowercase every identifier. That would at least make the behavior consistent.
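A rough sketch of what the suggested flag could mean in practice: a single switch applied uniformly to table and column identifiers on the DBT side. This is only an illustration of the proposal, not existing DataHub code. The class DbtCasingConfig and the functions normalize and dbt_identifiers are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DbtCasingConfig:
    """Hypothetical config holding the proposed flag for the DBT recipe."""
    convert_urns_to_lowercase: bool = False


def normalize(identifier: str, config: DbtCasingConfig) -> str:
    """Lowercase the identifier only when the flag is set; otherwise keep it as-is."""
    return identifier.lower() if config.convert_urns_to_lowercase else identifier


def dbt_identifiers(
    table: str, columns: List[str], config: DbtCasingConfig
) -> Tuple[str, List[str]]:
    """Apply the same casing rule to the table and to every column."""
    return normalize(table, config), [normalize(c, config) for c in columns]


table, columns = "ANALYTICS.PUBLIC.ORDERS", ["ORDER_ID", "Status"]
print(dbt_identifiers(table, columns, DbtCasingConfig(convert_urns_to_lowercase=True)))
print(dbt_identifiers(table, columns, DbtCasingConfig(convert_urns_to_lowercase=False)))
```

The point of the design is that one rule covers every identifier, so setting the flag to the same value in the DBT and Snowflake recipes would yield matching tables and matching columns.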
Screenshots
Column names being duplicated when using convert_urns_to_lowercase=True in Snowflake ingestion.
I am having the same issue starting with v0.10.0: dbt and Snowflake columns show up twice, in upper and lower case (with convert_urns_to_lowercase=true in the Snowflake config). This looks like a regression introduced by #7063. I do not have this issue with v0.9.5.
Code referenced in the issue description:
datahub/metadata-ingestion/src/datahub/ingestion/source/dbt/dbt_core.py, lines 138 to 148 (at 1df806d)
datahub/metadata-ingestion/src/datahub/ingestion/source/dbt/dbt_common.py, line 400 (at aa388f0)
datahub/metadata-ingestion/src/datahub/ingestion/source/dbt/dbt_core.py, line 113 (at 1df806d)