-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingest/dbt): introduce lowercase column urn option #7418
fix(ingest/dbt): introduce lowercase column urn option #7418
Conversation
) -> str: | ||
db_fqn = self.get_db_fqn() | ||
if target_platform != DBT_PLATFORM: | ||
if (target_platform != DBT_PLATFORM) and convert_urns_to_lowercase: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part concerns me because it won't assure backwards compatibility. If convert_urns_to_lowercase=False
(default), then db_fqn
won't get lowercased. It may cause problems in setups where DBT ingestion is already running. Possible mismatch to source platform nodes? 🤔
Let me know what you think and if we can do it differently. The other parts of the code keep backwards compatibility when convert_urns_to_lowercase=False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to check this in more detail, but I thought this was supposed to lowercase all outbound lineage edges by default. Basically dbt nodes retain their original casing, but their lineage to the snowflake/bigquery/etc tables would always be lowercased
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hsheth2 nice! If you confirm that outbound lineages (db_fqn
, in this case) need to be lowercased by default, its just a matter of removing this condition, reverting it back to the original behavior 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I understand, what was the issue you were seeing that necessitated this change? Was it that the overall lineage didn't match up, or was it related to column-level lineage?
Having looked over #7063 and #7377 again, I think I understand what's going on. The model_name is only used internally in the dbt sources, so that one should be fine to leave as-is. The #7063 definitely did introduce a regression. That regression is only specific to Snowflake. In the Snowflake source, convert_urns_to_lowercase applies to all urns, including the schemaField references. For BigQuery / other sources, convert_urns_to_lowercase only applies to dataset urns. All that said, I think what we actually need is a Let me know what you think @alex-magno. cc @remisalmon, since you also commented on #7377. |
This last point makes sense considering that dbt itself has a different default for Snowflake specifically: https://docs.getdbt.com/reference/project-configs/quoting#default (the |
Thanks for the input @hsheth2 and @remisalmon! It definitely makes sense. Personally I would prefer setting this type of change explicitly rather than implicitly - meaning that I would go for a Let me know if you all agree and I'm happy to do it. I will also add it for |
Let's go ahead and add the I agree that it makes sense to match the defaults of dbt here. Basically if the user explicitly sets convert_column_urns_to_lowercase, then we respect that. If they haven't set it, then we default it to true when Re: dbt cloud - because dbt core and dbt cloud share the common logic, your PR already adds support for dbt-cloud too :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alex-magno made some tweaks, but otherwise LGTM
- do the lowercasing on the way out instead of the way in
- use a pydantic validator to handle the snowflake special casing
…ct#7418) Co-authored-by: Harshal Sheth <[email protected]>
…ct#7418) Co-authored-by: Harshal Sheth <[email protected]>
…ct#7418) Co-authored-by: Harshal Sheth <[email protected]>
This PR tries to solve the inconsistent URN casing issue with DBT. Closes #7377
The approach here is to introduce a
convert_urns_to_lowercase
flag, similar to what we have in the Snowflake ingestion. So the user of the recipe can have the option to force all URNs to lowercase.Also, this is my first time contributing to Datahub so please review carefully and give feedback 🙏
Checklist