-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingest): Use lower-case dataset names in the dataset urns for all SQL-styled datasets. #4140
fix(ingest): Use lower-case dataset names in the dataset urns for all SQL-styled datasets. #4140
Conversation
… SQL-styled datasets.
SQL_STYLE_PLATFORMS: Set[str] = { | ||
"athena", | ||
"bigquery", | ||
"druid", | ||
"hive", | ||
"mariadb", | ||
"mssql", | ||
"mysql", | ||
"oracle", | ||
"postgres", | ||
"redshift", | ||
"snowflake", | ||
"trino", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any more generic way to specify sql systems? What's the plan for making sure this set stays updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So long as we use the mce_builder to mint urns, there is no fool-proof way to ensure this, except via code-reviews. The other way would be to provide an urn minting factory that is customizable per source. That's a much bigger change, and we should do it at some point in future.
if platform_instance: | ||
# Use lower-case name for all SQL style datasets | ||
if platform in SQL_STYLE_PLATFORMS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to use lower-case for platform instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect that folks have previously ingested non-case sensitive datasets? This change will mint new datasets altogether, without any migration path. Do we have reason to believe this won't be disruptive?
… SQL-styled datasets. (datahub-project#4140)
… for all SQL-styled datasets. (datahub-project#4140)" This reverts commit 6c75185.
… SQL-styled datasets. (datahub-project#4140)
… for all SQL-styled datasets. (datahub-project#4140)" (datahub-project#4218) This reverts commit 6c75185.
Converts the dataset name to lower-case in the dataset urns for all SQL-styled datasets.
Checklist