fix(ingest): Use lower-case dataset names in the dataset urns for all SQL-styled datasets. #4140

Contributor

@rslanka rslanka commented Feb 14, 2022

Converts the dataset name to lower-case in the dataset urns for all SQL-styled datasets.
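As a rough illustration of the change described above, the sketch below lower-cases the dataset name while building a dataset urn. The helper name `make_sql_dataset_urn`, the default `env` value, and the reduced platform set are assumptions made for this example and are not the PR's actual code; the urn layout follows DataHub's `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` convention.

```python
from typing import Set

# Illustrative subset only; the PR defines its own SQL_STYLE_PLATFORMS set.
SQL_STYLE_PLATFORMS: Set[str] = {"mysql", "postgres", "snowflake"}


def make_sql_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Hypothetical helper: lower-case the name only for SQL-style platforms."""
    if platform in SQL_STYLE_PLATFORMS:
        name = name.lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# A SQL-style platform gets a lower-cased name; other platforms keep their case.
print(make_sql_dataset_urn("snowflake", "DB.Schema.Orders"))
print(make_sql_dataset_urn("kafka", "OrdersTopic"))
```

Platforms outside the set keep their original casing, so only sources that treat identifiers case-insensitively are affected.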

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@github-actions

github-actions bot commented Feb 14, 2022

Unit Test Results (metadata ingestion)

  3 files  ±0      3 suites  ±0    43m 20s ⏱️ -1m 10s
332 tests +15    332 ✔️ +17     0 💤 ±0     0 ❌ -2
953 runs  +45    924 ✔️ +47    29 💤 ±0     0 ❌ -2

Results for commit f941f3b. ± Comparison against base commit d33a868.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Feb 14, 2022

Unit Test Results (build & test)

 70 files     70 suites    14m 5s ⏱️
609 tests    550 ✔️    59 💤    0 ❌

Results for commit f941f3b.

♻️ This comment has been updated with latest results.

Comment on lines +38 to +51
SQL_STYLE_PLATFORMS: Set[str] = {
    "athena",
    "bigquery",
    "druid",
    "hive",
    "mariadb",
    "mssql",
    "mysql",
    "oracle",
    "postgres",
    "redshift",
    "snowflake",
    "trino",
}
Contributor

Is there a more generic way to specify SQL systems? What's the plan for making sure this set stays up to date?

Contributor Author

As long as we use mce_builder to mint urns, there is no foolproof way to ensure this other than code review. The alternative would be to provide a urn-minting factory that is customizable per source. That is a much bigger change, and we should do it at some point in the future.
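One possible shape for the per-source minting factory mentioned above is sketched below. It is only a sketch under stated assumptions: `register_minter`, `mint_dataset_urn`, and the `lowercase_name` flag are hypothetical names invented here, and nothing like this is claimed to exist in DataHub.

```python
from typing import Callable, Dict

# (dataset name, env) -> urn
UrnMinter = Callable[[str, str], str]

# Hypothetical registry: each source declares its own minting policy.
_MINTERS: Dict[str, UrnMinter] = {}


def register_minter(platform: str, *, lowercase_name: bool) -> None:
    """Register how dataset urns are minted for one platform."""

    def mint(name: str, env: str) -> str:
        if lowercase_name:
            name = name.lower()
        return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

    _MINTERS[platform] = mint


def mint_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Mint a urn using the policy registered for the platform."""
    try:
        minter = _MINTERS[platform]
    except KeyError:
        # Fail loudly instead of silently guessing a casing policy.
        raise ValueError(f"no urn minter registered for platform {platform!r}") from None
    return minter(name, env)


register_minter("postgres", lowercase_name=True)
register_minter("kafka", lowercase_name=False)
print(mint_dataset_urn("postgres", "Public.Users"))
```

With this layout the casing policy lives next to each source's registration, so a new SQL source cannot be forgotten in a central set; it simply fails until it registers a policy.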

if platform_instance:
    # Use lower-case name for all SQL style datasets
    if platform in SQL_STYLE_PLATFORMS:
Contributor

There is no need to lower-case the platform instance.
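The distinction raised in this comment can be sketched as follows: the dataset name is lower-cased for SQL-style platforms while the platform instance keeps its original casing. The function name `make_urn_with_instance`, the reduced platform set, and the choice to prefix the instance to the name with a dot are assumptions made for illustration, not a copy of the PR's code.

```python
from typing import Optional, Set

# Illustrative subset only; the PR defines its own SQL_STYLE_PLATFORMS set.
SQL_STYLE_PLATFORMS: Set[str] = {"mysql", "postgres", "snowflake"}


def make_urn_with_instance(
    platform: str,
    name: str,
    platform_instance: Optional[str] = None,
    env: str = "PROD",
) -> str:
    """Hypothetical helper: lower-case the name, leave the instance untouched."""
    if platform in SQL_STYLE_PLATFORMS:
        name = name.lower()
    if platform_instance:
        # The instance is prepended as-is, after the name has been normalised.
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


print(make_urn_with_instance("mysql", "Sales.Orders", "ProdCluster"))
```

Lower-casing before the instance is prepended is what keeps the instance out of the normalisation.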

@rslanka rslanka changed the title from "fix(ingest): Use lower-case dataset and instance names in the dataset urns for all SQL-styled datasets." to "fix(ingest): Use lower-case dataset names in the dataset urns for all SQL-styled datasets." on Feb 15, 2022
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Do we expect that folks have previously ingested non-case sensitive datasets? This change will mint new datasets altogether, without any migration path. Do we have reason to believe this won't be disruptive?
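To make the migration concern concrete, the sketch below scans a list of already-ingested dataset urns and reports the ones whose urn would differ after lower-casing, since those would be minted as new datasets. The helper `urns_changed_by_lowercasing`, the regular expression, and the sample urns are all hypothetical; how existing urns are actually listed from a DataHub deployment is outside this sketch.

```python
import re
from typing import Dict, Iterable, Set

# Illustrative subset only; the PR defines its own SQL_STYLE_PLATFORMS set.
SQL_STYLE_PLATFORMS: Set[str] = {"mysql", "postgres", "snowflake"}

# Captures platform, dataset name, and env from a dataset urn.
_URN_RE = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,]+)\)$"
)


def urns_changed_by_lowercasing(urns: Iterable[str]) -> Dict[str, str]:
    """Map each existing urn to the new urn it would become, if different."""
    changed: Dict[str, str] = {}
    for urn in urns:
        match = _URN_RE.match(urn)
        if match is None:
            continue  # not a dataset urn in the expected layout
        platform, name, env = match.groups()
        if platform in SQL_STYLE_PLATFORMS and name != name.lower():
            changed[urn] = (
                f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name.lower()},{env})"
            )
    return changed


existing = [
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,DB.Orders,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.users,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:kafka,OrdersTopic,PROD)",
]
for old, new in urns_changed_by_lowercasing(existing).items():
    print(f"{old} -> {new}")
```

An empty result would suggest the change is not disruptive for that deployment; any entries indicate datasets whose metadata would stay attached to the old urn.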

@shirshanka shirshanka merged commit 6c75185 into datahub-project:master Feb 17, 2022
hevandro-veiga pushed a commit to hevandro-veiga/datahub that referenced this pull request Feb 18, 2022
rslanka added a commit to rslanka/datahub that referenced this pull request Feb 22, 2022
shirshanka pushed a commit that referenced this pull request Feb 23, 2022
… for all SQL-styled datasets. (#4140)" (#4218)

This reverts commit 6c75185.
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022