Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Remove catalog from the DDL SQL generated by on_schema_change=sync_all_columns #684

Merged
merged 4 commits into from
Jul 5, 2024

Conversation

iconara
Copy link
Contributor

@iconara iconara commented Jul 4, 2024

Description

This resolves #677

When generating Hive DDL, the relation spec must not include the catalog name (i.e. "awsdatacatalog"), only the database (schema) and table name. #677 discovered that the DDL generated when on_schema_change is set to sync_all_columns generates alter table awsdatacatalog.schema_name.table_name replace columns (…) (with backticks around the relation spec components, but I don't know how to make Markdown include literal backticks). This DDL fails with a syntax error.

I don't know why all DDL doesn't fail, why this was only a problem when on_schema_change is set to sync_all_columns. For example, I can see multiple ALTER TABLE … SET TBLPROPERTIES … that are run when I run the functional tests, and none of them have the catalog in the relation spec.

My fix makes sure that the catalog is never included when rendering relation names in Hive DDL.

There were no functional tests that covered on_schema_change, and since I had to add one to figure out why #677 happened, I added tests for all four modes, even though only the sync_all_columns mode changes in this PR.

Checklist

  • You followed contributing section
  • You kept your Pull Request small and focused on a single feature or bug fix.
  • You added unit testing when necessary
  • You added functional testing when necessary

from dbt.contracts.results import RunStatus
from dbt.tests.util import run_dbt, run_dbt_and_capture

models__table_base_model = """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great test, how about replicating it for iceberg tables as well, just to be sure that we don't break anything in there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I parameterized the tests so that they now run once with table_type=hive and once with table_type=iceberg. The file name was "test_hive_on_schema_change" and I renamed it to "test_on_schema_change" since it covers both table types.

@nicor88
Copy link
Contributor

nicor88 commented Jul 4, 2024

@iconara thanks for your PR, there are some issues with some functional tests, could you have a look at them?

iconara added 2 commits July 5, 2024 09:54
Using on_schema_change=sync_all_columns would result in an ALTER TABLE … REPLACE COLUMNS that had a relation that included the catalog, e.g. `awsdatacatalog`.`some_schema`.`some_table`. Hive DDL does not support this three-component relation specification, only `some_schema`.`some_table` is supported.

This change removes the database (i.e. `awsdatacatalog`) component from all Hive DDL.
The previous commit included tests for the sync_all_columns mode, since that had a bug. There are no functional tests that cover the on_schema_change modes, so this adds coverage for the other three modes.

I also added an additional check to ensure the column names have actually been changed (or not changed).
@iconara iconara force-pushed the fix-alter-table-replace-columns branch from 2b458fc to d60b2fb Compare July 5, 2024 08:19
@iconara
Copy link
Contributor Author

iconara commented Jul 5, 2024

I fixed the functional test that failed because of my change, but now a different functional test is failing. It's not failing locally, and I can't figure out why it fails from the output. It's TestIcebergRetriesDisabled.test__retries_iceberg that fails, and it seems to fail because the run doesn't fail as expected.

Parameterize the on_schema_change functional tests to run once with table_type=hive and once with table_type=iceberg so that both modes are covered.

Since the file is called "test_hive_…" rename it without "hive_".
@iconara iconara force-pushed the fix-alter-table-replace-columns branch from d60b2fb to 644f69a Compare July 5, 2024 08:40
@iconara
Copy link
Contributor Author

iconara commented Jul 5, 2024

I forced the tests to re-run by touching the last commit and they succeeded this time. Looks like Iceberg retry tests are not stable.

@nicor88
Copy link
Contributor

nicor88 commented Jul 5, 2024

I forced the tests to re-run by touching the last commit and they succeeded this time. Looks like Iceberg retry tests are not stable.

ah, I see, I tried to address this issue adding more retries, but as I was in a rush yesterday I didn't spot what went wrong.

Copy link
Contributor

@nicor88 nicor88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @iconara

@nicor88 nicor88 merged commit 47461bf into dbt-labs:main Jul 5, 2024
10 checks passed
kodiakhq bot referenced this pull request in cloudquery/policies Sep 1, 2024
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [dbt-athena-community](https://togithub.com/dbt-athena/dbt-athena) | patch | `==1.8.3` -> `==1.8.4` |

---

### Release Notes

<details>
<summary>dbt-athena/dbt-athena (dbt-athena-community)</summary>

### [`v1.8.4`](https://togithub.com/dbt-athena/dbt-athena/releases/tag/v1.8.4)

[Compare Source](https://togithub.com/dbt-athena/dbt-athena/compare/v1.8.3...v1.8.4)

#### What's Changed

##### Fixes

-   fix: Remove catalog from the DDL SQL generated by on_schema_change=sync_all_columns by [@&#8203;iconara](https://togithub.com/iconara) in [https://github.com/dbt-athena/dbt-athena/pull/684](https://togithub.com/dbt-athena/dbt-athena/pull/684)
-   fix: Query comment for create table statement by [@&#8203;sanromeo](https://togithub.com/sanromeo) in [https://github.com/dbt-athena/dbt-athena/pull/702](https://togithub.com/dbt-athena/dbt-athena/pull/702)
-   fix: remove leading whitespaces on post-hook operations by [@&#8203;sanromeo](https://togithub.com/sanromeo) in [https://github.com/dbt-athena/dbt-athena/pull/705](https://togithub.com/dbt-athena/dbt-athena/pull/705)
-   fix: vacuum more runs needed error by [@&#8203;Jrmyy](https://togithub.com/Jrmyy) in [https://github.com/dbt-athena/dbt-athena/pull/703](https://togithub.com/dbt-athena/dbt-athena/pull/703)

##### Dependencies

-   chore: Update dbt-tests-adapter requirement from ~=1.9.1 to ~=1.9.2 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/687](https://togithub.com/dbt-athena/dbt-athena/pull/687)
-   chore: Update pytest requirement from ~=8.2 to ~=8.3 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/690](https://togithub.com/dbt-athena/dbt-athena/pull/690)
-   chore: Update pyupgrade requirement from ~=3.16 to ~=3.17 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/692](https://togithub.com/dbt-athena/dbt-athena/pull/692)
-   chore: Update tenacity requirement from ~=8.2 to >=8.2,<10.0 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/693](https://togithub.com/dbt-athena/dbt-athena/pull/693)
-   chore: Update black requirement from ~=24.4 to ~=24.8 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/694](https://togithub.com/dbt-athena/dbt-athena/pull/694)
-   chore: Update boto3-stubs\[s3] requirement from ~=1.34 to ~=1.35 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/707](https://togithub.com/dbt-athena/dbt-athena/pull/707)
-   chore: Update moto requirement from ~=5.0.12 to ~=5.0.13 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/708](https://togithub.com/dbt-athena/dbt-athena/pull/708)
-   chore: Update pyparsing requirement from ~=3.1.2 to ~=3.1.4 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/709](https://togithub.com/dbt-athena/dbt-athena/pull/709)

#### New Contributors

-   [@&#8203;iconara](https://togithub.com/iconara) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/684](https://togithub.com/dbt-athena/dbt-athena/pull/684)

**Full Changelog**: dbt-labs/dbt-athena@v1.8.3...v1.8.4

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40NDAuNyIsInVwZGF0ZWRJblZlciI6IjM3LjQ0MC43IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJhdXRvbWVyZ2UiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Using Data Source name in Replace Columns statement doesn't work
2 participants