
Bug/databricks sql incremental #144

Merged: 17 commits merged into main on Aug 1, 2024


@fivetran-catfritz (Contributor) commented Jul 24, 2024

PR Overview

This PR will address the following Issue/Feature:

  • T-746924

This PR will result in the following new package version:

  • v0.18.0 since we're changing materializations

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

🚨 Breaking Changes 🚨

⚠️ Since the following changes result in the table format changing, we recommend running a --full-refresh after upgrading to this version to avoid possible incremental failures.

  • For Databricks All-Purpose clusters, incremental models will now be materialized using the delta table format (previously parquet).

    • Delta tables are generally more performant than parquet and are also more widely available for Databricks users. This will also prevent compilation issues on customers' managed tables.
  • For Databricks SQL Warehouses, incremental materialization will not be used because the insert_overwrite strategy is incompatible with them. These models are materialized as tables instead.
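The settings described in these breaking changes can be pictured as a small decision function. This is a hypothetical plain-Python sketch, not the package's dbt code: model_config is an illustrative name, and the incremental_compatible argument stands in for the result of the package's is_incremental_compatible() macro.

```python
# Hypothetical sketch (not the package's code): the model settings this PR
# describes, expressed as a plain function. `incremental_compatible` stands in
# for the result of the package's is_incremental_compatible() macro.


def model_config(target_type: str, incremental_compatible: bool) -> dict:
    """Return the materialization settings a model would resolve to."""
    config = {
        # SQL Warehouses (and other unsupported runtimes) fall back to a table.
        "materialized": "incremental" if incremental_compatible else "table",
    }
    if target_type == "databricks":
        # Delta replaces parquet as the table format on Databricks.
        config["file_format"] = "delta"
        if incremental_compatible:
            # Strategy named in this PR for Databricks incremental runs.
            config["incremental_strategy"] = "insert_overwrite"
    return config


# All-purpose cluster vs. SQL Warehouse
print(model_config("databricks", True))
print(model_config("databricks", False))
```

Because the table format changes from parquet to delta, an existing incremental table built under the old settings no longer matches the new config. That mismatch is why a --full-refresh is recommended after upgrading.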

Under the Hood

  • The is_incremental_compatible macro has been added. It returns true if the Databricks runtime being used is an all-purpose cluster, or if any other supported non-Databricks destination is being used.
    • This check was added because other Databricks runtimes (an endpoint and an external runtime) have been found that do not support the insert_overwrite incremental strategy used.
  • Added integration testing for Databricks SQL Warehouse.
  • Added a bot to mark issues and PRs as stale if there is no activity for over 180 days.
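The check described under Under the Hood can be sketched as follows. This is a hypothetical Python model, not the macro in this PR. It assumes the runtime is identified from the connection's HTTP path: Databricks all-purpose clusters use paths like sql/protocolv1/o/<org-id>/<cluster-id>, while SQL Warehouses use /sql/1.0/warehouses/<warehouse-id>. The set of other supported targets is also an assumption.

```python
import re
from typing import Optional

# Hypothetical sketch of the check described above, assuming the runtime is
# identified from the connection's HTTP path. Databricks all-purpose clusters
# use paths like "sql/protocolv1/o/<org-id>/<cluster-id>", while SQL
# Warehouses use "/sql/1.0/warehouses/<warehouse-id>".
ALL_PURPOSE_PATH = re.compile(r"sql/protocolv1/")

# Assumed list of non-Databricks destinations treated as supported.
OTHER_SUPPORTED_TARGETS = {"bigquery", "postgres", "redshift", "snowflake"}


def is_incremental_compatible(target_type: str, http_path: Optional[str] = None) -> bool:
    if target_type == "databricks":
        # Only all-purpose clusters qualify; SQL Warehouses, endpoints and
        # any other runtime return False.
        return bool(http_path) and ALL_PURPOSE_PATH.search(http_path) is not None
    return target_type in OTHER_SUPPORTED_TARGETS


print(is_incremental_compatible("databricks", "/sql/1.0/warehouses/abc123"))
```

Checking for the all-purpose pattern, rather than for a SQL Warehouse pattern, means any newly discovered Databricks runtime defaults to the safe table materialization.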

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run --full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • Docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

Consistency tests pass

  • [Screenshot, 2024-07-25 12:23:01 PM]

Parquet file format

  • Before addressing the insert_overwrite incompatibility, I got the error below with the parquet file format, because our Databricks SQL instance is "managed".
    • [Screenshot, 2024-07-24 12:31:28 PM]
  • Updating to delta format resolves this issue.

Insert-overwrite incremental strategy

  • Once the file format issue was resolved, I was able to reproduce the reported issue and got the error below.

    • [Screenshot, 2024-07-24 12:39:48 PM]
  • Applying the materialized='incremental' if is_incremental_compatible() else 'table' approach resolved this issue. A non-full-refresh run with a Databricks SQL target completes without error, and tables are created instead of an incremental run.

    • [Screenshot, 2024-07-24 1:00:41 PM]
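The validation above can be modeled as a small decision table. This is a hypothetical Python sketch: the function names and the returned labels are illustrative, not dbt output. It shows why the fallback avoids the error. When the materialization resolves to "table", a non-full-refresh run rebuilds the table and never attempts insert_overwrite.

```python
# Hypothetical model of what a run does with each materialization; the
# returned strings are descriptive labels, not dbt log output.


def materialization(incremental_compatible: bool) -> str:
    # Mirrors: materialized='incremental' if is_incremental_compatible() else 'table'
    return "incremental" if incremental_compatible else "table"


def run_action(materialized: str, relation_exists: bool, full_refresh: bool) -> str:
    if materialized == "table":
        # Tables are rebuilt on every run, so insert_overwrite is never attempted.
        return "rebuild table"
    if full_refresh or not relation_exists:
        # First build or --full-refresh: the incremental model is built from scratch.
        return "rebuild table"
    # Later runs write new data with the incremental strategy.
    return "insert_overwrite"


# Non-full-refresh run against an existing model on a SQL Warehouse target:
print(run_action(materialization(False), relation_exists=True, full_refresh=False))
```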

If you had to summarize this PR in an emoji, which would it be?

🤖

@fivetran-catfritz self-assigned this Jul 24, 2024

@fivetran-joemarkiewicz (Contributor) left a comment

@fivetran-catfritz thanks for quickly turning around this PR! I have a few comments and requests in my review. Additionally, could you add consistency tests for the end models modified in this PR? Thanks!

Resolved review threads: CHANGELOG.md (2), integration_tests/dbt_project.yml, macros/is_incremental_compatible.sql, .github/workflows/stale.yml

@fivetran-catfritz (Contributor, Author) left a comment

@fivetran-joemarkiewicz Thanks for the suggestions. I have applied them and also added the consistency tests, along with a screenshot showing that they pass. This is ready for re-review!

Resolved review threads: CHANGELOG.md (2), integration_tests/dbt_project.yml, .github/workflows/stale.yml, macros/is_incremental_compatible.sql

@fivetran-joemarkiewicz (Contributor) left a comment

@fivetran-catfritz great work on this PR! I ran through the changes and your validations look great. I have no notes on the functionality of the PR, but while looking at your Mixpanel PR I came across one change I would like us to apply to the macro moving forward: adding adapter.dispatch, and the alias when we call the macro.

Let me know once that update is applied and that re-review should be good for approval!
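The adapter.dispatch request can be pictured with a simplified model of dispatch resolution. This is a hypothetical Python sketch, not dbt's implementation. It looks up an adapter-prefixed implementation (for example databricks__is_incremental_compatible) in the package namespace and falls back to the default__ version. Real dbt also checks parent adapters and any project-level dispatch search_order, which this sketch omits. The implementations in the dictionary are placeholders.

```python
from typing import Callable, Dict

# Simplified, hypothetical model of adapter.dispatch('is_incremental_compatible', 'hubspot'):
# look for '<adapter>__<macro>' in the package namespace, then fall back to
# 'default__<macro>'. Real dbt also considers parent adapters and any
# project-level dispatch search_order, which this sketch omits.
Macro = Callable[[], object]


def dispatch(namespace: Dict[str, Macro], macro_name: str, adapter: str) -> Macro:
    for prefix in (adapter, "default"):
        candidate = f"{prefix}__{macro_name}"
        if candidate in namespace:
            return namespace[candidate]
    raise LookupError(f"No implementation of {macro_name} for adapter {adapter}")


# Illustrative implementations only; the return values are placeholders.
hubspot_namespace: Dict[str, Macro] = {
    "default__is_incremental_compatible": lambda: True,
    "databricks__is_incremental_compatible": lambda: False,
}

print(dispatch(hubspot_namespace, "is_incremental_compatible", "databricks")())  # databricks__ version
print(dispatch(hubspot_namespace, "is_incremental_compatible", "snowflake")())   # default__ version
```

One benefit of routing through dispatch is that a user project can supply its own adapter-specific implementation through its dispatch configuration without forking the package.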

Resolved review thread: CHANGELOG.md

packages.yml (outdated), comment on lines 2 to 6:
  # - package: fivetran/hubspot_source
  #   version: [">=0.15.0", "<0.16.0"]
  - git: https://github.com/fivetran/dbt_hubspot_source.git
    revision: update/get-ticket-property-history-columns
    warn-unpinned: false
Reminder to swap this git dependency back to the fivetran/hubspot_source package before release.

Resolved review thread: macros/is_incremental_compatible.sql

@fivetran-catfritz (Contributor, Author) left a comment

@fivetran-joemarkiewicz I have updated with your suggestions, and this is ready for re-review.

Resolved review threads: CHANGELOG.md, macros/is_incremental_compatible.sql

@fivetran-joemarkiewicz (Contributor) left a comment

@fivetran-catfritz great work on this PR and for addressing my review notes. This PR looks good to go! I do have one open question that can be addressed as a team, but I'm curious whether you have any insight into it.

This will be good for release review! Be sure to swap the package dependency before merging.

@@ -0,0 +1,35 @@
{% macro is_incremental_compatible() -%}
{{ return(adapter.dispatch('is_incremental_compatible', 'hubspot') ()) }}

I know we will sometimes provide the return here, but other times we will not. Do you know what the difference is (if there is any)?

Looking at dbt_utils, I can see they always provide a return here. However, spot-checking other package maintainers and our own Fivetran Utils, I see a mix of both styles. 🤔

Since we can see this is what is used by dbt_utils I would prefer we keep your code. However, I would like to explore what the difference between having return and not causes.


@fivetran-joemarkiewicz I recall discussing this as a team a while back and concluding that we should use it, but I'm not sure what the reason was, nor can I find a reference to it. I also realize that several of our macros don't use it and seem to function fine, and I've forgotten to use it recently as well.
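One difference that may matter here comes from dbt's documentation for return(): it lets a macro hand back data instead of a string. Without it, a caller receives the macro's rendered text. The hypothetical Python model below illustrates the consequence for a boolean-returning macro. The names are illustrative, and the string conversion stands in for Jinja rendering. If the wrapper only renders the dispatched result, a False becomes the non-empty text "False", which is truthy in a materialized='incremental' if ... else 'table' check.

```python
# Hypothetical model of the two wrapper styles discussed above. Per dbt's
# documentation, return() lets a macro hand back data instead of a string;
# without it, the caller receives the macro's rendered text.


def dispatched_impl() -> bool:
    # Stands in for an adapter-specific implementation that returns False.
    return False


def wrapper_without_return() -> str:
    # {{ adapter.dispatch(...)() }} renders the boolean into the text "False".
    return str(dispatched_impl())


def wrapper_with_return() -> bool:
    # {{ return(adapter.dispatch(...)()) }} passes the boolean through unchanged.
    return dispatched_impl()


def choose_materialization(flag) -> str:
    # Mirrors: 'incremental' if is_incremental_compatible() else 'table'
    return "incremental" if flag else "table"


print(choose_materialization(wrapper_without_return()))  # non-empty string "False" is truthy
print(choose_materialization(wrapper_with_return()))
```

Macros whose output is SQL text are consumed as strings anyway, which may explain why many macros without return() work fine. The distinction mainly bites when a macro's result is used as a value, as with this boolean check.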

export CI_DATABRICKS_DBT_CATALOG=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_CATALOG" --project="dbt-package-testing-363917")
export CI_DATABRICKS_SQL_DBT_HTTP_PATH=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_SQL_DBT_HTTP_PATH" --project="dbt-package-testing-363917")
@fivetran-avinash (Contributor) commented Jul 30, 2024

Lines 26/27 are not present in the equivalent source package pre-command. Should they be added, or is this out of scope?


Added

bash .buildkite/scripts/run_models.sh databricks

- label: ":databricks: :database: Run Tests - Databricks SQL Warehouse"
@fivetran-avinash (Contributor) commented Jul 30, 2024

Should we add this support to the source package as well or is this out of scope?


Added

@fivetran-avinash (Contributor) left a comment

@fivetran-catfritz This looks mostly good! A few comments before approving.

(cc: @fivetran-joemarkiewicz if you are taking this on.)

@fivetran-avinash (Contributor) left a comment

Resolved review thread: packages.yml
@fivetran-catfritz merged commit 86b93f6 into main Aug 1, 2024
9 checks passed