
Bug/databricks sql incremental #144
Merged Aug 1, 2024 (17 commits)
Changes from 6 commits
4 changes: 3 additions & 1 deletion .buildkite/hooks/pre-command
@@ -22,4 +22,6 @@ export CI_SNOWFLAKE_DBT_WAREHOUSE=$(gcloud secrets versions access latest --secr
export CI_DATABRICKS_DBT_HOST=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_HOST" --project="dbt-package-testing-363917")
export CI_DATABRICKS_DBT_HTTP_PATH=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_HTTP_PATH" --project="dbt-package-testing-363917")
export CI_DATABRICKS_DBT_TOKEN=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_TOKEN" --project="dbt-package-testing-363917")
export CI_DATABRICKS_DBT_CATALOG=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_CATALOG" --project="dbt-package-testing-363917")
export CI_DATABRICKS_DBT_CATALOG=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_DBT_CATALOG" --project="dbt-package-testing-363917")
export CI_DATABRICKS_SQL_DBT_HTTP_PATH=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_SQL_DBT_HTTP_PATH" --project="dbt-package-testing-363917")
@fivetran-avinash (Contributor) commented on Jul 30, 2024:
Lines 26/27 are not present in the equivalent source package pre-command. Should they be added, or is this outside scope?

Contributor reply:
Added

export CI_DATABRICKS_SQL_DBT_TOKEN=$(gcloud secrets versions access latest --secret="CI_DATABRICKS_SQL_DBT_TOKEN" --project="dbt-package-testing-363917")
17 changes: 16 additions & 1 deletion .buildkite/pipeline.yml
@@ -71,4 +71,19 @@ steps:
- "CI_DATABRICKS_DBT_TOKEN"
- "CI_DATABRICKS_DBT_CATALOG"
commands: |
bash .buildkite/scripts/run_models.sh databricks
bash .buildkite/scripts/run_models.sh databricks

- label: ":databricks: :database: Run Tests - Databricks SQL Warehouse"
@fivetran-avinash (Contributor) commented on Jul 30, 2024:
Should we add this support to the source package as well, or is this out of scope?

Contributor reply:
Added

key: "run_dbt_databricks_sql"
plugins:
- docker#v3.13.0:
image: "python:3.8"
shell: [ "/bin/bash", "-e", "-c" ]
environment:
- "BASH_ENV=/tmp/.bashrc"
- "CI_DATABRICKS_DBT_HOST"
- "CI_DATABRICKS_SQL_DBT_HTTP_PATH"
- "CI_DATABRICKS_SQL_DBT_TOKEN"
- "CI_DATABRICKS_DBT_CATALOG"
commands: |
bash .buildkite/scripts/run_models.sh databricks-sql
10 changes: 9 additions & 1 deletion .buildkite/scripts/run_models.sh
@@ -16,6 +16,14 @@ db=$1
echo `pwd`
cd integration_tests
dbt deps
if [ "$db" = "databricks-sql" ]; then
dbt seed --vars '{hubspot_schema: hubspot_sqlw_tests}' --target "$db" --full-refresh
dbt compile --vars '{hubspot_schema: hubspot_sqlw_tests}' --target "$db"
dbt run --vars '{hubspot_schema: hubspot_sqlw_tests}' --target "$db" --full-refresh
dbt test --vars '{hubspot_schema: hubspot_sqlw_tests}' --target "$db"
dbt run --vars '{hubspot_schema: hubspot_sqlw_tests, hubspot_marketing_enabled: true, hubspot_contact_merge_audit_enabled: true, hubspot_sales_enabled: false}' --target "$db"
dbt run --vars '{hubspot_schema: hubspot_sqlw_tests, hubspot_marketing_enabled: false, hubspot_sales_enabled: true, hubspot_merged_deal_enabled: true, hubspot__pass_through_all_columns: true, hubspot_using_all_email_events: false, hubspot_owner_enabled: false}' --target "$db"
else
dbt seed --target "$db" --full-refresh
dbt compile --target "$db" --select hubspot # source does not compile at this time
dbt run --target "$db" --full-refresh
@@ -26,5 +34,5 @@ dbt test --target "$db"
dbt run --vars '{hubspot_marketing_enabled: true, hubspot_contact_merge_audit_enabled: true, hubspot_sales_enabled: false}' --target "$db" --full-refresh
dbt run --vars '{hubspot_marketing_enabled: false, hubspot_sales_enabled: true, hubspot_merged_deal_enabled: true, hubspot__pass_through_all_columns: true, hubspot_using_all_email_events: false, hubspot_owner_enabled: false}' --target "$db" --full-refresh
dbt test --target "$db"

fi
dbt run-operation fivetran_utils.drop_schemas_automation --target "$db"
19 changes: 19 additions & 0 deletions .github/workflows/stale.yml
fivetran-joemarkiewicz marked this conversation as resolved.
@@ -0,0 +1,19 @@
# For issues that have been open for awhile without activity, label
# them as stale with a warning that they will be closed out. If
# anyone comments to keep the issue open, it will automatically
# remove the stale label and keep it open.

# Runs once a day.

name: "Close stale issues and PRs"
on:
schedule:
- cron: "30 1 * * *"

permissions:
issues: write
pull-requests: write

jobs:
stale:
uses: fivetran/dbt_package_automations/.github/workflows/stale-bot.yml@22a7ff2 #update to @main once it's merged in the central repo
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,21 @@
# dbt_hubspot v0.18.0

[PR #144](https://github.com/fivetran/dbt_hubspot/pull/144) includes the following updates:

## 🚨 Breaking Changes 🚨
> ⚠️ Since the following changes result in the table format changing, we recommend running a `--full-refresh` after upgrading to this version to avoid possible incremental failures.

- For Databricks All-Purpose clusters, incremental models will now be materialized using the delta table format (previously parquet).
- Delta tables are generally more performant than parquet and are also more widely available for Databricks users. This will also prevent compilation issues on customers' managed tables.

- For Databricks SQL Warehouses, incremental materialization will not be used due to the incompatibility of the `insert_overwrite` strategy.
fivetran-joemarkiewicz marked this conversation as resolved.

## Under the Hood
- The `is_databricks_sql_warehouse` macro has been added; it returns `true` if the Databricks runtime being used is an all-purpose cluster **or** if any other non-supported Databricks destination is being used.
  - This update was applied because other Databricks runtimes have been discovered (i.e., an endpoint and an external runtime) that do not support the `insert_overwrite` incremental strategy used.
fivetran-joemarkiewicz marked this conversation as resolved.
- Added integration testing for Databricks SQL Warehouse.
- Added a bot to mark issues and PRs as stale if there is no activity for over 180 days.

# dbt_hubspot v0.17.2
[PR #142](https://github.com/fivetran/dbt_hubspot/pull/142) includes the following updates:

2 changes: 1 addition & 1 deletion README.md
@@ -73,7 +73,7 @@ Include the following hubspot package version in your `packages.yml` file:
```yaml
packages:
- package: fivetran/hubspot
version: [">=0.17.0", "<0.18.0"] # we recommend using ranges to capture non-breaking changes automatically
version: [">=0.18.0", "<0.19.0"] # we recommend using ranges to capture non-breaking changes automatically

```
Do **NOT** include the `hubspot_source` package in this file. The transformation package itself has a dependency on it and will install the source package as well.
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: 'hubspot'
version: '0.17.2'
version: '0.18.0'

config-version: 2
require-dbt-version: [">=1.3.0", "<2.0.0"]
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

47 changes: 10 additions & 37 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/run_results.json

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions integration_tests/ci/sample.profiles.yml
@@ -51,4 +51,12 @@ integration_tests:
schema: hubspot_integration_tests_57
threads: 8
token: "{{ env_var('CI_DATABRICKS_DBT_TOKEN') }}"
type: databricks
databricks-sql:
fivetran-avinash marked this conversation as resolved.
catalog: "{{ env_var('CI_DATABRICKS_DBT_CATALOG') }}"
host: "{{ env_var('CI_DATABRICKS_DBT_HOST') }}"
http_path: "{{ env_var('CI_DATABRICKS_SQL_DBT_HTTP_PATH') }}"
schema: hubspot_sqlw_tests
threads: 8
token: "{{ env_var('CI_DATABRICKS_SQL_DBT_TOKEN') }}"
type: databricks
7 changes: 6 additions & 1 deletion integration_tests/dbt_project.yml
fivetran-avinash marked this conversation as resolved.
@@ -1,8 +1,13 @@
name: 'hubspot_integration_tests'
version: '0.17.2'
version: '0.18.0'

profile: 'integration_tests'
config-version: 2

models:
hubspot:
+schema: "{{ 'hubspot_sqlw_tests' if target.name == 'databricks-sql' else 'hubspot' }}"
fivetran-joemarkiewicz marked this conversation as resolved.

vars:
hubspot_schema: hubspot_integration_tests_57
hubspot_service_enabled: true
17 changes: 17 additions & 0 deletions macros/is_incremental_compatible.sql
@@ -0,0 +1,17 @@
{% macro is_incremental_compatible() %}
fivetran-joemarkiewicz marked this conversation as resolved.
{% if target.type in ('databricks') %}
{% set re = modules.re %}
{% set path_match = target.http_path %}
{% set regex_pattern = "sql/protocol" %}
{% set match_result = re.search(regex_pattern, path_match) %}
{% if match_result %}
{{ return(True) }}
{% else %}
{{ return(False) }}
{% endif %}
{% elif target.type in ('bigquery','snowflake','postgres','redshift','sqlserver') %}
fivetran-joemarkiewicz marked this conversation as resolved.
{{ return(True) }}
{% else %}
{{ return(False) }}
{% endif %}
{% endmacro %}
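For clarity, the branching in the Jinja macro above can be sketched in Python. This is an illustrative mirror only, not code from the PR; the example HTTP paths are hypothetical, based on the convention that Databricks all-purpose cluster paths contain `sql/protocol` while SQL Warehouse paths do not.

```python
import re

def is_incremental_compatible(target_type: str, http_path: str = "") -> bool:
    """Mirror of the Jinja macro: True when the target supports the
    incremental strategies this package uses."""
    if target_type == "databricks":
        # All-purpose cluster HTTP paths contain "sql/protocol"
        # (e.g. "sql/protocolv1/o/<org-id>/<cluster-id>"); SQL Warehouse
        # paths (e.g. "/sql/1.0/warehouses/<id>") do not match.
        return re.search("sql/protocol", http_path) is not None
    # Other destinations the package treats as incremental-compatible.
    return target_type in ("bigquery", "snowflake", "postgres", "redshift", "sqlserver")

# How the model configs below consume the result: incompatible targets
# (such as a SQL Warehouse) fall back to a full `table` materialization.
materialized = (
    "incremental"
    if is_incremental_compatible("databricks", "/sql/1.0/warehouses/abc123")
    else "table"
)
```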
4 changes: 2 additions & 2 deletions models/service/hubspot__daily_ticket_history.sql
@@ -1,12 +1,12 @@
{{
config(
enabled=var('hubspot_service_enabled', False),
materialized='incremental',
materialized='incremental' if is_incremental_compatible() else 'table',
partition_by = {'field': 'date_day', 'data_type': 'date'}
if target.type not in ['spark', 'databricks'] else ['date_day'],
unique_key='ticket_day_id',
incremental_strategy = 'insert_overwrite' if target.type not in ('snowflake', 'postgres', 'redshift') else 'delete+insert',
file_format = 'parquet'
file_format = 'delta'
)
}}

@@ -1,12 +1,12 @@
{{
config(
enabled=var('hubspot_service_enabled', False),
materialized='incremental',
materialized='incremental' if is_incremental_compatible() else 'table',
partition_by = {'field': 'date_day', 'data_type': 'date'}
if target.type not in ['spark', 'databricks'] else ['date_day'],
unique_key='id',
incremental_strategy='insert_overwrite' if target.type in ('bigquery', 'spark', 'databricks') else 'delete+insert',
file_format='parquet'
file_format='delta'
)
}}

@@ -1,12 +1,12 @@
{{
config(
enabled=var('hubspot_service_enabled', False),
materialized='incremental',
materialized='incremental' if is_incremental_compatible() else 'table',
partition_by = {'field': 'date_day', 'data_type': 'date'}
if target.type not in ['spark', 'databricks'] else ['date_day'],
unique_key='id',
incremental_strategy = 'insert_overwrite' if target.type not in ('snowflake', 'postgres', 'redshift') else 'delete+insert',
file_format = 'parquet'
file_format = 'delta'
)
}}

@@ -1,12 +1,12 @@
{{
config(
enabled=var('hubspot_service_enabled', False),
materialized='incremental',
materialized='incremental' if is_incremental_compatible() else 'table',
partition_by = {'field': 'date_day', 'data_type': 'date'}
if target.type not in ['spark', 'databricks'] else ['date_day'],
unique_key='id',
incremental_strategy = 'insert_overwrite' if target.type not in ('snowflake', 'postgres', 'redshift') else 'delete+insert',
file_format = 'parquet'
file_format = 'delta'
)
}}

7 changes: 5 additions & 2 deletions packages.yml
@@ -1,3 +1,6 @@
packages:
- package: fivetran/hubspot_source
version: [">=0.14.0", "<0.15.0"]
# - package: fivetran/hubspot_source
# version: [">=0.14.0", "<0.15.0"]
- git: https://github.com/fivetran/dbt_hubspot_source.git
revision: update/get-ticket-property-history-columns
warn-unpinned: false