Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit table incrementality + Postgres compatibility #31

Merged
merged 32 commits into from
Jan 14, 2022

Conversation

fivetran-jamie
Copy link
Contributor

@fivetran-jamie fivetran-jamie commented Dec 29, 2021

Are you a current Fivetran customer?

fivetran made PR

What change(s) does this PR introduce?

a couple of things:

  1. draws from Dylan's PR (Initial incremental setup for BigQuery #29) to make the audit table incremental (and partition the staging log table)
  2. merged Brice Luu's PR (Fix spine start_date #30) to explicitly cast the start date of the date spine in connector_daily_events to avoid string-type errors on postgres
  3. adds postgres to circle ci testing
  4. a minor tweak to use dbt_utils.group_by()

Does this PR introduce a breaking change?

  • Yes (please provide breaking change details below.)
  • No (please provide explanation how the change is non breaking below.)

mmm unsure if this is a breaking change or not. a pre-existing table now has a different materialization (incremental) but i believe users do not need to explicitly run a full refresh to capture this new change.

Is this PR in response to a previously created Issue

How did you test the PR changes?

  • CircleCi
  • Other (please provide additional testing details below) -> tested locally on BQ

Select which warehouse(s) were used to test the PR

  • BigQuery
  • Redshift
  • Snowflake
  • Postgres
  • Databricks
  • Other (provide details below)

Provide an emoji that best describes your current mood

🥱

Feedback

We are so excited you decided to contribute to the Fivetran community dbt package! We continue to work to improve the packages and would greatly appreciate your feedback on our existing dbt packages or what you'd like to see next.

@fivetran-jamie fivetran-jamie changed the title Audit table incrementality + Postgres compatibility fix Audit table incrementality + Postgres compatibility Dec 29, 2021
@fivetran-joemarkiewicz
Copy link
Contributor

The beauty of the dbt_fivetran_log latest release being a pre-release is that the next release will be breaking for v1 compatibility. Therefore, we can include this in the next official breaking release!

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for tackling this @fivetran-jamie 🙌

I have a few minor comments about ensuring day is the correct grain to partition by. It seems to be working great when I test locally. I know we encountered issues in other packages where we over-partitioned and I want to ensure this is the ideal partition to roll out.

Additionally, a few other updates that are not inline which we should apply before merging:

  • Update the integration_tests/dbt_project.yml to have the same versioning as the package
  • Update the CHANGELOG.md for the v0.5.0 release notes.

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few additional questions based off Dylan's response to your questions.

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for applying the changes @fivetran-jamie! I tested this locally by limiting the days in the staging model, running a --full-refresh and then removing my limit and running a normal dbt run. I found the merge worked as intended and didn't see any duplicate records. I think the merge key was the "key" here (pun intended). Thanks!

I have just a few quick comments in line and highlighted below. Once those changes are made then I will approve this PR!

  • Be sure to add Postgres compatibility to the README
  • Can you add some inline comments to the incremental logic within the fivetran_log__audit_table model. I imagine customers (and us in the future) will have a hard time understanding what exactly is happening here. Some quick comments will make that easier!
  • Add the unique_table_sync_key to the fivetran_log.yml and regenerate the docs.
  • Consider defining the test for the unique key instead of the combination for simplicity.

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks so much @fivetran-jamie 😄. Let's hold off merging until @gareginordyan gives a review.

One quick question - is there a reason for the extra files in the docs/ folder? Usually all we need is the index.html, catalog.json, and the manifest.json. Curious why the other files are included in this PR? Can you remove them if they are not necessary.

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-jamie thanks again for all your hard work on this update!

I chatted with @gareginordyan and as the insert_overwrite logic is not compatibile with all our supported destinations, it makes sense to keep the merge incremental strategy. I did make a few inline comments that we should apply before we merge this PR and cut a new release.

  • Add details to the README on how the customer can add the partition to their base table in order to improve performance and cost even more. As it stands currently, the time of the model is greatly reduced with our updates. However, if they add the partition logic like you highlighted in the CHANGELOG then they will see a better impact on the cost of the query.
  • Rename a few of the variables in the incremental logic to read a bit easier. I added a suggestion.

Once you make these changes, then we will be good to merge this PR and cut the release!

@fivetran-jamie fivetran-jamie merged commit 0b06213 into main Jan 14, 2022
@fivetran-jamie fivetran-jamie deleted the feature/audit-incrementality branch January 14, 2022 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants