-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify Columns to update on incremental Models #1862
Comments
hey @richmintz! Can you say more about this? I think it's a good idea - I'm curious if you're interested in this feature for performance reasons, or something else. |
Hi @drewbanin! |
Got it! Sure, this makes a lot of sense in I can imagine a config like
If You buy all of that? |
As part of this proposed change, it would also be nice to be able to exclude the It could be a separate config Please note I tested this and would work for BigQuery, other databases might need a different syntax to support no-op for updates. Any thoughts? |
hey @bfil - if you do not supply a Check out the implementation here: |
@drewbanin I know, but I still want to compare and insert only the new data based on that unique key rather than merging on an always |
ah! Sure, that makes a lot of sense. What do you think about pushing this code down from the materialization layer and into the modeling layer. Could you do something like:
This is a kind of funny adaptation of the typical incremental modeling approach, but it should serve to only select the new records from your source table, inserting them directly into the destination table. |
@drewbanin : any update on where we are with this feature...i have a wide table [Snowflake] but i only need to update very few columns in the incremental model. any ETA's on the feature ? |
hey @ee07dazn - no movement on this issue on our end to report. I think the spec I mentioned above (pasted below) is still a good idea:
I think that this feature should only be supported on databases that support The key change here will be to replace the call to |
Thanks @drewbanin for the information and a pointer. Apart from what you had suggested, i had to make change to the default__get_merge_sql macro to make that approach work. It works now but i am just not happy to make a change to something as low level as default__get_merge_sql. Probably i think i can update the snowflake_get_merge_sql to do this job. Thinking out loud...but thanks for the help. |
Hi @drewbanin : i have another question and please bear with me if its a silly one as i am new to the tool and still exploring it through a POC. So i need to update the incremental materialisation to make sure i can use the "update_column" in config but rest of the materialisation does not need to change.
In this code, while the sql query gets executed correctly, i get the following error :
which is the 3rd last line. I want the default macro to be executed if i have not redefined a particular as in this case, i don't need to. How can i tell dbt to execute the default persist_docs macro in this model ? If you prefer, i can create it as a seperate issue but felt it was tied to the original question. |
I did redefine the macro "persist_docs" in my project and made the change in the materialisation to point to my macro. But now i get UPDATE : I updated to v0.17.0. and there is no issue any longer. Thanks anyways guys for help. |
ha! Ok @ee07dazn - glad to hear that you got it working in 0.17.0! I think we should still proceed with getting a feature like this merged into dbt. If you're interested in opening a PR we'd be happy to help, otherwise, someone else might be able to pick it up in the future :) |
@drewbanin : sorry, missed this comment. I will open a PR soon for this and tag you on it. |
@drewbanin any update on implemeting
@drewbanin any updates on implementing the |
@ee07dazn if you haven't had a chance to work on this yet, mind if I take a stab at it? @drewbanin other than the change to |
hey folks! No update from my side unfortunately, but please do feel free to open up a PR or follow up below if we want to jam on potential solutions together. Just re-reading this thread and saw:
I do think that's still the right approach, meaning that this feature should only be accessible on BigQuery/Snowflake and not Postgres/Redshift, for instance. @prratek I do think that replacing a call to @prratek if you'd like, feel free to drop in a draft PR with some example changes and we can take things from there too! |
Okay here's a draft PR. We'd probably want to default to |
Thanks for the draft PR @prratek! I can leave comments over there. |
Describe the feature
Allow model developer the ability to choose the set of columns that require updating on incremental loads
Who will this benefit?
This will benefit all users that are performing an incremental update of a wide table with few columns that are mutable
The text was updated successfully, but these errors were encountered: