Unable to evolve the schema #124
Comments
@Fokko thanks for the detailed writeup! In dbt more generally, incremental models cannot handle column additions or deletions without a full refresh. The reason that dbt grabs the list of columns from the existing table is to handle cases where the set of columns is the same between the temp view (new records) and the existing table (old records), but the order is different (#59). When you talk about schema evolution, is this or this what you have in mind? We could write some more code here.
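To illustrate the ordering case from #59 that the current behaviour handles, here is a minimal sketch. The table and column names are invented, and this is just the shape of the problem, not dbt's actual implementation:

```python
# Sketch only: names are hypothetical, not dbt's real code.
dest_columns = ["id", "name", "created_at"]  # order in the existing table
tmp_columns  = ["name", "created_at", "id"]  # same set, different order

# A positional `insert into ... select * from tmp` would misalign values,
# so the insert is built with an explicit column list taken from the
# destination table:
insert_sql = "insert into my_table select {} from my_table__dbt_tmp".format(
    ", ".join(dest_columns)
)
print(insert_sql)
```

This is why the column list comes from the existing table today: it guarantees positional alignment when only the order differs.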
I agree that deletions are extremely hard, as is handling breaking changes in general, for example changing an integer column to a string. I don't have to tell you, as dbt-labs/dbt-core#1132 shows. Also, as a committer on Parquet and Avro, I've had my fair share of this :) The things that you've pointed out are exactly what I meant. There are two situations here that apply for Delta.
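Based on the discussion above, the two situations are presumably a column being added to the model and a column being removed from it. A minimal sketch of distinguishing the two (all names invented for illustration):

```python
# Hypothetical helper, not part of dbt or Delta.
def diff_schema(old, new):
    """Return (added, removed) column names between two schemas."""
    added = [c for c in new if c not in old]
    removed = [c for c in old if c not in new]
    return added, removed

# Situation 1: a column is added to the model
print(diff_schema(["id", "name"], ["id", "name", "email"]))  # (['email'], [])

# Situation 2: a column is removed from the model
print(diff_schema(["id", "name", "email"], ["id", "name"]))  # ([], ['email'])
```

Additions are the easy case (Delta can evolve the schema on write); removals are where the breaking-change concerns above come in.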
I'll come up with some code examples in the coming days, so we can discuss it in more detail.
I'd definitely welcome contributions to support both of those on Delta.
Agreed! I think it would make sense to update the
We're running into an issue with Spark + dbt. When we add a column to an existing model, it won't be added to the table itself.
Let's say we have the following:
![image](https://user-images.githubusercontent.com/1134248/99360150-73f93880-28b0-11eb-9499-e51c2ab01740.png)
This translates into the following table:
![image](https://user-images.githubusercontent.com/1134248/99360529-03065080-28b1-11eb-8d6b-9e75abc1545a.png)
However, if we add a column, and rerun it again:
![image](https://user-images.githubusercontent.com/1134248/99360401-d3574880-28b0-11eb-8143-fbdbd2beb8da.png)
We don't see the freshly added column in the resulting table:
![image](https://user-images.githubusercontent.com/1134248/99360699-482a8280-28b1-11eb-8450-0cdde064fa1a.png)
If we remove the existing table and rerun it again:
![image](https://user-images.githubusercontent.com/1134248/99360994-b66f4500-28b1-11eb-9656-5b33b7c1f08b.png)
Then the column appears and everything looks sane again.
However, more disturbingly, when we remove the column again:
![image](https://user-images.githubusercontent.com/1134248/99361085-d7d03100-28b1-11eb-87f6-60c66037f8e6.png)
It will give an error that Spark is unable to resolve the column.
Looking at the logs reveals the issue to us:
And then it will do the upsert:
So, the issue is that when selecting the columns from the `dbt_tmp` table, dbt takes the columns of the existing table instead of the columns of the model. Changing this to the columns of the model should allow us to evolve the schema.
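The proposed change can be sketched in miniature. These are invented lists, not the actual dbt-spark internals; the point is only which side drives the column list:

```python
# Hypothetical sketch of the proposed fix.
existing_columns = ["id", "name"]           # columns on the target table
model_columns    = ["id", "name", "email"]  # the model just gained `email`

# Current behaviour: the column list is driven by the existing table,
# so the new column never makes it into the insert.
current_select = list(existing_columns)

# Proposed behaviour: drive the column list off the model (the dbt_tmp
# view), so newly added columns survive.
proposed_select = list(model_columns)

print(current_select)   # ['id', 'name']
print(proposed_select)  # ['id', 'name', 'email']
```

With the model's columns driving the select, an added column reaches the target table (assuming the table format, e.g. Delta, permits schema evolution on write).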