feat: add deltaOps set metadata operation #2474
Allows changing the metadata of a Delta table. This enables simple schema migrations, such as changing a column's metadata or adding new nullable columns. Note: you used to be able to do this by calling DeltaOps::create again with overwrite on an existing table, but since that was recently fixed to delete the old data, this operation restores the original behavior.
ACTION NEEDED: delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
Unfortunately it isn't that simple. If you do it like this, you could put the table in an invalid state, because the metadata contains the schema, partitionColumns and configuration, and each of them needs many checks before it can be changed. For the configuration part I have two PRs open: #2264 and #2075. For partitionColumns, you can't change them; at this point we don't allow evolving the partition columns of a table. Schema evolution, or any other change to the schema, needs to go into operations such as ALTER TABLE DROP COLUMN and ALTER TABLE ADD COLUMN.
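A minimal sketch of the kind of compatibility check discussed here, written against hypothetical, simplified types: `TableMetadata`, `Column` and `check_metadata_change` are illustrative names, not the delta-rs API. It assumes the only safe changes are edits to per-column metadata and appended nullable columns, rejects any change to the partition columns, and leaves the configuration unchecked, since that part is covered by the PRs mentioned above.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for a Delta table's metadata.
// These are NOT the delta-rs types; they only carry enough structure for the checks.
#[derive(Clone, Debug, PartialEq)]
pub struct Column {
    pub name: String,
    pub data_type: String, // e.g. "long", "string"
    pub nullable: bool,
    pub metadata: HashMap<String, String>, // per-column metadata, assumed free to change
}

#[derive(Clone, Debug, PartialEq)]
pub struct TableMetadata {
    pub schema: Vec<Column>,
    pub partition_columns: Vec<String>,
    pub configuration: HashMap<String, String>,
}

/// Returns Ok(()) if `new` only makes changes this sketch treats as safe:
/// column-metadata edits and newly added nullable columns.
pub fn check_metadata_change(old: &TableMetadata, new: &TableMetadata) -> Result<(), String> {
    // Partition columns are not allowed to evolve.
    if old.partition_columns != new.partition_columns {
        return Err("partition columns cannot be changed".into());
    }
    // Every existing column must survive with the same type and
    // must not go from nullable to non-nullable.
    for old_col in &old.schema {
        let new_col = new
            .schema
            .iter()
            .find(|c| c.name == old_col.name)
            .ok_or_else(|| format!("column `{}` was dropped", old_col.name))?;
        if new_col.data_type != old_col.data_type {
            return Err(format!("column `{}` changed type", old_col.name));
        }
        if old_col.nullable && !new_col.nullable {
            return Err(format!("column `{}` became non-nullable", old_col.name));
        }
    }
    // Newly added columns must be nullable: existing files hold no values for them.
    for new_col in &new.schema {
        let is_new = !old.schema.iter().any(|c| c.name == new_col.name);
        if is_new && !new_col.nullable {
            return Err(format!("new column `{}` must be nullable", new_col.name));
        }
    }
    // Configuration changes are deliberately not validated in this sketch.
    Ok(())
}
```

Under these assumptions, adding a nullable `note` column or attaching a comment to an existing column passes, while dropping a column, changing its type, adding a required column or touching the partition columns is rejected.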
Thank you @ion-elgreco, I was not aware that you had added support for setting table properties with #2264. If this operation added more checks that the old and new metadata were compatible, would that be acceptable?
@HawaiianSpork I don't see why you wouldn't be able to add a nested field to a struct column with ADD COLUMN; I think it's still safe, since you are only adding something. But it's probably good to verify what happens when you read two Parquet files with partially different struct schemas.
Good point. I had assumed ADD COLUMN only worked on top-level columns, but at least in the Spark world nested columns are supported. So I guess I have to add ADD COLUMN support to delta-rs...
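The nested case can be sketched as a recursive check on hypothetical, simplified types (`DataType`, `Field` and `is_additive` are illustrative names, not delta-rs's `StructType` API). It treats a change as additive when every existing field survives with the same type and no tighter nullability, and every newly added field, at any depth, is nullable. It only covers the schema logic; how readers handle Parquet files whose struct schemas differ still has to be verified separately.

```rust
// Hypothetical, simplified schema types; NOT the delta-rs schema API.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Primitive(&'static str), // e.g. "string", "long"
    Struct(Vec<Field>),
}

#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: &'static str,
    pub data_type: DataType,
    pub nullable: bool,
}

/// Returns true if `new` differs from `old` only by added nullable fields,
/// at any nesting depth.
pub fn is_additive(old: &DataType, new: &DataType) -> bool {
    match (old, new) {
        // Primitive types must match exactly.
        (DataType::Primitive(a), DataType::Primitive(b)) => a == b,
        (DataType::Struct(old_fields), DataType::Struct(new_fields)) => {
            // Every old field must still exist, must not become non-nullable,
            // and its type must itself be compatible (recursing into structs).
            let kept = old_fields.iter().all(|old_f| {
                new_fields.iter().any(|new_f| {
                    new_f.name == old_f.name
                        && (new_f.nullable || !old_f.nullable)
                        && is_additive(&old_f.data_type, &new_f.data_type)
                })
            });
            // Fields that only exist in `new` must be nullable.
            let added_ok = new_fields
                .iter()
                .filter(|new_f| !old_fields.iter().any(|old_f| old_f.name == new_f.name))
                .all(|new_f| new_f.nullable);
            kept && added_ok
        }
        // Switching between a primitive and a struct is never additive.
        _ => false,
    }
}
```

For example, adding a nullable `zip` field inside an `address` struct is accepted, while adding it as non-nullable or removing `city` from the struct is rejected.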
@HawaiianSpork fyi, I am adding an So will close this one
Description
Allows explicitly changing the metadata of a Delta table. This enables simple schema migrations, such as changing a column's metadata or adding new nullable columns. The code doesn't currently check that the table is still readable after the metadata changes. The setMetadata operation is similar to mergeSchema but doesn't require a write at the same time, so it can be run and tested as part of a deployment instead of on the next data write.
Note: you used to be able to do this by calling DeltaOps::create again with overwrite on an existing table, but since that was recently fixed to delete the old data, this operation restores the original behavior.
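For context, in the Delta transaction protocol each commit is a newline-delimited JSON file `_delta_log/<version>.json`, with the version zero-padded to 20 digits, and a metadata change is recorded as a `metaData` action. The rough sketch below shows why setMetadata needs no data write: the commit carries a new `metaData` action and no `add` or `remove` actions. It assumes a hand-rolled JSON encoder and a minimal field set; it is not the delta-rs commit code, and `commit_path` and `metadata_only_commit` are hypothetical helpers.

```rust
/// Path of the commit file for `version`: the version zero-padded to 20 digits.
pub fn commit_path(version: u64) -> String {
    format!("_delta_log/{:020}.json", version)
}

/// Minimal JSON string escaping: only backslashes and double quotes.
/// A real encoder must also escape control characters.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Hypothetical helper: builds a one-line commit holding only a `metaData` action.
/// There are no `add` or `remove` actions, so no data files are written or touched.
/// `schema_string` is the table schema serialized as JSON, embedded as a string.
pub fn metadata_only_commit(table_id: &str, schema_string: &str, partition_columns: &[&str]) -> String {
    let parts: Vec<String> = partition_columns
        .iter()
        .map(|c| format!("\"{}\"", escape(c)))
        .collect();
    format!(
        "{{\"metaData\":{{\"id\":\"{}\",\"format\":{{\"provider\":\"parquet\",\"options\":{{}}}},\"schemaString\":\"{}\",\"partitionColumns\":[{}],\"configuration\":{{}}}}}}",
        escape(table_id),
        escape(schema_string),
        parts.join(",")
    )
}
```

Writing such a line as the next commit file is what lets the change be deployed and tested on its own, rather than riding along with the next data write as with mergeSchema.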