Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support deleting whole partitions in Iceberg table #21048

Merged
merged 1 commit into from
Oct 11, 2023

Conversation

hantangwangd
Copy link
Member

@hantangwangd hantangwangd commented Oct 5, 2023

Description

We can use delete from <table> [where <predicate>]; to delete one or more whole partitions or all data in Iceberg table when some conditions are met:

  • The predicate could be wholly pushed down and exactly enforced.
  • The predicate Indicated one or more whole partitions in Iceberg table.

Motivation and Context

This PR allows deleting partition level data from Iceberg tables.

Test Plan

  • Make sure filter push down optimization do not affect existing test cases in IcebergDistributedTestBase
  • Newly added test case in IcebergDistributedTestBase.testDelete()

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

General Changes
* The Iceberg connector now supports `DELETE FROM <table> [where <filter>]` statements to delete one or more entire partitions

@hantangwangd hantangwangd requested a review from a team as a code owner October 5, 2023 19:10
@tdcmeehan tdcmeehan added the iceberg Apache Iceberg related label Oct 5, 2023
Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is about semantics of delete with snapshots/partition evolution. It's not clear to me how exactly this is handled. Also, we should probably update the docs. Looks good otherwise though!!

Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Mostly just small nits now. I think we should also update the docs to specify this is now supported and limited to partition-only and without specific snapshots

@github-actions
Copy link

github-actions bot commented Oct 10, 2023

Codenotify: Notifying subscribers in CODENOTIFY files for diff 384aca5...6fcc0eb.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/connector/iceberg.rst

@hantangwangd
Copy link
Member Author

Looks good! Mostly just small nits now. I think we should also update the docs to specify this is now supported and limited to partition-only and without specific snapshots

Fixes are done, and iceberg doc is updated. Please take a look.

Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor grammatical suggestions on the docs, but @steveburnett might have better ones. Otherwise LGTM!

presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
@hantangwangd
Copy link
Member Author

Two minor grammatical suggestions on the docs, but @steveburnett might have better ones. Otherwise LGTM!

@ZacBlanco Thanks a lot, I will squash them into one commit later.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I made a couple of suggestions for consistency and clarity.

presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved

Columns in the filter must all be identity transformed partition columns of the target table.

Do not support non-comparison operator for the filter columns, for example ADD, SUBTRACT, MODULUS, etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latin such as etc., e.g, i.e, should be avoided. For an example of a standard for this, see the GitLab recommended word list entry for etc.. To address this adapting @ZacBlanco's suggestion, consider something like this.

Suggested change
Do not support non-comparison operator for the filter columns, for example ADD, SUBTRACT, MODULUS, etc.
Filtered columns only support comparison operators, such as EQUALS, LESS THAN, or LESS THAN EQUALS.

presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
@hantangwangd
Copy link
Member Author

Nice work! I made a couple of suggestions for consistency and clarity.

@steveburnett Thanks a lot. I have update the docs following your suggestion, and squash them into one commit. Please take a look.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Updated my local branch just now and built the docs locally to pick up and review the latest changes as they will be seen in production, everything looks great.

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hantangwangd Thanks for your work! This is a nice feature that would benefit the connector users.
Just left some minor comments.

@ChunxuTang
Copy link
Member

@hantangwangd BTW, do you want to add a release note?

@hantangwangd
Copy link
Member Author

@hantangwangd BTW, do you want to add a release note?

@ChunxuTang Sure, release note have been added.

We can use 'delete from <table> [where <predicate>]' to delete whole
partitions or all data in Iceberg table.

The predicate should satisfy some requirements:

- The columns appeared in predicate should all be identity partitioned
   columns, that is, use their original value as partition value.

- The columns appeared in predicate should not be wrapped in a
   non-comparison operator(for example ADD, SUBTRACT, MODULUS etc.)

Co-authored-by: Zac Blanco <[email protected]>
Co-authored-by: Steve Burnett <[email protected]>
Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @hantangwangd Thanks for the work again.

@ChunxuTang
Copy link
Member

Re-running the failed test.

@hantangwangd
Copy link
Member Author

LGTM. @hantangwangd Thanks for the work again.

@ChunxuTang It's my pleasure.

@wanglinsong wanglinsong mentioned this pull request Dec 8, 2023
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
iceberg Apache Iceberg related
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants