feat!: Remove `BoundPartitionSpec` #771

c-thiel · 2024-12-09T16:14:01Z

Fixes #694 (kind of). This is based on what @Fokko and I discussed this morning.

Let me try to summarize our requirements:

1. In the REST spec and in the V1 table spec, field_id is optional, so we need an UnboundPartitionSpec type. We shouldn't use it anywhere in TableMetadata, as field_id is required for V2 tables.
2. We are unsure whether we need a spec type that holds the schema it was bound to. If a PartitionSpec is part of TableMetadata, the answer is no, because we have both side by side in the same struct. If a PartitionSpec is not part of TableMetadata, we might need it. Without the schema being part of the spec struct, we wouldn't be able to implement column name mapping as Java does: https://github.com/apache/iceberg/blob/70d87f1750627b14b3b25a0216a97db86a786992/core/src/main/java/org/apache/iceberg/TableMetadata.java#L772-L791
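The first requirement above (field_id is optional in REST/V1, but required in V2 TableMetadata) can be illustrated with a minimal sketch. The structs below are hypothetical, simplified stand-ins and not the iceberg-rust API; for example, transforms are plain strings. The unbound field carries an Option for its id. A hypothetical require_field_ids helper only converts it into a field that is fit for V2 table metadata when every id is present.

```rust
/// Hypothetical, simplified partition field as it may appear in the REST spec
/// or in V1 table metadata: the field id is optional.
#[derive(Debug, Clone, PartialEq)]
pub struct UnboundPartitionField {
    pub source_id: i32,
    pub field_id: Option<i32>,
    pub name: String,
    pub transform: String, // simplification: real code uses a Transform enum
}

/// Hypothetical, simplified partition field as stored in TableMetadata:
/// the field id is required.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionField {
    pub source_id: i32,
    pub field_id: i32,
    pub name: String,
    pub transform: String,
}

/// Converts unbound fields into fields usable in V2 table metadata.
/// Fails on the first field without a field id, because V2 requires one.
pub fn require_field_ids(
    fields: &[UnboundPartitionField],
) -> Result<Vec<PartitionField>, String> {
    fields
        .iter()
        .map(|f| -> Result<PartitionField, String> {
            let field_id = f
                .field_id
                .ok_or_else(|| format!("partition field '{}' has no field id", f.name))?;
            Ok(PartitionField {
                source_id: f.source_id,
                field_id,
                name: f.name.clone(),
                transform: f.transform.clone(),
            })
        })
        .collect()
}
```

Keeping the optional id confined to the unbound type means code that only sees PartitionField never has to handle a missing id.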

My original commit, which introduced three structs (Bound, Unbound, Schemaless), assumed that we need (2). However, @Fokko and I couldn't find a good use case for it. Hence, he is now taking the discussion to Java to see how it goes. If that discussion concludes that we don't need (2), we can merge this PR.

By not storing the schema as part of the partition spec, we also lose an easy way to access the partition_type of the metadata: computing it now becomes fallible. This is why I extended TableMetadata with default_partition_type. We need to compute the type on load anyway, to ensure that the current schema and the default spec are compatible, so we might as well store it.
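The trade-off described above can be sketched as follows. Deriving the partition type needs both the spec and the schema, and it fails if a source column is missing. Doing that once when the metadata is constructed and caching the result makes the accessor infallible. All types here are hypothetical simplifications, not iceberg-rust's real Schema/StructType, and only two transforms are modelled. Identity keeps the source type, and bucket yields int, as in the Iceberg spec.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PrimitiveType { Int, Long, String }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Transform { Identity, Bucket(u32) }

/// Hypothetical schema column.
pub struct SchemaField { pub id: i32, pub name: String, pub ty: PrimitiveType }

/// Hypothetical partition field (field id already assigned).
pub struct PartitionField { pub source_id: i32, pub field_id: i32, pub name: String, pub transform: Transform }

/// One field of the derived partition type (stand-in for a StructType field).
#[derive(Debug, Clone, PartialEq)]
pub struct StructField { pub id: i32, pub name: String, pub ty: PrimitiveType }

/// Derives the partition type; fallible because the spec only stores source ids.
pub fn partition_type(spec: &[PartitionField], schema: &[SchemaField]) -> Result<Vec<StructField>, String> {
    spec.iter()
        .map(|pf| -> Result<StructField, String> {
            let source = schema
                .iter()
                .find(|f| f.id == pf.source_id)
                .ok_or_else(|| format!("source column {} not found in schema", pf.source_id))?;
            let ty = match pf.transform {
                Transform::Identity => source.ty,         // identity keeps the source type
                Transform::Bucket(_) => PrimitiveType::Int, // bucket results are ints
            };
            Ok(StructField { id: pf.field_id, name: pf.name.clone(), ty })
        })
        .collect()
}

/// Hypothetical metadata that computes the type once on load and caches it.
pub struct TableMetadata {
    pub default_spec: Vec<PartitionField>,
    pub default_partition_type: Vec<StructField>,
}

impl TableMetadata {
    /// Fails if the default spec is incompatible with the current schema.
    pub fn try_new(schema: &[SchemaField], default_spec: Vec<PartitionField>) -> Result<Self, String> {
        let default_partition_type = partition_type(&default_spec, schema)?;
        Ok(Self { default_spec, default_partition_type })
    }

    /// Infallible, because the type was validated and stored in try_new.
    pub fn default_partition_type(&self) -> &[StructField] {
        &self.default_partition_type
    }
}
```

The compatibility check and the type computation are the same work, so storing the result costs little.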

I would suggest reviewing this PR already, so that we can click either the "Close" or the "Merge" button once we get feedback from @Fokko.

CC @Fokko, @Xuanwo, @liurenjie1024

crates/iceberg/src/spec/partition.rs

Fokko · 2024-12-10T11:05:14Z

crates/iceberg/src/spec/table_metadata.rs

```diff
-    pub(crate) default_spec: BoundPartitionSpecRef,
+    pub(crate) default_spec: PartitionSpecRef,
+    /// Partition type of the default partition spec.
+    pub(crate) default_partition_type: StructType,
```


I had to think about this one, but I think I like it 👍

crates/iceberg/src/spec/table_metadata.rs

crates/iceberg/src/expr/visitors/expression_evaluator.rs

Fokko

I think this is great @c-thiel. It feels like we're simplifying the code quite a bit. As @liurenjie1024 already mentioned, this brings it much closer to the PyIceberg approach. In PyIceberg we go a step further and don't validate the current partition spec at all. There are arguments in both directions. On the one hand, you want to fail fast: if you try to drop a field that's part of the current partition spec, it should fail right away. On the other hand, if you fail later, you can still load the partition metadata and roll back to a valid schema.
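The two options discussed here can be contrasted in a small sketch. Everything in it is hypothetical: the Validation enum and the load_check helper do not exist in iceberg-rust or PyIceberg, and both schema and spec are reduced to lists of field ids. In fail-fast mode, a spec whose source columns are missing from the schema is rejected on load. In lenient mode, the problem is only reported, so the caller can still load the metadata and roll back.

```rust
use std::collections::HashSet;

/// Hypothetical validation policy applied when loading table metadata.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Validation {
    FailFast,
    Lenient,
}

/// Returns the source ids of the partition spec that are missing from the schema.
pub fn missing_source_ids(spec_source_ids: &[i32], schema_field_ids: &[i32]) -> Vec<i32> {
    let present: HashSet<i32> = schema_field_ids.iter().copied().collect();
    spec_source_ids
        .iter()
        .copied()
        .filter(|id| !present.contains(id))
        .collect()
}

/// FailFast: error as soon as the current spec references a missing column.
/// Lenient: succeed and hand back the missing ids so the caller can react later.
pub fn load_check(
    spec_source_ids: &[i32],
    schema_field_ids: &[i32],
    mode: Validation,
) -> Result<Vec<i32>, String> {
    let missing = missing_source_ids(spec_source_ids, schema_field_ids);
    match mode {
        Validation::FailFast if !missing.is_empty() => Err(format!(
            "current partition spec references missing columns: {:?}",
            missing
        )),
        _ => Ok(missing),
    }
}
```

For example, after a column used by the current spec has been dropped, FailFast surfaces the error at load time. Lenient keeps the table loadable, at the cost of failing later when the spec is actually used.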


Xuanwo

Thank you @c-thiel for working on this, and thank you @Fokko for keeping an eye on it!

Fokko · 2024-12-11T11:34:10Z

Gentle ping @liurenjie1024 so we can wrap up the 0.4.0 milestone

liurenjie1024

Thanks @c-thiel for bringing this, LGTM!

liurenjie1024 · 2024-12-14T12:36:06Z

cc @c-thiel It seems we need to fix some conflicts.

crates/iceberg/src/spec/partition.rs

sdd · 2024-12-14T22:56:57Z

Hey all. I'm a bit late to the party on this one and haven't been closely involved, but on a first look something struck me as a little confusing. It is likely to confuse more people than just me once the design discussions here get stale and we're only looking at the code.

Prior to this change, partition.rs has a BoundPartitionSpec and a SchemalessPartitionSpec. The difference between the two is pretty clear, if not from their names, then from the struct definitions themselves.

With this change, though, we'll have UnboundPartitionSpec and PartitionSpec, which both have exactly the same shape. The distinction between them is not apparent at first glance to someone with fresh eyes, or possibly even to future us once we haven't thought about this code for a while.

Can we add some detail to the docstring for the definition of each that distinguishes each from the other?

c-thiel · 2024-12-15T10:52:37Z

@liurenjie1024 conflicts resolved. Had to slightly change a function signature:
https://github.com/apache/iceberg-rust/pull/771/files#diff-8389535350ef7cddc266dfd18d589a978643da0334c23e16646e62e8d6a0892eR216-R219

@sdd thanks for the review! I fixed the typo and added a few comments.

```rust
/// Partition spec that defines how to produce a tuple of partition values from a record.
///
/// A [`PartitionSpec`] is originally obtained by binding an [`UnboundPartitionSpec`] to a schema and is
/// only guaranteed to be valid for that schema. The main difference between [`PartitionSpec`] and
/// [`UnboundPartitionSpec`] is that the former has field ids assigned,
/// while field ids are optional for [`UnboundPartitionSpec`].
```

Let me know what you think! Feel free to extend it if you think more is needed.

I also prefer my previous name (SchemalessPartitionSpec). But since Python and Java both call it PartitionSpec, it's probably better to follow them.

sdd · 2024-12-15T11:19:05Z

Thanks @c-thiel, those comments are great, looks good to me :-)

Xuanwo · 2024-12-15T15:26:09Z

Thank you @c-thiel for working on this, and thanks to @Fokko, @liurenjie1024 and @sdd for the reviews. We have waited for this for so long, let's move!

Remove bound partition spec

261f239

c-thiel changed the title ~~feat!: Remove BoundPartitionSpec~~ feat!: Remove BoundPartitionSpec (WIP) Dec 9, 2024

c-thiel requested a review from Fokko December 9, 2024 16:34

c-thiel mentioned this pull request Dec 9, 2024

Tracking issues of iceberg rust v0.4.0 Release #739

Closed

15 tasks

liurenjie1024 reviewed Dec 10, 2024

crates/iceberg/src/spec/partition.rs (outdated)

c-thiel added 2 commits December 10, 2024 09:41

Fix UnboundPartitionSpec name

85fc497

Merge branch 'main' into ct/remove-bound-partition-spec

b215a30

Fokko mentioned this pull request Dec 10, 2024

Simplify partition structures #763

Closed

Fokko reviewed Dec 10, 2024

crates/iceberg/src/spec/table_metadata.rs (outdated)

Fokko reviewed Dec 10, 2024

crates/iceberg/src/expr/visitors/expression_evaluator.rs (outdated)

Fokko previously approved these changes Dec 10, 2024

c-thiel and others added 3 commits December 10, 2024 17:05

Update crates/iceberg/src/expr/visitors/expression_evaluator.rs

d8dc45f

Co-authored-by: Fokko Driesprong <[email protected]>

Update crates/iceberg/src/spec/table_metadata.rs

3ff8c12

Co-authored-by: Fokko Driesprong <[email protected]>

Fix syntax

75756c3

Fokko added this to the 0.4.0 Release milestone Dec 10, 2024

c-thiel changed the title ~~feat!: Remove BoundPartitionSpec (WIP)~~ feat!: Remove BoundPartitionSpec Dec 11, 2024

Xuanwo previously approved these changes Dec 11, 2024

Fokko requested a review from liurenjie1024 December 11, 2024 11:33

Fokko mentioned this pull request Dec 12, 2024

Dectect schema evolution or partition evolution for append DataFile #777

Open

2 tasks

c-thiel dismissed Fokko’s stale review via 75756c3 December 13, 2024 02:59

liurenjie1024 previously approved these changes Dec 14, 2024

sdd reviewed Dec 14, 2024

crates/iceberg/src/spec/partition.rs (outdated)

Merge branch 'main' into ct/remove-bound-partition-spec

d5053e7

c-thiel dismissed stale reviews from liurenjie1024 and Xuanwo via d5053e7 December 15, 2024 10:39

Address comments

d351b3a

Fokko approved these changes Dec 15, 2024

Xuanwo merged commit 821f8dd into apache:main Dec 15, 2024
16 checks passed
