Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for UnboundPartitionSpec. #98

Closed
liurenjie1024 opened this issue Nov 17, 2023 · 9 comments · Fixed by #106
Closed

feat: Add support for UnboundPartitionSpec. #98

liurenjie1024 opened this issue Nov 17, 2023 · 9 comments · Fixed by #106
Labels
good first issue Good for newcomers

Comments

@liurenjie1024
Copy link
Contributor

We can see java implementation as reference.

@liurenjie1024 liurenjie1024 added the good first issue Good for newcomers label Nov 17, 2023
@my-vegetable-has-exploded
Copy link
Contributor

I'd like to have a try

@liurenjie1024
Copy link
Contributor Author

I'd like to have a try

Thanks!

@my-vegetable-has-exploded
Copy link
Contributor

Sorry, I am confusing about this issue. According to iceberg#4360, I think UnboundPartitionSpec provides a build method without a schema and can be later bound to a schema. It seems that the primary action in the bind function involves retrieving the source type from the schema and initializing the transform with the source type.
But in iceberg-rust, our transformFunction don't need source type to build. We determine source type using arrow_array.data_type() likes TransformFunction for Bucket. Do you think we still need this feature?

@liurenjie1024
Copy link
Contributor Author

Hi, @my-vegetable-has-exploded Sorry for the confusion. UnboundPartitionSpec has two important use cases:

  1. In rest api's definition. In this definition, the specId of PartitionSpec, field_id of PartitionFieldSpec are all optional in request(they are readonly.
  2. In TableMetadata builder api. It makes sense that you don't need to specify spec_id, field_id before actual committing transaction, since ids should be managed by catalogs.

Though we can use some other techniques, for example make them Option or use some default value (-1) to simplify it, but I prefer an unbound struct is more clear and type safe.

As with Transform, I think this is not important due to the api difference of java and rust.

@my-vegetable-has-exploded
Copy link
Contributor

Sorry for my misunderstanding and thanks for your patience @liurenjie1024. Without option, UnboundPartitionSpec and UnboundPartitionField may like this?

pub struct UnboundPartitionField {
    /// A source column id from the table’s schema
    pub source_id: i32,
    /// A partition name.
    pub name: String,
    /// A transform that is applied to the source column to produce a partition value.
    pub transform: Transform,
}

pub struct UnboundPartitionField {
    /// A source column id from the table’s schema
    pub source_id: i32,
    /// A partition name.
    pub name: String,
    /// A transform that is applied to the source column to produce a partition value.
    pub transform: Transform,
}

pub struct UnboundPartitionSpec {
    pub fields: Vec<UnboundPartitionField>,
}

impl UnboundPartitionSpec{
         pub fn bind(&self, schema: SchemaRef) -> Result<PartitionSpec> {
         // progress
         }
}

@liurenjie1024
Copy link
Contributor Author

Hi, @my-vegetable-has-exploded I mean I don't want to add Option in current PartitionSpec, but it's ok to add it in UnboundPartitonSpec.

@my-vegetable-has-exploded
Copy link
Contributor

Hi, @my-vegetable-has-exploded I mean I don't want to add Option in current PartitionSpec, but it's ok to add it in UnboundPartitonSpec.

get it!, thanks.

@my-vegetable-has-exploded
Copy link
Contributor

I think I'm still misunderstanding the UnboundPartitionSpec binding process, especially determining the spec_id during committing transaction.
I opened a draft pr,please take a look and leave your comments when you are free!
thanks, @liurenjie1024

@liurenjie1024
Copy link
Contributor Author

I think I'm still misunderstanding the UnboundPartitionSpec binding process, especially determining the spec_id during committing transaction. I opened a draft pr,please take a look and leave your comments when you are free! thanks, @liurenjie1024

Cool, I'll take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants