-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: First version of rest catalog. #78
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
/// Update a table to the catalog. | ||
async fn update_table(&self, _table: &TableIdent, _commit: TableCommit) -> Result<Table> { | ||
todo!() | ||
} | ||
|
||
/// Update multiple tables to the catalog as an atomic operation. | ||
async fn update_tables(&self, _tables: &[(TableIdent, TableCommit)]) -> Result<()> { | ||
todo!() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is the way that we want to expose update tables. It is important that the right updates and requirements are set, otherwise, race conditions might occur. Also, there is little validation here, for example, can you do two distinct schema changes (so two updates), in a single commit? I would maybe leave this out for now so we can decide later on.
We might want to introduce a similar API as PyIceberg (which is inspired on Java): https://py.iceberg.apache.org/api/#schema-evolution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. It should be called by transaction api rather by user directly. The main challenge is that we should limit its visibility rather than exposing directly to user. I will figure out a way when actually implementing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I have said in pr description, the goal of this pr is to implement simple apis. Others will be left in following pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Just some context around the concern: Within Iceberg, when creating a new table, the IDs are being reassigned (python, java). This can lead to confusion and potentially also errors. Therefore we refrain users from assigning IDs as much as possible (or reassigning them).
tbl = table.create_table(
Schema(
Field(1, "id", IntegerType()),
Field(2, "str", StringType())
)
)
tbl.set_schema(
Schema(
Field(1, "id", IntegerType()),
Field(3, "dt", DateType()),
Field(2, "str", StringType())
)
)
Because the IDs are re-assigned, it will evolve from:
Schema {
1: id
2: str
}
to:
Schema {
1: id
2: dt
3: str
}
And this will corrupt the table since there is no valid promotion from string to date. The general consensus is that users shouldn't care about the IDs themselves, and this should not be exposed to users through APIs. The API should handle it.
cc @Fokko Any other comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great start. Thanks for working on it @liurenjie1024, and @Xuanwo @ZENOTME for the review!
In this pr we add initial support for rest, which finished simple rest apis.
Complex apis such as create table, update table, commits which be added in following pr so that we can make each pr's size reasonable.
related: #60