Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement] Clarify the definitions and behaviors of interfaces dropTable and purgeTable #1436

Closed
4 tasks done
mchades opened this issue Jan 10, 2024 · 9 comments
Closed
4 tasks done
Assignees
Labels
improvement Improvements on everything

Comments

@mchades
Copy link
Contributor

mchades commented Jan 10, 2024

What would you like to be improved?

The described of dropTable is:

Drop a table from the catalog.

and the purgeTable is:

Drop a table from the catalog and completely remove its data.
If the catalog supports to purge a table, this method should be overridden. The default implementation throws an UnsupportedOperationException.

However, the original intention behind designing these two interfaces was that:

  • dropTable should delete the metadata and data of the table simultaneously. (If a recycle bin is available, the deleted data should go to the recycle bin.)
  • purgeTable is intended for catalogs with a recycle bin feature, ensuring that the table data is deleted from the recycle bin after dropTable. Otherwise, an UnsupportedOperationException should be thrown.

How should we improve?

Improving the method comment, checking the method call and implementation, and making sure the method's invocation scenario is in line with the semantic meaning of the method.

@mchades mchades added the improvement Improvements on everything label Jan 10, 2024
@mchades
Copy link
Contributor Author

mchades commented Jan 10, 2024

@jerryshao @yuqi1129 @diqiu50 @FANNG1 @Clearvive @qqqttt123 Do you have any suggestions on this?

@jerryshao
Copy link
Contributor

LGTM, let's check it separately.

@FANNG1
Copy link
Contributor

FANNG1 commented Jan 11, 2024

Iceberg has different meanings, dropTable is drop metadata while purge table is drop metadata and data.
@qqqttt123 , do you know any background?

@mchades
Copy link
Contributor Author

mchades commented Jan 11, 2024

Iceberg has different meanings, dropTable is drop metadata while purge table is drop metadata and data. @qqqttt123 , do you know any background?

Can you provide some reference for that? @FANNG1

@diqiu50
Copy link
Contributor

diqiu50 commented Jan 11, 2024

After that, should the DROP TABLE statement in Trino call the drop table API instead of using purgeTable?

@FANNG1
Copy link
Contributor

FANNG1 commented Jan 11, 2024

Iceberg has different meanings, dropTable is drop metadata while purge table is drop metadata and data. @qqqttt123 , do you know any background?

Can you provide some reference for that? @FANNG1

see https://iceberg.apache.org/docs/latest/spark-ddl/#drop-table-purge

and dropTable in catalog interface

  /**
   * Drop a table; optionally delete data and metadata files.
   *
   * <p>If purge is set to true the implementation should delete all data and metadata files.
   *
   * @param identifier a table identifier
   * @param purge if true, delete all data and metadata files in the table
   * @return true if the table was dropped, false if the table did not exist
   */
  boolean dropTable(TableIdentifier identifier, boolean purge);

@qqqttt123
Copy link
Contributor

qqqttt123 commented Jan 11, 2024

Iceberg has different meanings, dropTable is drop metadata while purge table is drop metadata and data. @qqqttt123 , do you know any background?

I don't think they have different meaning. Iceberg is based on object storage. Object storage may not have their own trash mechnism. Maybe we can treat the snapshot of Iceberg as a trash. So DropTable don't delete data. They can be garbaged.

@mchades
Copy link
Contributor Author

mchades commented Jan 11, 2024

After that, should the DROP TABLE statement in Trino call the drop table API instead of using purgeTable?

I think in most scenarios it is, depending on the semantics of Trino's Drop TABLE statement. @diqiu50

@FANNG1
Copy link
Contributor

FANNG1 commented Jan 11, 2024

Iceberg has different meanings, dropTable is drop metadata while purge table is drop metadata and data. @qqqttt123 , do you know any background?

I don't think they have different meaning. Iceberg is based on object storage. Object storage may not have their own trash mechnism. Maybe we can treat the snapshot of Iceberg as a trash. So DropTable don't delete data. They can be garbaged.

yes, agree.

@jerryshao jerryshao added this to the Gravitino 0.4.0 milestone Jan 16, 2024
FANNG1 pushed a commit that referenced this issue Aug 30, 2024
…imon instead of dropTable inGravitino (#4806)

### What changes were proposed in this pull request?

the dropTable in Paimon will both delete the metadata and data and skip
the trash, as discussed in
#1436 , Gravitino Paimon
catalog should use purgeTable not dropTable

### Why are the changes needed?
#4814

### Does this PR introduce any user-facing change?
yes, updated the doc.

### How was this patch tested?
existing ITs and UTs.
github-actions bot pushed a commit that referenced this issue Aug 30, 2024
…imon instead of dropTable inGravitino (#4806)

### What changes were proposed in this pull request?

the dropTable in Paimon will both delete the metadata and data and skip
the trash, as discussed in
#1436 , Gravitino Paimon
catalog should use purgeTable not dropTable

### Why are the changes needed?
#4814

### Does this PR introduce any user-facing change?
yes, updated the doc.

### How was this patch tested?
existing ITs and UTs.
jerryshao pushed a commit that referenced this issue Aug 30, 2024
…imon instead of dropTable inGravitino (#4820)

### What changes were proposed in this pull request?

the dropTable in Paimon will both delete the metadata and data and skip
the trash, as discussed in
#1436 , Gravitino Paimon
catalog should use purgeTable not dropTable

### Why are the changes needed?
#4814

### Does this PR introduce any user-facing change?
yes, updated the doc.

### How was this patch tested?
existing ITs and UTs.

Co-authored-by: cai can <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvements on everything
Projects
None yet
Development

No branches or pull requests

5 participants