[Spark] Add protocol validation in Metadata cleanup based on CRCs #4211


andreaschat-db (Contributor)

Which Delta project/connector is this regarding?

  • [x] Spark
  • [ ] Standalone
  • [ ] Flink
  • [ ] Kernel
  • [ ] Other (fill in here)

Description

CheckpointProtectionTableFeature requires clients to clean up metadata only if they can clean up everything up to requireCheckpointProtectionBeforeVersion. When this is not possible, the metadata cleanup operation aborts. Every time a feature is dropped, requireCheckpointProtectionBeforeVersion is updated to the version of the latest protocol downgrade. As a result, in certain scenarios metadata cleanup could be halted for extended periods of time.
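
A minimal Python sketch of the pre-PR rule, only to show why cleanup can stall. It assumes that "clean up to requireCheckpointProtectionBeforeVersion" means the cleanup cutoff version must be at or past that version; `TableState`, `record_feature_drop` and `strict_cleanup_allowed` are hypothetical names, not Delta APIs.

```python
from dataclasses import dataclass


@dataclass
class TableState:
    # Hypothetical stand-in for the requireCheckpointProtectionBeforeVersion
    # table property.
    require_checkpoint_protection_before_version: int = 0

    def record_feature_drop(self, downgrade_version: int) -> None:
        # Each feature drop moves the protected boundary to the version of
        # the latest protocol downgrade.
        self.require_checkpoint_protection_before_version = downgrade_version


def strict_cleanup_allowed(state: TableState, cutoff_version: int) -> bool:
    # Assumed pre-PR behavior: cleanup proceeds only if it can remove
    # everything before the protected version in this run.
    return cutoff_version >= state.require_checkpoint_protection_before_version


state = TableState()
state.record_feature_drop(100)
print(strict_cleanup_allowed(state, 80))   # False: cleanup aborts
print(strict_cleanup_allowed(state, 120))  # True
```

Because the cutoff is typically derived from log retention, a recent feature drop keeps the cutoff below the protected version, and each further drop pushes the boundary forward again.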

This PR improves this behavior by allowing the client to proceed with metadata cleanup when the invariant above does not hold, as long as a checkpoint already exists at the cleanup cutoff version. If neither condition holds, the client verifies that it supports the protocols of all commits it plans to remove (including the version at which any new checkpoint is created).
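
The decision order described above can be sketched as follows. This is a simplified illustration, not the Delta Spark implementation: it models each protocol as a set of feature names, assumes a new checkpoint would be written at the cutoff version when none exists there, and treats missing protocol information as unsupported. `cleanup_allowed` and its parameters are hypothetical; `"deletionVectors"` and `"unknownFeature"` are just example feature names.

```python
from typing import Dict, FrozenSet, Set


def cleanup_allowed(
    oldest_version: int,
    cutoff_version: int,
    protected_before_version: int,
    checkpoint_versions: Set[int],
    protocol_features_at: Dict[int, FrozenSet[str]],
    supported_features: FrozenSet[str],
) -> bool:
    # 1. Original invariant: cleanup reaches the protected version.
    if cutoff_version >= protected_before_version:
        return True
    # 2. A checkpoint already exists at the cleanup cutoff version.
    if cutoff_version in checkpoint_versions:
        return True
    # 3. Fallback: every commit to be removed, plus the cutoff version where a
    #    new checkpoint would be created, must use a supported protocol.
    for version in range(oldest_version, cutoff_version + 1):
        features = protocol_features_at.get(version)
        if features is None or not features <= supported_features:
            return False
    return True


protocols = {v: frozenset({"deletionVectors"}) for v in range(0, 11)}
protocols[3] = frozenset({"deletionVectors", "unknownFeature"})  # unsupported at v3
supported = frozenset({"deletionVectors"})

print(cleanup_allowed(0, 8, 10, {5}, protocols, supported))     # False
print(cleanup_allowed(0, 5, 10, {5}, protocols, supported))     # True (checkpoint at cutoff)
print(cleanup_allowed(4, 8, 10, set(), protocols, supported))   # True (all protocols supported)
print(cleanup_allowed(0, 12, 10, set(), protocols, supported))  # True (past protected version)
```

The cheap checks run first, so the per-version protocol scan is only needed when neither the original invariant nor an existing checkpoint at the cutoff allows cleanup.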

How was this patch tested?

Added tests in DeltaRetentionSuite.

Does this PR introduce any user-facing changes?

No.

vkorukanti merged commit 6dec3d7 into delta-io:master on Mar 3, 2025
18 of 21 checks passed