-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: list log files with checkpoint when version is None #312
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mervynhang thanks for the PR! It does look like we aren't reading from the last checkpoint when asking for the latest version - thanks for fixing. Curious though if there is some constraint on the LIST API that's causing the 400? we should still be able to read the table without the checkpoint
Hi @zachschuermann , We have to use s3proxy in our environment, s3proxy require that continuation-token and start-after cannot be both set delta-kernel-rs will set start-after but when there are more than 1000 items, continuation-token will be set automatically and s3proxy will return error. |
Ah great, thanks for the context. So, put another way, attempting to list > 1000 items isn't supported for |
(the fix here is a useful optimization but more of a workaround for the underlying issue in ObjectStore/S3proxy it seems) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and we will also need to disable some tests in golden_tests.rs
(now that checkpoint reading is kicking in like it should): L411-412
- golden_test!("v2-checkpoint-json", latest_snapshot_test); // passing without v2 checkpoint support
- golden_test!("v2-checkpoint-parquet", latest_snapshot_test); // passing without v2 checkpoint support
+ skip_test!("v2-checkpoint-json": "v2 checkpoint not supported");
+ skip_test!("v2-checkpoint-parquet": "v2 checkpoint not supported");
Co-authored-by: Zach Schuermann <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mervynhang if we modify the golden_tables tests (in my comment above) we should be good to go!
Hi @zachschuermann golden_tables tests has been updated, please let me know if anything else is needed. Thanks |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #312 +/- ##
==========================================
+ Coverage 72.63% 73.02% +0.39%
==========================================
Files 43 43
Lines 7823 7825 +2
Branches 7823 7825 +2
==========================================
+ Hits 5682 5714 +32
+ Misses 1770 1736 -34
- Partials 371 375 +4 ☔ View full report in Codecov by Sentry. |
could we update the PR description to reflect the changes being made? feel free to keep the original context but add something like: previously, if |
Hi @zachschuermann , PR description updated. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM I'll help take some of the testing in a follow-up PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Previously, if read_last_checkpoint returned a valid _last_checkpoint hint but the version requested was None (i.e. requesting the lastest version) we would still list_log_files without listing from the checkpoint hint. This change will detect when we have a valid _last_checkpoint and request the latest version and instead list_log_files_with_checkpoint
A side effect, we will get following error when load a table via s3proxy with a lot log files (more than 1000), but the same issue doesn't occur on delta-rs, after some troubleshooting, the error can be fixed by add this line from delta-rs
https://github.com/delta-io/delta-rs/blob/main/crates/core/src/kernel/snapshot/log_segment.rs#L98