-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: allow kernel to read tables with invalid _last_checkpoint #311
fix: allow kernel to read tables with invalid _last_checkpoint #311
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #311 +/- ##
==========================================
+ Coverage 72.51% 72.63% +0.11%
==========================================
Files 43 43
Lines 7783 7823 +40
Branches 7783 7823 +40
==========================================
+ Hits 5644 5682 +38
- Misses 1768 1770 +2
Partials 371 371 ☔ View full report in Codecov by Sentry. |
/// is invalid JSON. Unexpected/unrecoverable errors are returned as `Err` case and are assumed to | ||
/// cause failure. | ||
/// | ||
/// TODO: java kernel retries three times before failing, should we do the same? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my initial instinct would be, that if retry made sense, then on io errors when trying to read the file. I would expect this to be the responsibility of of the engine inside read_files
. This I think is is good as is. OR do we get errors back, that we think are retryable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it just assumed the read is happening over the network and isn't local? If it's local it wouldn't make sense to retry, but for a network read obviously it makes more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my initial reaction is also to leave this to the engine. I don't think there is anything specific about reading the _last_checkpoint file and if failures occur across the network it should be handled the same for all our reads and taken care of within read_files
let path = Path::from("valid/_last_checkpoint"); | ||
let invalid_path = Path::from("invalid/_last_checkpoint"); | ||
|
||
tokio::runtime::Runtime::new() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can all of this be replace with a #[tokio::test]
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea considered doing that but figured this called out async a little better so kept it as-is. I would prefer to have these be sync tests but I think we are forced to async with object_store..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does calling the async out here provide some value to someone trying to understand the test? Assuming they know rust async, doing the runtime plumbing just seems like boiler plate right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not. I would like to think about how to isolate the async code (really just needed for setup) so this is more of a TODO
Previously,
read_last_checkpoint
would returnErr(<json error>)
if json parsing of the_last_checkpoint
file failed. This caused the upstream call to fail due to propagation of the error. Instead we now returnNone
fromread_last_checkpoint
in this case - similar to what is returned if the_last_checkpoint
is not found. This allows log replay to continue without the hint and read the table.Additionally, this unblocks the
corrupted-last-checkpoint-kernel
golden table test.