-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(11397): surface proper errors in ParquetSink #11399
Conversation
…t the task join errors be surfaced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think we need to return early here rather than ignoring the error
BTW I tried returning early locally and the test still passes
I agree with @alamb early return is way better than ignoring error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thanks @wiedld
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced * fix(11397): terminate early on channel send failure
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced * fix(11397): terminate early on channel send failure
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced * fix(11397): terminate early on channel send failure
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced * fix(11397): terminate early on channel send failure
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced * fix(11397): terminate early on channel send failure
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced * fix(11397): terminate early on channel send failure Add Optimizer Sanity Checker, improve sortedness equivalence properties (apache#11196) * Initial optimizer sanity checker. Only includes sort reqs, docs will be added. * Add distro and pipeline friendly checks * Also check the plans we create are correct. * Add distribution test cases using global limit exec. * Add test for multiple children using SortMergeJoinExec. * Move PipelineChecker to SanityCheckPlan * Fix some tests and add docs * Add some test docs and fix clippy diagnostics. * Fix some failing tests * Replace PipelineChecker with SanityChecker in .slt files. * Initial commit * Slt tests pass * Resolve linter errors * Minor changes * Minor changes * Minor changes * Minor changes * Sort PreservingMerge clear per partition * Minor changes * Update output_requirements.rs * Address reviews * Update datafusion/core/src/physical_optimizer/optimizer.rs Co-authored-by: Mehmet Ozan Kabak <[email protected]> * Update datafusion/core/src/physical_optimizer/sanity_checker.rs Co-authored-by: Mehmet Ozan Kabak <[email protected]> * Address reviews * Minor changes * Apply suggestions from code review Co-authored-by: Andrew Lamb <[email protected]> * Update comment * Add map implementation --------- Co-authored-by: Erman Yafay <[email protected]> Co-authored-by: berkaysynnada <[email protected]> Co-authored-by: Mehmet Ozan Kabak <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced * fix(11397): terminate early on channel send failure
* fix(11397): do not surface errors for closed channels, and instead let the task join errors be surfaced * fix(11397): terminate early on channel send failure
Which issue does this PR close?
Closes #11397
Rationale for this change
During the parallel writes in ParquetSink, we spawn a series of parallel tasks and then message pass the outcome from one task to the next. In abstraction:
read_batches => channel =>
Vec<col_write_tasks>
=> channel =>Vec<serialize_rowgroup_tasks>
When we encounter an error in one of the
Vec<x_tasks>
we are first surfacing an error on the channel.send() rather than on the task join.What changes are included in this PR?
Don't surface the errors on the channel send.
This results in the proper error returned, as can be seen on the updated test.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.