Simplify update_skip_aggregation_probe method #12332
@@ -1024,11 +1017,7 @@ impl GroupedHashAggregateStream {
    /// Note: currently spilling is not supported for Partial aggregation
    fn update_skip_aggregation_probe(&mut self, input_rows: usize) {
        if let Some(probe) = self.skip_aggregation_probe.as_mut() {
            if !self.spill_state.spills.is_empty() {
Partial aggregation indeed only does early emit for now, but I guess this check is trying to be defensive: if someone later decides to spill in the partial stage as well, the early emit logic won't break 🤔 Maybe we can leave an assertion here?
In #7400, the author tried spilling in the partial stage but was asked to remove it. See #7400 (comment).
I guess this check is trying to be defensive
Yes, that was the idea -- with the current implementation of spilling, these features are mutually exclusive, but if both were triggered (which never happens for now -- the note should probably be more informative), it shouldn't break query execution.
I think the change is reasonable, since the current code is indeed redundant, but I'm not sure about an assertion -- I suppose it's better to return a not_implemented / internal error instead of panicking.
And it's probably worth retaining the note on why skipping partial aggregation is incompatible with spilling.
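A minimal sketch of the error-returning alternative mentioned in this comment, using stand-in types rather than DataFusion's real ones. `ProbeError`, `Probe`, and `Stream` are hypothetical; only the `update_skip_aggregation_probe` name, the `update_state` call, and the `spills.is_empty()` check come from the diff under review.

```rust
// Stand-in types for illustration only; these are not DataFusion's definitions.
#[derive(Debug, PartialEq)]
enum ProbeError {
    /// Hypothetical analogue of an "internal error" result.
    Internal(String),
}

#[derive(Default)]
struct Probe {
    input_rows: usize,
    num_groups: usize,
}

impl Probe {
    fn update_state(&mut self, input_rows: usize, num_groups: usize) {
        self.input_rows += input_rows;
        self.num_groups = num_groups;
    }
}

struct Stream {
    spills: Vec<usize>, // placeholder for spill files
    num_groups: usize,
    skip_aggregation_probe: Option<Probe>,
}

impl Stream {
    /// Returns an error instead of panicking if the probe and spilling
    /// are ever active at the same time.
    fn update_skip_aggregation_probe(&mut self, input_rows: usize) -> Result<(), ProbeError> {
        if let Some(probe) = self.skip_aggregation_probe.as_mut() {
            if !self.spills.is_empty() {
                return Err(ProbeError::Internal(
                    "skip aggregation probe is incompatible with spilling".to_string(),
                ));
            }
            probe.update_state(input_rows, self.num_groups);
        }
        Ok(())
    }
}
```

With this shape, a violated invariant surfaces as a failed query rather than a crashed process, at the cost of threading a `Result` through the caller.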
Should we add another assert to ensure and emphasize that update_state is only called in Partial mode?
IMO, when we are sure that some branches are actually unreachable, an assertion is useful because it makes bugs easier to find through tests.
...
assert!(self.spill_state.spills.is_empty() && self.mode == AggregateMode::Partial);
probe.update_state(input_rows, self.group_values.len());
...
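To illustrate why the proposed assertion should never fire, here is a toy model of the invariant, not the real `GroupedHashAggregateStream`. `Mode`, `ToyStream`, and `on_memory_pressure` are hypothetical names, and the assumption that Partial mode never spills is taken from the discussion above.

```rust
// Stand-in types for illustration only; not DataFusion's definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Partial,
    Final,
}

struct ToyStream {
    mode: Mode,
    spills: Vec<usize>,        // placeholder for spill files
    probe_rows: Option<usize>, // Some(..) only in Partial mode
}

impl ToyStream {
    fn new(mode: Mode) -> Self {
        // The probe is only created for Partial aggregation.
        let probe_rows = if mode == Mode::Partial { Some(0) } else { None };
        Self { mode, spills: Vec::new(), probe_rows }
    }

    /// Spill under memory pressure; Partial mode is assumed to emit early instead.
    fn on_memory_pressure(&mut self) {
        if self.mode != Mode::Partial {
            self.spills.push(self.spills.len());
        }
    }

    fn update_skip_aggregation_probe(&mut self, input_rows: usize) {
        if let Some(rows) = self.probe_rows.as_mut() {
            // Both conditions hold by construction; the assert documents that.
            assert!(self.spills.is_empty() && self.mode == Mode::Partial);
            *rows += input_rows;
        }
    }
}
```

In this model the probe exists only in `Mode::Partial` and spills are only recorded in other modes, so the assertion can fail only if one of those two rules is changed.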
I agree -- adding an assertion would be great. Can you perhaps make a PR to do so?
sure
I have submitted a PR for this: #12640
Looks good to me. Let's get this merged 🚀
cc @Rachelint
…aggregation_probe
I took the liberty of merging up from main to resolve a conflict
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Spilling only happens when mode is not AggregateMode::Partial (datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs, line 890 in 91b1d2b), but the skip-aggregation probe is only used when mode is AggregateMode::Partial (datafusion/datafusion/physical-plan/src/aggregates/row_hash.rs, line 530 in 91b1d2b). So there is no spilling when skip_aggregation_probe is Option::Some.
Are these changes tested?
Are there any user-facing changes?