-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34796][SQL] Initialize counter variable for LIMIT code-gen in doProduce() #31892
Conversation
cc @cloud-fan, @maropu and @HyukjinKwon to take a look if you have time, thanks. |
How about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
Refer to this link for build results (access rights to CI server needed): |
Refer to this link for build results (access rights to CI server needed): |
Ah, I see. It looks interesting. I thought that |
Thanks! Merged to master. The commit has a conflict with branch-3.1, so could you open a backport PR for that? |
Thank you @maropu and @cloud-fan for review. Will submit a PR shortly for branch 3.1. Thanks. |
What changes were proposed in this pull request?
This PR is to fix the LIMIT code-gen bug in https://issues.apache.org/jira/browse/SPARK-34796, where the counter variable from
BaseLimitExec
is not initialized but used in code-gen. This is because the limit counter variable will be used in upstream operators (LIMIT's child plan, e.g.ColumnarToRowExec
operator for early termination), but in the same stage, there can be some operators doing the shortcut and not callingBaseLimitExec
'sdoConsume()
, e.g. HashJoin.codegenInner. So if we have query thatLocalLimit - BroadcastHashJoin - FileScan
in the same stage, the whole stage code-gen compilation will be failed.Here is an example:
Query plan:
Codegen failure - https://gist.github.com/c21/ea760c75b546d903247582be656d9d66 .
The uninitialized variable
_limit_counter_1
fromLocalLimitExec
is referenced inColumnarToRowExec
, butBroadcastHashJoinExec
does not callLocalLimitExec.doConsume()
to initialize the counter variable.The fix is to move the counter variable initialization to
doProduce()
, as in whole stage code-gen framework,doProduce()
will definitely be called if upstream operatorsdoProduce()
/doConsume()
is called.Note: this only happens in AQE disabled case, because we have an AQE optimization rule EliminateUnnecessaryJoin to change the whole query to an empty
LocalRelation
if inner join broadcast side is empty with AQE enabled.Why are the changes needed?
Fix query failure.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added unit test in
SQLQuerySuite.scala
.