-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-35329][SQL] Split generated switch code into pieces in ExpandExec #32457
Conversation
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138214 has finished for PR 32457 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138220 has finished for PR 32457 at commit
|
This code seems to generate # of methods that is equal to # of columns. Am I correct? |
The number of added methods should be number of columns (different output) * number of projections? Maybe as @kiszk said, adding a threshold? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea. Maybe adding one method for all columns per projection, instead of per column?
Totally. Two points.
I am a bit afraid that the number of methods may be over the limit of a class if there are multiple projections. For example, 100 columns x 700 projections (65536 is the upper limit at https://docs.oracle.com/javase/specs/jvms/se16/html/jvms-4.html#jvms-4.11). |
@kiszk @viirya Yea, you're right; the previous approach seems bad. So, I applied two improvements based on your suggestion;
Could you check the latest code? |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138257 has finished for PR 32457 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138269 has finished for PR 32457 at commit
|
Is the description updated? Looks good except it. |
val inputVars = inputVarSets.foldLeft(Set.empty[VariableValue])(_ ++ _) | ||
val paramLength = CodeGenerator.calculateParamLengthFromExprValues(inputVars.toSeq) | ||
val maybeSplitUpdateCode = if (CodeGenerator.isValidParamLength(paramLength) && | ||
exprCodesWithIndices.map(_._2.code.length).sum > splitThreshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we only check if the code under current switch case is under threshold. Seems to me, we need to accumulate non-split code too. If accumulated code is over the threshold, we need to split it out on later switch cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. That's a good insight. More simply, how about checking if the overall size of switch cases goes over the limit? Could you check the latest code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm if CI passes.
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138346 has finished for PR 32457 at commit
|
@kiszk Do you want to take another look before I merge this? Since you approved it, there is some change. |
val argList = inputVars.map { v => | ||
s"${CodeGenerator.typeName(v.javaType)} ${v.variableName}" | ||
val splitThreshold = SQLConf.get.methodSplitThreshold | ||
val cases = if (switchCaseExprs.flatMap(_._2.map(_._2.code.length)).sum > splitThreshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call generateUpdateCode
only once before line 198?
IMHO, code in all three cases (line 203-, line 216, and line 225-) is generated by generateUpdateCode()
.
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138421 has finished for PR 32457 at commit
|
Looks like a related test failure. |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138437 has finished for PR 32457 at commit
|
LGTM if failures disappear |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
What changes were proposed in this pull request?
This PR intends to split generated switch code into smaller ones in
ExpandExec
. In the current master, even a simple query like the one below generates a large method whose size (maxMethodCodeSize:7448
) is close to8000
(CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT
);The fix in this PR can make the method smaller as follows;
Why are the changes needed?
For better generated code.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
GA passed.