[SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMethodLimit to 65535 #20434
Conversation
"codegen. When the compiled function exceeds this threshold, " + | ||
"the whole-stage codegen is deactivated for this subtree of the current query plan. " + | ||
s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " + | ||
"this is a limit in the OpenJDK JVM implementation.") |
nit: might want to still keep the last line around to indicate where the 64k limit is coming from
The 8000 byte limit is a HotSpot-specific thing, but the 64KB limit is imposed by the Java Class File format, as part of the JVM spec.
We may want to wordsmith a bit here to explain that:
- 65535 is the largest bytecode size possible for a valid Java method; setting the default value to 65535 effectively turns the limit off for whole-stage codegen;
- For those who wish to turn this limit on when running on HotSpot, it may be preferable to set the value to CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT to match HotSpot's implementation.
I don't have a good concrete suggestion as to how to concisely express these two points in the doc string, though.
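For illustration, the second point could look like the following spark-shell snippet (a sketch only — it assumes a running `SparkSession` named `spark`, and that `CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT` is HotSpot's 8000-byte threshold):

```scala
// Sketch: opt back in to the HotSpot-oriented limit at runtime.
// 8000 mirrors HotSpot's huge-method threshold (see DontCompileHugeMethods);
// with the shipped default of 65535 the check is effectively disabled.
spark.conf.set("spark.sql.codegen.hugeMethodLimit", 8000)
```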
Did the update
LGTM
LGTM except for a nit on wording of the default value.
"codegen. When the compiled function exceeds this threshold, " + | ||
"the whole-stage codegen is deactivated for this subtree of the current query plan. " + | ||
s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " + | ||
"this is a limit in the OpenJDK JVM implementation.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 8000 byte limit is a HotSpot-specific thing, but the 64KB limit is imposed by the Java Class File format, as a part of the JVM spec.
We may want to wordsmith a bit here to explain that:
- 65535 is a largest bytecode size possible for a valid Java method; setting the default value to 65535 is effectively turning the limit off for whole-stage codegen;
- For those that wish to turn this limit on when running on HotSpot, it may be preferable to set the value to
CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT
to match HotSpot's implementation.
I don't have a good concrete suggestion as to how to concisely expression these two points in the doc string, though.
Test build #86813 has finished for PR 20434 at commit
Does this value lead to no performance degradation of other typical workloads (e.g. TPC-DS)?
It would be good to include the problematic code for a future fix.
@kiszk TPC-DS just represents typical data analytics workloads. However, Spark SQL is also being used for ETL-like workloads. The regression happened in a complex pipeline of structured streaming workloads. Will do more investigation after the 2.3 release.
 .intConf
-.createWithDefault(CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT)
+.createWithDefault(65535)
cc @mgaido91 .
@gatorsmile .
This is to revert back to the original behavior. Thus, we do not introduce anything new compared with 2.2.
I see. The baseline (you compared) is 2.2, right?
Yes. We need to avoid the performance regression since the last release, Spark 2.2.
+1, LGTM.
Test build #86835 has finished for PR 20434 at commit
Thanks! Merged to master/2.3 |
## What changes were proposed in this pull request?

Still saw the performance regression introduced by `spark.sql.codegen.hugeMethodLimit` in our internal workloads. There are two major issues in the current solution.

- The size of the compiled byte code is not identical to the bytecode size of the method. The detection is still not accurate.
- The bytecode size of a single operator (e.g., `SerializeFromObject`) could still exceed the 8K limit. We saw the performance regression in such a scenario.

Since it is close to the release of 2.3, we decided to increase it to 64K to avoid the perf regression.

## How was this patch tested?

N/A

Author: gatorsmile <[email protected]>

Closes #20434 from gatorsmile/revertConf.

(cherry picked from commit 31c00ad)
Signed-off-by: gatorsmile <[email protected]>
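The fallback behavior discussed above — disable whole-stage codegen for a subtree when a compiled method's bytecode exceeds the configured limit — can be sketched as follows. This is a simplified illustration, not Spark's actual code; the object and method names are hypothetical:

```scala
object HugeMethodLimitSketch {
  // HotSpot declines to JIT-compile methods larger than this
  // (the DontCompileHugeMethods threshold).
  val HotSpotHugeMethodLimit = 8000
  // Hard cap on a single method's bytecode size imposed by the
  // Java class-file format, i.e. the JVM spec.
  val MaxJvmMethodBytecodeSize = 65535

  // Whole-stage codegen stays enabled only if the largest compiled
  // method in the generated class fits under the configured limit.
  def useWholeStageCodegen(maxMethodBytecodeSize: Int,
                           hugeMethodLimit: Int = MaxJvmMethodBytecodeSize): Boolean =
    maxMethodBytecodeSize <= hugeMethodLimit
}
```

With the new 65535 default, a 9000-byte generated method keeps whole-stage codegen enabled; under the previous 8000-byte default it would have fallen back to the non-codegen path.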