
[SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMethodLimit to 65535 #20434

Closed
wants to merge 2 commits

Conversation

gatorsmile
Member

What changes were proposed in this pull request?

Still saw the performance regression introduced by spark.sql.codegen.hugeMethodLimit in our internal workloads. There are two major issues in the current solution.

  • The size of the compiled bytecode is not identical to the bytecode size of the method, so the detection is still not accurate.
  • The bytecode size of a single operator (e.g., SerializeFromObject) could still exceed the 8K limit. We saw a performance regression in that scenario.

Since we are close to the 2.3 release, we decided to increase the limit to 64K to avoid the performance regression.

How was this patch tested?

N/A
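As an aside for readers (not part of the original PR): with this change, the limit is effectively off by default, and users who still want the pre-2.3, HotSpot-oriented 8000-byte threshold can set it back explicitly. A minimal sketch using the config key this PR touches; the application name is a placeholder, and the exact behavior should be verified against your Spark version:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: opt back into the HotSpot-oriented 8000-byte threshold instead of
// the new 65535 default. "huge-method-limit-demo" is a placeholder app name.
val spark = SparkSession.builder()
  .appName("huge-method-limit-demo")
  .config("spark.sql.codegen.hugeMethodLimit", 8000)
  .getOrCreate()

// The setting can also be changed per session at runtime:
spark.conf.set("spark.sql.codegen.hugeMethodLimit", 8000)
```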

@gatorsmile
Member Author

gatorsmile commented Jan 30, 2018

"codegen. When the compiled function exceeds this threshold, " +
"the whole-stage codegen is deactivated for this subtree of the current query plan. " +
s"The default value is ${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
"this is a limit in the OpenJDK JVM implementation.")
Member


nit: might want to still keep the last line around to indicate where the 64k limit is coming from

Contributor


The 8000 byte limit is a HotSpot-specific thing, but the 64KB limit is imposed by the Java Class File format, as a part of the JVM spec.

We may want to wordsmith a bit here to explain that:

  1. 65535 is the largest bytecode size possible for a valid Java method; setting the default value to 65535 effectively turns the limit off for whole-stage codegen;
  2. For those who wish to turn this limit on when running on HotSpot, it may be preferable to set the value to CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT to match HotSpot's implementation.

I don't have a good concrete suggestion as to how to concisely express these two points in the doc string, though.
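One possible wording is sketched below. This is purely an illustration built from the reviewer's two points, not the text that was merged; the constant name CLASS_FILE_METHOD_LIMIT is hypothetical and introduced here only for readability:

```scala
// Sketch of a doc string covering both points; the constants mirror the
// limits discussed above (8000 = HotSpot threshold, 65535 = class-file cap).
val HOTSPOT_HUGE_METHOD_LIMIT = 8000  // HotSpot's -XX:-DontCompileHugeMethods cutoff
val CLASS_FILE_METHOD_LIMIT = 65535   // JVM spec cap on a method's bytecode size

val doc: String =
  "The maximum bytecode size of a single compiled Java function generated by " +
  "whole-stage codegen. When the compiled function exceeds this threshold, " +
  "whole-stage codegen is deactivated for this subtree of the current query plan. " +
  s"The default value ($CLASS_FILE_METHOD_LIMIT) is the largest bytecode size " +
  "possible for a valid Java method, so by default the limit is effectively off. " +
  "To match HotSpot's huge-method threshold instead, set this to " +
  s"$HOTSPOT_HUGE_METHOD_LIMIT."
```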

Member Author


Did the update

@sameeragarwal
Member

LGTM

Contributor

@rednaxelafx rednaxelafx left a comment


LGTM except for a nit on wording of the default value.


@SparkQA

SparkQA commented Jan 30, 2018

Test build #86813 has finished for PR 20434 at commit 4b358dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Jan 30, 2018

Does this value avoid performance degradation on other typical workloads (e.g., TPC-DS)?

@kiszk
Member

kiszk commented Jan 30, 2018

It would be good to record the problematic code for a future fix.

@gatorsmile
Member Author

@kiszk TPC-DS only covers typical data analytics workloads, but Spark SQL is also used for ETL-like workloads. The regression happened in a complex pipeline of structured streaming workloads. We will investigate further after the 2.3 release.

  .intConf
- .createWithDefault(CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT)
+ .createWithDefault(65535)
Member


cc @mgaido91 .

@dongjoon-hyun
Member

@gatorsmile .
In the original PR, #18810, there was a microbenchmark.
Can we have the result on the same benchmark here, too?

@gatorsmile
Member Author

This reverts to the original behavior, so we do not introduce anything new compared with 2.2.

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 30, 2018

I see. The baseline (you compared) is 2.2, right?

@gatorsmile
Member Author

Yes. We need to avoid any performance regression relative to the last release, Spark 2.2.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM.

@SparkQA

SparkQA commented Jan 30, 2018

Test build #86835 has finished for PR 20434 at commit c64bdfa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

Thanks! Merged to master/2.3

@asfgit asfgit closed this in 31c00ad Jan 30, 2018
asfgit pushed a commit that referenced this pull request Jan 30, 2018
## What changes were proposed in this pull request?
Still saw the performance regression introduced by `spark.sql.codegen.hugeMethodLimit` in our internal workloads. There are two major issues in the current solution.
- The size of the compiled bytecode is not identical to the bytecode size of the method, so the detection is still not accurate.
- The bytecode size of a single operator (e.g., `SerializeFromObject`) could still exceed the 8K limit. We saw a performance regression in that scenario.

Since we are close to the 2.3 release, we decided to increase the limit to 64K to avoid the performance regression.

## How was this patch tested?
N/A

Author: gatorsmile <[email protected]>

Closes #20434 from gatorsmile/revertConf.

(cherry picked from commit 31c00ad)
Signed-off-by: gatorsmile <[email protected]>