[SPARK-29239][SPARK-29221][SQL] Subquery should not cause NPE when eliminating subexpression #25925

viirya · 2019-09-25T07:44:40Z

What changes were proposed in this pull request?

This patch proposes to skip PlanExpression when doing subexpression elimination on executors.

Why are the changes needed?

Subexpression elimination can cause an NPE when it is applied to an executed subquery expression such as ScalarSubquery on executors. This is because PlanExpression wraps a query plan. Comparing query plans on an executor while eliminating subexpressions can cause unexpected errors, such as an NPE when accessing transient fields.

The NPE looks like:

[info] - SPARK-29239: Subquery should not cause NPE when eliminating subexpression *** FAILED *** (175 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1395.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1395.0 (TID   3447, 10.0.0.196, executor driver): java.lang.NullPointerException
[info]  at org.apache.spark.sql.execution.LocalTableScanExec.stringArgs(LocalTableScanExec.scala:62)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.argString(TreeNode.scala:506)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.simpleString(TreeNode.scala:534)
[info]  at org.apache.spark.sql.catalyst.plans.QueryPlan.simpleString(QueryPlan.scala:179)
[info]  at org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:181)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:647)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:675)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:675)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:569)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:559)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:551)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:548)
[info]  at org.apache.spark.sql.catalyst.errors.package$TreeNodeException.<init>(package.scala:36)
[info]  at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:436)
[info]  at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:425)
[info]  at org.apache.spark.sql.execution.SparkPlan.makeCopy(SparkPlan.scala:102)
[info]  at org.apache.spark.sql.execution.SparkPlan.makeCopy(SparkPlan.scala:63)
[info]  at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:132)
[info]  at org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:261)
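
The failure mode described above (a transient field that is null after the object has been shipped to an executor) can be reproduced without Spark. The following is only a minimal sketch: `FakeScan`, `describe`, and `roundTrip` are hypothetical names made up for illustration, and Java serialization stands in for task shipping.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a plan node that keeps driver-only data in a transient field.
// It is not a Spark class.
class FakeScan(@transient val rows: Seq[Int]) extends Serializable {
  // Building a description touches the transient field, as a plan's string form might.
  def describe: String = s"FakeScan(${rows.length} rows)"
}

object TransientDemo {
  // Java serialization round trip, standing in for sending the object to an executor.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val onDriver = new FakeScan(Seq(1, 2, 3))
    println(onDriver.describe) // works: the field is populated on the "driver"

    val onExecutor = roundTrip(onDriver)
    // The transient field was not serialized, so it is null on the "executor".
    println(s"rows is null after shipping: ${onExecutor.rows == null}")
    // Calling onExecutor.describe here would throw a NullPointerException.
  }
}
```

Any code path that inspects or stringifies the deserialized object, such as canonicalizing or comparing it, runs into the null field in the same way.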

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit test.

cloud-fan · 2019-09-25T07:52:08Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

+      expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
+      // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
+      // can cause unexpected error.
+      expr.isInstanceOf[PlanExpression[_]]


Is there a way to skip it only on executors? This may still be useful when we do codegen on the driver side.

Like checking TaskContext.get != null?

ah that's a good idea!
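
The idea in this exchange can be sketched without Spark on the classpath. Everything below is a hypothetical stand-in: `FakeTaskContext` imitates the behavior the check relies on (Spark's `TaskContext.get` returns null when no task is running), and `SubqueryExpr` plays the role of `PlanExpression`.

```scala
// Hypothetical expression types; none of these are Spark classes.
sealed trait Expr
case class Literal(value: Int) extends Expr
case class SubqueryExpr(planDescription: String) extends Expr // plays the role of PlanExpression

object FakeTaskContext {
  // Imitates a per-thread task context: null unless a task is running on this thread.
  private val current = new ThreadLocal[String]

  def get: String = current.get()

  // Runs `body` as if it were executing inside a task.
  def runAsTask[T](taskName: String)(body: => T): T = {
    current.set(taskName)
    try body finally current.remove()
  }
}

object SkipRule {
  // Skip plan-wrapping expressions only while a task is running, i.e. on an executor.
  def shouldSkip(expr: Expr): Boolean =
    expr.isInstanceOf[SubqueryExpr] && FakeTaskContext.get != null
}

object SkipRuleDemo {
  def main(args: Array[String]): Unit = {
    val subquery = SubqueryExpr("some plan")
    println(SkipRule.shouldSkip(subquery)) // no task running: not skipped
    println(FakeTaskContext.runAsTask("task-0")(SkipRule.shouldSkip(subquery))) // inside a task: skipped
  }
}
```

With this shape, driver-side codegen can still deduplicate such expressions, while executor-side codegen leaves them alone.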

dongjoon-hyun · 2019-09-25T09:17:32Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

-      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
+      expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
+      // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
+      // can cause unexpected error.


If there isn't any other reason, shall we mention NPE specifically instead of unexpected error? Both SPARK-29239 and SPARK-29221 are due to NPE.

Ok, I updated this.

How about adding "including NPE" after "unexpected error"? IMHO "unexpected error" is actually correct (we can't predict which error we will get), but it would help a lot if we enumerated the known errors as well.

Oh, I realized my browser didn't show the comment from @viirya. It's just my 2 cents, and "like NPE" seems OK to me.

SparkQA · 2019-09-25T11:29:34Z

Test build #111333 has finished for PR 25925 at commit 3a3bde0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-09-25T19:25:51Z

Test build #111359 has finished for PR 25925 at commit 75109a0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-09-26T05:28:46Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala

+      expr.find(_.isInstanceOf[LambdaVariable]).isDefined ||
+      // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
+      // can cause error like NPE.
+      (expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null)


Just out of curiosity, does this issue happen in the interpreted code path as well? E.g., we send a PlanExpression to the executor side, eval it, and hit an NPE.

IIUC, EquivalentExpressions is only used in codegen mode now, e.g., GenerateUnsafeProjection uses this class for common subexpression elimination, but InterpretedUnsafeProjection does not eliminate common subexpressions.

I'm not sure I understand your question correctly, but the PlanExpressions of a SparkPlan are evaluated and updated with values (e.g., ExecSubqueryExpression.updateResult) before a query begins to run. The values are kept in the PlanExpression, and on the executor side a call to eval on the PlanExpression simply returns the kept value. I think we do not really evaluate a PlanExpression on the executor side.

Ah, got it, so the kept value is serialized and sent to the executor side in the interpreted code path.
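
The "kept value" behavior discussed here can be illustrated with a small sketch. `FakeSubquery` and `ship` are hypothetical names, not Spark's classes; the sketch only assumes what the comments above state: the result is computed before the query runs, stored in the expression, and returned by eval after serialization.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for an executed subquery expression; not Spark's ScalarSubquery.
class FakeSubquery(@transient private val runPlan: () => Int) extends Serializable {
  private var result: Int = 0
  private var updated: Boolean = false

  // Driver side: run the wrapped plan once and keep the value.
  def updateResult(): Unit = {
    result = runPlan()
    updated = true
  }

  // Either side: return the kept value without touching the plan.
  def eval(): Int = {
    require(updated, "subquery result has not been updated")
    result
  }
}

object KeptValueDemo {
  // Java serialization round trip, standing in for shipping the expression to an executor.
  def ship[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val subquery = new FakeSubquery(() => 40 + 2)
    subquery.updateResult() // happens before the "query" starts

    val onExecutor = ship(subquery) // the plan-running function is transient and is dropped
    println(onExecutor.eval()) // returns the value kept on the driver
  }
}
```

The shipped copy never needs the transient plan, which is why the interpreted path is safe even though the plan itself is unusable on the executor.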

This issue also reminds me that it's better to always do codegen at driver side, even if whole-stage-codegen is false. We can investigate it later.

Ok. Please let me know if you have some ideas later.

cloud-fan · 2019-09-26T06:03:08Z

thanks, merging to master!

@viirya can you send another PR to 2.4? I tried to backport but it has conflicts.

viirya · 2019-09-26T06:06:39Z

@cloud-fan Ok. Let me create a backport PR.

viirya · 2019-09-26T07:10:58Z

@cloud-fan I just tried to make a backport.

In branch-2.4, Project uses an UnsafeProjection API which is not covered by the CODEGEN_FACTORY_MODE config yet, so we cannot have an end-to-end test like this one.

WDYT? Should we make a backport with a unit test against EquivalentExpressions, or skip the backport?

cloud-fan · 2019-09-26T07:15:10Z

Do we have to trigger the bug via CODEGEN_FACTORY_MODE=CODEGEN_ONLY? This is a runtime exception and I think we can't fall back anyway.

viirya · 2019-09-26T15:36:24Z

> Do we have to trigger the bug via CODEGEN_FACTORY_MODE=CODEGEN_ONLY? This is a runtime exception and I think we can't fall back anyway.

It catches NonFatal, so it can actually fall back when an NPE occurs.

cloud-fan · 2019-09-26T15:59:55Z

It catches NonFatal when compiling the code, not when running the code, right?

viirya · 2019-09-26T18:17:09Z

Yes, it catches NonFatal when compiling the code. The NPE happens while generating the code to be compiled on the executor.

cloud-fan · 2019-09-27T01:54:02Z

Ah, I see. Then we can skip backporting, as users can still run the query.
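
The fallback behavior this thread settles on can be sketched as follows. `createWithFallback` is a hypothetical helper written for illustration, not Spark's factory API; it only assumes what is stated above, namely that the code-generating path is wrapped in a NonFatal catch and that an NPE is a non-fatal error.

```scala
import scala.util.control.NonFatal

object FallbackDemo {
  // Hypothetical factory: try the code-generating path first and
  // use the interpreted path if it throws any non-fatal error.
  def createWithFallback[T](codegen: => T)(interpreted: => T): T =
    try codegen
    catch { case NonFatal(_) => interpreted }

  def main(args: Array[String]): Unit = {
    val projection = createWithFallback[String] {
      // Stands in for code generation hitting a null transient field.
      throw new NullPointerException("transient field is null")
    } {
      "interpreted projection"
    }
    println(projection)
  }
}
```

Because NullPointerException matches NonFatal, the query still runs through the interpreted path when fallback is allowed; forcing codegen only, as the end-to-end test does, is what surfaces the error.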

Subquery should not cause NPE when eliminating subexpression.

3a3bde0

viirya mentioned this pull request Sep 25, 2019

[WIP][SPARK-29221][SQL] LocalTableScanExec: handle the case where executors are accessing "null" rows #25913

Closed

cloud-fan reviewed Sep 25, 2019

viirya changed the title ~~[SPARK-29239][SQL] Subquery should not cause NPE when eliminating subexpression~~ [SPARK-29239][SPARK-29221][SQL] Subquery should not cause NPE when eliminating subexpression Sep 25, 2019

dongjoon-hyun reviewed Sep 25, 2019

dongjoon-hyun added the SQL label Sep 25, 2019

Only skip PlanExpression on executor.

75109a0

cloud-fan reviewed Sep 26, 2019

cloud-fan closed this in b8b59d6 Sep 26, 2019

wangshuo128 mentioned this pull request Sep 27, 2019

[SPARK-29213][SQL] Generate extra IsNotNull predicate in FilterExec #25902

Closed

monkeyboy123 mentioned this pull request Feb 28, 2022

[SPARK-38333][SQL] [3.1]DPP cause DataSourceScanExec java.lang.NullPointer… #35662

Closed

viirya deleted the SPARK-29239 branch December 27, 2023 18:37
