
[SPARK-24865] Remove AnalysisBarrier #21822

Closed (wants to merge 14 commits)

Conversation

@rxin (Contributor) commented Jul 20, 2018

What changes were proposed in this pull request?

AnalysisBarrier was introduced in SPARK-20392 to improve analysis speed (don't re-analyze nodes that have already been analyzed).

Before AnalysisBarrier, we already had some infrastructure in place, with analysis-specific functions (resolveOperators and resolveExpressions). These functions do not recursively traverse into subplans that have already been analyzed (tracked with a mutable boolean flag _analyzed). The issue with the old system was that developers started using transformDown, which does a top-down traversal of the plan tree, because there was no top-down resolution function, and as a result analyzer performance became pretty bad.

To fix the issue in SPARK-20392, AnalysisBarrier was introduced as a special node that transform/transformUp/transformDown do not traverse into. However, the introduction of this special node caused far more trouble than it solved. The implicit node breaks assumptions and code in a few places, and it is hard to know when an AnalysisBarrier will be present and when it won't. A simple search for AnalysisBarrier in PR discussions shows it has been a source of bugs and additional complexity.

Instead, this pull request removes AnalysisBarrier and reverts to the old approach. It also adds test infrastructure that fails explicitly if transform methods are used in the analyzer.
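The resolve*/_analyzed mechanism described above can be sketched as a toy model. This is a hypothetical, self-contained illustration, assuming simplified tree types (Node, Rel, Proj are made up for this sketch, not Spark's actual classes):

```scala
// Toy model of the analyzer's resolveOperators: a mutable `analyzed` flag
// short-circuits traversal of subtrees that have already been analyzed,
// which is the re-analysis work a plain transformDown cannot avoid.
object ResolveSketch {
  sealed abstract class Node {
    var analyzed: Boolean = false
    def children: Seq[Node]
    def withNewChildren(newChildren: Seq[Node]): Node

    // Bottom-up resolution that skips already-analyzed subtrees entirely.
    def resolveOperators(rule: PartialFunction[Node, Node]): Node =
      if (analyzed) this
      else {
        val afterChildren = withNewChildren(children.map(_.resolveOperators(rule)))
        rule.applyOrElse(afterChildren, identity[Node])
      }
  }

  // A leaf relation in the toy plan tree.
  case class Rel(name: String) extends Node {
    def children: Seq[Node] = Nil
    def withNewChildren(c: Seq[Node]): Node = this
  }

  // A unary operator in the toy plan tree.
  case class Proj(child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
    def withNewChildren(c: Seq[Node]): Node = Proj(c.head)
  }
}
```

Once a plan is flagged analyzed, resolveOperators returns it immediately without visiting any node, while transformDown would have re-traversed the whole subtree.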

How was this patch tested?

Added a test suite, AnalysisHelperSuite, for testing the resolve* and transform* methods.


SparkQA commented Jul 20, 2018

Test build #93306 has finished for PR 21822 at commit f6f2bcc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 20, 2018

Test build #93307 has finished for PR 21822 at commit 8ccafca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 20, 2018

Test build #93308 has finished for PR 21822 at commit 0afa7ea.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Lookup WindowSpecDefinitions. This rule works with unresolved children.
case WithWindowDefinition(windowDefinitions, child) =>
child.transform {
// TODO(rxin): Check with Herman whether the next line is OK.
Contributor commented:
It is good. The earlier resolveOperators makes sure we don't overwrite a window spec, with a similarly named one defined higher up the tree. BTW I don't think we have a test that covers this (it is pretty rare).

dedupAttr(a, attributeRewrites)
case s: SubqueryExpression =>
s.withNewPlan(dedupOuterReferencesInSubquery(s.plan, attributeRewrites))
// TODO(rxin): Why do we need transformUp here?
@rxin (Contributor Author) commented Jul 20, 2018:

cc @hvanhovell @cloud-fan Why do we need transformUp here?

@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {

// Simplify the predicates before validating any unsupported correlation patterns
// in the plan.
BooleanSimplification(sub).foreachUp {
// TODO(rxin): Why did this need to call BooleanSimplification???
rxin (Contributor Author) commented:

cc @dilipbiswal @cloud-fan @gatorsmile

Why did we need BooleanSimplification here?

Contributor commented:

@rxin From what I remember, Reynold, most of this logic was housed in the Analyzer before, and we moved it to the optimizer. In the old code we used to walk the plan after simplifying the predicates. The comment used to read "Simplify the predicates before pulling them out." I just retained those semantics.

Contributor commented:

@rxin I tracked down the PR that introduced the change in the Analyzer. Here is the link: #12954

rxin (Contributor Author) commented:

Thanks. I'm going to add it back.

Contributor commented:

Yeah, I added boolean simplification here. I didn't quite like it back then, and I still don't like it. I was hoping this was happening in the Optimizer now.

Contributor commented:

@hvanhovell Hi Herman, as you said, we do the actual pulling up of the predicates in the optimizer, in PullupCorrelatedPredicates in subquery.scala. We also do a BooleanSimplification there first, before traversing the plan. Here we do the error reporting, and I thought it would be better to keep the traversal the same way. Basically, previously we did the error reporting and rewriting in the Analyzer; now we do the error reporting in checkAnalysis and the rewriting in the Optimizer. Just to refresh your memory so you can help make the right call here :-)

Contributor commented:

Well, tests fail without it, so we don't really have a choice here. For a second I thought we could also create some utils class, but that would just mean moving the code from BooleanSimplification there purely for aesthetics.

Contributor commented:

@hvanhovell Yeah. I agree.


SparkQA commented Jul 20, 2018

Test build #93310 has finished for PR 21822 at commit 738e99c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 20, 2018

Test build #93320 has finished for PR 21822 at commit 83ffa51.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member) commented Jul 20, 2018

retest this please


SparkQA commented Jul 20, 2018

Test build #93325 has finished for PR 21822 at commit 83ffa51.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -787,6 +782,7 @@ class Analyzer(
right
case Some((oldRelation, newRelation)) =>
val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
// TODO(rxin): Why do we need transformUp here?
rxin (Contributor Author) commented:

cc @cloud-fan why do we need transformUp here?

Contributor commented:

We still need to transform the resolved plan here to resolve self-joins. Imagine:

val df = ...
df.as("a").join(df.as("b"), ...)

We need to look into the resolved plan to replace the conflicting attributes.
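The attribute rewrite this thread discusses can be sketched with toy types. This is a hypothetical, simplified model (Attr, Scan, Filter, and dedup are illustrative names, not Spark's AttributeMap or LogicalPlan): a bottom-up transform replaces the conflicting attributes on one side of a self-join using an old-to-new mapping, even though the plan is already resolved.

```scala
// Toy model of deduplicating conflicting attributes via a bottom-up rewrite.
object DedupSketch {
  // An attribute is identified by its expression id, not just its name,
  // so both sides of a self-join can carry distinct "id" attributes.
  final case class Attr(name: String, id: Long)

  sealed trait Plan {
    def transformUp(f: Plan => Plan): Plan
  }
  case class Scan(output: Seq[Attr]) extends Plan {
    def transformUp(f: Plan => Plan): Plan = f(this)
  }
  case class Filter(cond: Attr, child: Plan) extends Plan {
    def transformUp(f: Plan => Plan): Plan = f(Filter(cond, child.transformUp(f)))
  }

  // Rewrite every attribute occurrence according to `mapping`, bottom-up.
  def dedup(plan: Plan, mapping: Map[Attr, Attr]): Plan = plan.transformUp {
    case Scan(out)        => Scan(out.map(a => mapping.getOrElse(a, a)))
    case Filter(c, child) => Filter(mapping.getOrElse(c, c), child)
  }
}
```

In the df.as("a").join(df.as("b"), ...) case above, both sides share the same output attributes, so one side is rewritten with fresh ids through a mapping like this before the join can be resolved.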


SparkQA commented Jul 20, 2018

Test build #93351 has finished for PR 21822 at commit 7c76c83.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case SubqueryAlias(_, child) => child
// This is actually called in the beginning of the optimization phase, and as a result
// is using transformUp rather than resolveOperators. This is also often called in the
//
rxin (Contributor Author) noted:

note: finish comment


object LogicalPlan {

private val resolveOperatorDepth = new ThreadLocal[Int] {
rxin (Contributor Author) noted:

todo: explain what this is
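As a hypothetical sketch of what such a thread-local depth counter could look like (the helper names inResolveContext and withResolveContext are made up for this illustration, not necessarily what the PR ends up with): the counter is incremented while a resolve* call is on the stack, so test-only assertions can detect transform* calls made from inside the analyzer without inspecting stack traces.

```scala
// Toy sketch of a ThreadLocal recursion-depth counter guarding resolve* calls.
object ResolveGuardSketch {
  private val resolveOperatorDepth = new ThreadLocal[Int] {
    override def initialValue(): Int = 0
  }

  // True while any resolve* invocation is active on this thread.
  def inResolveContext: Boolean = resolveOperatorDepth.get > 0

  // Run `body` with the depth counter incremented, restoring it afterwards
  // even if `body` throws.
  def withResolveContext[T](body: => T): T = {
    resolveOperatorDepth.set(resolveOperatorDepth.get + 1)
    try body
    finally resolveOperatorDepth.set(resolveOperatorDepth.get - 1)
  }
}
```

A test-only check can then assert that a transform method is never invoked while inResolveContext is true.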

*
* @param rule the function use to transform this nodes children
*/
def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {
rxin (Contributor Author) noted:

todo: add unit tests

@rxin (Contributor Author) commented Jul 20, 2018

retest this please


SparkQA commented Jul 21, 2018

Test build #93366 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor Author) commented Jul 21, 2018

retest this please


SparkQA commented Jul 21, 2018

Test build #93370 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 21, 2018

Test build #93375 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) commented:

retest this please


SparkQA commented Jul 21, 2018

Test build #93380 has finished for PR 21822 at commit 38980ad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 23, 2018

Test build #93420 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) commented:

retest this please

@HyukjinKwon (Member) commented:

@rxin, I think this PR could have a performance impact, judging from the latest test run above (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93470/) and from a rough scan of other builds:

ArithmeticExpressionSuite:
- SPARK-22499: Least and greatest should not generate codes beyond 64KB (11 minutes, 51 seconds)

CastSuite:
- cast string to timestamp (8 minutes, 42 seconds)

TPCDSQuerySuite:
- q14a-v2.7 (2 minutes, 29 seconds)

SQLQueryTestSuite:
- subquery/in-subquery/in-joins.sql (2 minutes, 36 seconds)

ContinuousStressSuite:
- only one epoch (3 minutes, 21 seconds)
- automatic epoch advancement (3 minutes, 21 seconds)

vs https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/4699/consoleFull


ArithmeticExpressionSuite:
- SPARK-22499: Least and greatest should not generate codes beyond 64KB (7 minutes, 49 seconds)

CastSuite:
- cast string to timestamp (1 minute)

TPCDSQuerySuite:
- q14a-v2.7 (3 seconds, 442 milliseconds)

SQLQueryTestSuite:
- subquery/in-subquery/in-joins.sql (2 minutes, 21 seconds)

ContinuousStressSuite:
- only one epoch (3 minutes, 21 seconds)
- automatic epoch advancement (3 minutes, 21 seconds)

There could of course be other factors, such as the machine's status, but I thought it worth noting while I take a look at these.


rxin commented Jul 24, 2018 via email


SparkQA commented Jul 24, 2018

Test build #93470 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin rxin changed the title [SPARK-24865] Remove AnalysisBarrier - WIP [SPARK-24865] Remove AnalysisBarrier Jul 25, 2018

rxin commented Jul 25, 2018

I changed the way we do the checks in tests to use a thread local rather than inspecting the stack trace, so they should run faster now. I also added test cases for the various new methods, and moved the relevant code into AnalysisHelper for better code structure.

This should be ready now if tests pass.

@@ -751,7 +751,8 @@ object TypeCoercion {
*/
case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {

override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
override protected def coerceTypes(
plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown { case p =>
rxin (Contributor Author) commented:

I'm using odd wrapping here to minimize the diff.


SparkQA commented Jul 25, 2018

Test build #93524 has finished for PR 21822 at commit abfd0a8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait AnalysisHelper extends QueryPlan[LogicalPlan]

@gatorsmile (Member) commented:

retest this please


SparkQA commented Jul 25, 2018

Test build #93527 has finished for PR 21822 at commit f2f1a97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 25, 2018

Test build #93526 has finished for PR 21822 at commit 75fb114.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 25, 2018

Test build #93533 has finished for PR 21822 at commit f2f1a97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 27, 2018

Test build #93630 has finished for PR 21822 at commit fe52801.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) commented:

retest this please


SparkQA commented Jul 27, 2018

Test build #93648 has finished for PR 21822 at commit fe52801.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor) commented:

LGTM, merging to master!

@asfgit asfgit closed this in e6e9031 Jul 27, 2018
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Aug 10, 2018
…enkins build (from 300m to 340m)

## What changes were proposed in this pull request?

Currently, it looks like we hit the time limit from time to time, so it seems better to increase it a bit.

For instance, please see apache#21822

For clarification, current Jenkins timeout is 400m. This PR just proposes to fix the test script to increase it correspondingly.

*This PR does not target to change the build configuration*

## How was this patch tested?

Jenkins tests.

Closes apache#21845 from HyukjinKwon/SPARK-24886.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>