
[SPARK-24865] Remove AnalysisBarrier #21822

Closed (wants to merge 14 commits)

Conversation

@rxin (Contributor) commented Jul 20, 2018

What changes were proposed in this pull request?

AnalysisBarrier was introduced in SPARK-20392 to improve analysis speed (don't re-analyze nodes that have already been analyzed).

Before AnalysisBarrier, we already had some infrastructure in place, with analysis-specific functions (resolveOperators and resolveExpressions). These functions do not recursively traverse into subplans that have already been analyzed (tracked with a mutable boolean flag _analyzed). The issue with the old system was that developers started using transformDown, which does a top-down traversal of the plan tree, because there was no top-down resolution function, and as a result analyzer performance became pretty bad.

To fix the issue in SPARK-20392, AnalysisBarrier was introduced as a special node that transform/transformUp/transformDown do not traverse into. However, the introduction of this special node caused far more trouble than it solved. The implicit node breaks assumptions and code in a few places, and it is hard to know when an AnalysisBarrier will be present and when it won't. A simple search for AnalysisBarrier in PR discussions shows it has been a source of bugs and additional complexity.

Instead, this pull request removes AnalysisBarrier and reverts to the old approach. It also adds test infrastructure that fails explicitly if transform methods are used in the analyzer.
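The resolve*/_analyzed mechanism described above can be sketched as a toy model. This is a hypothetical, self-contained illustration, assuming simplified tree types (Node, Rel, Proj are made up for this sketch, not Spark's actual classes):

```scala
// Toy model of the analyzer's resolveOperators: a mutable `analyzed` flag
// short-circuits traversal of subtrees that have already been analyzed,
// which is the re-analysis work a plain transformDown cannot avoid.
object ResolveSketch {
  sealed abstract class Node {
    var analyzed: Boolean = false
    def children: Seq[Node]
    def withNewChildren(newChildren: Seq[Node]): Node

    // Bottom-up resolution that skips already-analyzed subtrees entirely.
    def resolveOperators(rule: PartialFunction[Node, Node]): Node =
      if (analyzed) this
      else {
        val afterChildren = withNewChildren(children.map(_.resolveOperators(rule)))
        rule.applyOrElse(afterChildren, identity[Node])
      }
  }

  // A leaf relation in the toy plan tree.
  case class Rel(name: String) extends Node {
    def children: Seq[Node] = Nil
    def withNewChildren(c: Seq[Node]): Node = this
  }

  // A unary operator in the toy plan tree.
  case class Proj(child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
    def withNewChildren(c: Seq[Node]): Node = Proj(c.head)
  }
}
```

Once a plan is flagged analyzed, resolveOperators returns it immediately without visiting any node, while transformDown would have re-traversed the whole subtree.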

How was this patch tested?

Added a test suite, AnalysisHelperSuite, for testing the resolve* and transform* methods.


SparkQA commented Jul 20, 2018

Test build #93306 has finished for PR 21822 at commit f6f2bcc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 20, 2018

Test build #93307 has finished for PR 21822 at commit 8ccafca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 20, 2018

Test build #93308 has finished for PR 21822 at commit 0afa7ea.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Lookup WindowSpecDefinitions. This rule works with unresolved children.
case WithWindowDefinition(windowDefinitions, child) =>
child.transform {
// TODO(rxin): Check with Herman whether the next line is OK.
Contributor commented:
It is good. The earlier resolveOperators makes sure we don't overwrite a window spec, with a similarly named one defined higher up the tree. BTW I don't think we have a test that covers this (it is pretty rare).

dedupAttr(a, attributeRewrites)
case s: SubqueryExpression =>
s.withNewPlan(dedupOuterReferencesInSubquery(s.plan, attributeRewrites))
// TODO(rxin): Why do we need transformUp here?
@rxin (Contributor Author) commented Jul 20, 2018:

cc @hvanhovell @cloud-fan Why do we need transformUp here?

@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {

// Simplify the predicates before validating any unsupported correlation patterns
// in the plan.
BooleanSimplification(sub).foreachUp {
// TODO(rxin): Why did this need to call BooleanSimplification???
rxin (Contributor Author) commented:

cc @dilipbiswal @cloud-fan @gatorsmile

Why did we need BooleanSimplification here?

Contributor commented:

@rxin From what I remember, Reynold, most of this logic was housed in the Analyzer before, and we moved it to the optimizer. In the old code we used to walk the plan after simplifying the predicates. The comment used to read "Simplify the predicates before pulling them out." I just retained those semantics.

Contributor commented:

@rxin I tracked down the PR that introduced the change in the Analyzer. Here is the link: #12954

rxin (Contributor Author) commented:

Thanks. I'm going to add it back.

Contributor commented:

Yeah, I added boolean simplification here. I didn't quite like it back then, and I still don't like it. I was hoping this was happening in the Optimizer now.

Contributor commented:

@hvanhovell Hi Herman, as you said, we do the actual pulling up of the predicates in the optimizer, in PullupCorrelatedPredicates in subquery.scala. We also do a BooleanSimplification there first, before traversing the plan. Here we do the error reporting, and I thought it would be better to keep the traversal the same way. Basically, previously we did the error reporting and rewriting in the Analyzer; now we do the error reporting in checkAnalysis and the rewriting in the Optimizer. Just to refresh your memory so you can help make the right call here :-)

Contributor commented:

Well, tests fail without it, so we don't really have a choice here. For a second I thought we could also create some utils class, but that would just mean moving the code from BooleanSimplification there purely for aesthetics.

Contributor commented:

@hvanhovell Yeah. I agree.


SparkQA commented Jul 20, 2018

Test build #93310 has finished for PR 21822 at commit 738e99c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 20, 2018

Test build #93320 has finished for PR 21822 at commit 83ffa51.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member) commented Jul 20, 2018

retest this please


SparkQA commented Jul 20, 2018

Test build #93325 has finished for PR 21822 at commit 83ffa51.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -787,6 +782,7 @@ class Analyzer(
right
case Some((oldRelation, newRelation)) =>
val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
// TODO(rxin): Why do we need transformUp here?
rxin (Contributor Author) commented:

cc @cloud-fan why do we need transformUp here?

Contributor commented:

We still need to transform the resolved plan here to resolve self-joins. Imagine:

val df = ...
df.as("a").join(df.as("b"), ...)

We need to look into the resolved plan to replace the conflicting attributes.
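The attribute rewrite this thread discusses can be sketched with toy types. This is a hypothetical, simplified model (Attr, Scan, Filter, and dedup are illustrative names, not Spark's AttributeMap or LogicalPlan): a bottom-up transform replaces the conflicting attributes on one side of a self-join using an old-to-new mapping, even though the plan is already resolved.

```scala
// Toy model of deduplicating conflicting attributes via a bottom-up rewrite.
object DedupSketch {
  // An attribute is identified by its expression id, not just its name,
  // so both sides of a self-join can carry distinct "id" attributes.
  final case class Attr(name: String, id: Long)

  sealed trait Plan {
    def transformUp(f: Plan => Plan): Plan
  }
  case class Scan(output: Seq[Attr]) extends Plan {
    def transformUp(f: Plan => Plan): Plan = f(this)
  }
  case class Filter(cond: Attr, child: Plan) extends Plan {
    def transformUp(f: Plan => Plan): Plan = f(Filter(cond, child.transformUp(f)))
  }

  // Rewrite every attribute occurrence according to `mapping`, bottom-up.
  def dedup(plan: Plan, mapping: Map[Attr, Attr]): Plan = plan.transformUp {
    case Scan(out)        => Scan(out.map(a => mapping.getOrElse(a, a)))
    case Filter(c, child) => Filter(mapping.getOrElse(c, c), child)
  }
}
```

In the df.as("a").join(df.as("b"), ...) case above, both sides share the same output attributes, so one side is rewritten with fresh ids through a mapping like this before the join can be resolved.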


SparkQA commented Jul 20, 2018

Test build #93351 has finished for PR 21822 at commit 7c76c83.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case SubqueryAlias(_, child) => child
// This is actually called in the beginning of the optimization phase, and as a result
// is using transformUp rather than resolveOperators. This is also often called in the
//
rxin (Contributor Author) noted:

note: finish comment


object LogicalPlan {

private val resolveOperatorDepth = new ThreadLocal[Int] {
rxin (Contributor Author) noted:

todo: explain what this is
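As a hypothetical sketch of what such a thread-local depth counter could look like (the helper names inResolveContext and withResolveContext are made up for this illustration, not necessarily what the PR ends up with): the counter is incremented while a resolve* call is on the stack, so test-only assertions can detect transform* calls made from inside the analyzer without inspecting stack traces.

```scala
// Toy sketch of a ThreadLocal recursion-depth counter guarding resolve* calls.
object ResolveGuardSketch {
  private val resolveOperatorDepth = new ThreadLocal[Int] {
    override def initialValue(): Int = 0
  }

  // True while any resolve* invocation is active on this thread.
  def inResolveContext: Boolean = resolveOperatorDepth.get > 0

  // Run `body` with the depth counter incremented, restoring it afterwards
  // even if `body` throws.
  def withResolveContext[T](body: => T): T = {
    resolveOperatorDepth.set(resolveOperatorDepth.get + 1)
    try body
    finally resolveOperatorDepth.set(resolveOperatorDepth.get - 1)
  }
}
```

A test-only check can then assert that a transform method is never invoked while inResolveContext is true.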

*
* @param rule the function use to transform this nodes children
*/
def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {
rxin (Contributor Author) noted:

todo: add unit tests

@rxin (Contributor Author) commented Jul 20, 2018

retest this please


SparkQA commented Jul 21, 2018

Test build #93366 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor Author) commented Jul 21, 2018

retest this please


SparkQA commented Jul 21, 2018

Test build #93370 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 21, 2018

Test build #93375 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) commented:

retest this please


SparkQA commented Jul 21, 2018

Test build #93380 has finished for PR 21822 at commit 38980ad.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 23, 2018

Test build #93420 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) commented:

retest this please

@HyukjinKwon (Member) commented:

@rxin, I think this PR could have a performance impact, judging from the latest test run above (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93470/) and from a rough scan of other builds:

ArithmeticExpressionSuite:
- SPARK-22499: Least and greatest should not generate codes beyond 64KB (11 minutes, 51 seconds)

CastSuite:
- cast string to timestamp (8 minutes, 42 seconds)

TPCDSQuerySuite:
- q14a-v2.7 (2 minutes, 29 seconds)

SQLQueryTestSuite:
- subquery/in-subquery/in-joins.sql (2 minutes, 36 seconds)

ContinuousStressSuite:
- only one epoch (3 minutes, 21 seconds)
- automatic epoch advancement (3 minutes, 21 seconds)

vs https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/4699/consoleFull


ArithmeticExpressionSuite:
- SPARK-22499: Least and greatest should not generate codes beyond 64KB (7 minutes, 49 seconds)

CastSuite:
- cast string to timestamp (1 minute)

TPCDSQuerySuite:
- q14a-v2.7 (3 seconds, 442 milliseconds)

SQLQueryTestSuite:
- subquery/in-subquery/in-joins.sql (2 minutes, 21 seconds)

ContinuousStressSuite:
- only one epoch (3 minutes, 21 seconds)
- automatic epoch advancement (3 minutes, 21 seconds)

There could of course be other factors, such as the machine's status, but I thought it worth noting while I take a look at these.


rxin commented Jul 24, 2018 via email


SparkQA commented Jul 24, 2018

Test build #93470 has finished for PR 21822 at commit 38980ad.

  • This patch fails from timeout after a configured wait of `300m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin rxin changed the title [SPARK-24865] Remove AnalysisBarrier - WIP [SPARK-24865] Remove AnalysisBarrier Jul 25, 2018

rxin commented Jul 25, 2018

I changed the way we do the checks in tests to use a thread local rather than inspecting the stack trace, so they should run faster now. I also added test cases for the various new methods, and moved the relevant code into AnalysisHelper for better code structure.

This should be ready now if tests pass.

@@ -751,7 +751,8 @@ object TypeCoercion {
*/
case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {

override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
override protected def coerceTypes(
plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown { case p =>
rxin (Contributor Author) commented:

I'm using odd wrapping here to minimize the diff.


SparkQA commented Jul 25, 2018

Test build #93524 has finished for PR 21822 at commit abfd0a8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait AnalysisHelper extends QueryPlan[LogicalPlan]

@gatorsmile (Member) commented:

retest this please


SparkQA commented Jul 25, 2018

Test build #93527 has finished for PR 21822 at commit f2f1a97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 25, 2018

Test build #93526 has finished for PR 21822 at commit 75fb114.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 25, 2018

Test build #93533 has finished for PR 21822 at commit f2f1a97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 27, 2018

Test build #93630 has finished for PR 21822 at commit fe52801.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member) commented:

retest this please


SparkQA commented Jul 27, 2018

Test build #93648 has finished for PR 21822 at commit fe52801.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor) commented:

LGTM, merging to master!

@asfgit asfgit closed this in e6e9031 Jul 27, 2018
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Aug 10, 2018
…enkins build (from 300m to 340m)

## What changes were proposed in this pull request?

Currently, it looks like we hit the time limit from time to time, so it seems better to increase it a bit.

For instance, please see apache#21822

For clarification, current Jenkins timeout is 400m. This PR just proposes to fix the test script to increase it correspondingly.

*This PR does not target to change the build configuration*

## How was this patch tested?

Jenkins tests.

Closes apache#21845 from HyukjinKwon/SPARK-24886.

Authored-by: hyukjinkwon <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>