Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and Output the Plan if Existed in AnalysisException #15316

Closed
wants to merge 8 commits into from

Conversation

gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented Sep 30, 2016

What changes were proposed in this pull request?

Dataset always does eager analysis now. Thus, spark.sql.eagerAnalysis is not used any more. Thus, we need to remove it.

This PR also outputs the plan. Without the fix, the analysis error is like

cannot resolve '`k1`' given input columns: [k, v]; line 1 pos 12

After the fix, the analysis error becomes:

org.apache.spark.sql.AnalysisException: cannot resolve '`k1`' given input columns: [k, v]; line 1 pos 12;
'Project [unresolvedalias(CASE WHEN ('k1 = 2) THEN 22 WHEN ('k1 = 4) THEN 44 ELSE 0 END, None), v#6]
+- SubqueryAlias t
   +- Project [_1#2 AS k#5, _2#3 AS v#6]
      +- LocalRelation [_1#2, _2#3]

How was this patch tested?

N/A

@SparkQA
Copy link

SparkQA commented Sep 30, 2016

Test build #66189 has finished for PR 15316 at commit c71a03e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

Huh, when did we remove eager analysis?

@gatorsmile
Copy link
Member Author

gatorsmile commented Oct 1, 2016

@hvanhovell It was removed when we migrated DataFrame to Dataset. For details, see the PR: #11443

@hvanhovell
Copy link
Contributor

Hmmmm. I did kinda like this feature. @liancheng why did we remove this?

@rxin
Copy link
Contributor

rxin commented Oct 1, 2016

Shouldn't we bring it back? This is an important feature when there is a bug.

@gatorsmile
Copy link
Member Author

gatorsmile commented Oct 1, 2016

To @hvanhovell @rxin

@liancheng listed the reason in the original PR.

Dataset always do eager analysis now
We used to support disabling DataFrame eager analysis to help reporting partially analyzed malformed logical plan on analysis failure. However, Dataset encoders requires eager analysis during Dataset construction. To preserve the error reporting feature, AnalysisException now takes an extra Option[LogicalPlan] argument to hold the partially analyzed plan, so that we can check the plan tree when reporting test failures. This plan is passed by QueryExecution.assertAnalyzed.

However, there is a bug in AnslysisException output. It does not output the plan. I just fixed it. See the updated PR description.

@gatorsmile gatorsmile changed the title [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and Output the Plan if Existed in AnalysisException. Oct 1, 2016
@gatorsmile gatorsmile changed the title [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and Output the Plan if Existed in AnalysisException. [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and Output the Plan if Existed in AnalysisException Oct 1, 2016
@gatorsmile
Copy link
Member Author

Do we need a test for this fix?

@SparkQA
Copy link

SparkQA commented Oct 1, 2016

Test build #66203 has finished for PR 15316 at commit ca45217.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 1, 2016

Test build #66222 has finished for PR 15316 at commit ab893a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 1, 2016

Test build #66221 has finished for PR 15316 at commit 8fb4f40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -43,6 +43,11 @@ class AnalysisException protected[sql] (
}

override def getMessage: String = {
val planAnnotation = plan.map(p => s";\n$p").getOrElse("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate method here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When hitting an AnalysisException, we called getMessage in SQLQueryTestSuite.scala. That means, we also output the logical plan in the .sql.out when hitting an AnalysisException. The logical plan contains expression IDs. Thus, when we doing the text compare between the query results with the contents in .sql.out, it might fail due to the mismatch of expression IDs. Thus, I added an extra function here. Maybe I need to add a comment to say this is for testing only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense.

@SparkQA
Copy link

SparkQA commented Oct 10, 2016

Test build #66668 has finished for PR 15316 at commit 2c5de3c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Oct 11, 2016

Test build #66687 has finished for PR 15316 at commit 2c5de3c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

cc @hvanhovell @rxin Any more comment about this PR? I assume Spark 2.0.2 needs it.

Recently, when we analyzing the JIRA https://issues.apache.org/jira/browse/SPARK-17709, we are unable to see the plan due the analyzer failure. The users have to manually rebuild it with this fix. Then, we can see the failed analyzed plan.

@gatorsmile
Copy link
Member Author

Also cc @cloud-fan and @liancheng

@cloud-fan
Copy link
Contributor

LGTM, cc @hvanhovell @rxin for final sign-off

@rxin
Copy link
Contributor

rxin commented Oct 17, 2016

LGTM

@SparkQA
Copy link

SparkQA commented Oct 17, 2016

Test build #67058 has finished for PR 15316 at commit f082643.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

hvanhovell commented Oct 17, 2016

LGTM. Merging to master. Thanks!

@asfgit asfgit closed this in d88a1ba Oct 17, 2016
@hvanhovell
Copy link
Contributor

@gatorsmile I cannot merge this 2.0. Can you open a backport for this?

@gatorsmile
Copy link
Member Author

Sure, will do it soon. Thanks!

asfgit pushed a commit that referenced this pull request Oct 18, 2016
…utput the Plan if Existed in AnalysisException

### What changes were proposed in this pull request?
This PR is to backport the fix #15316 to 2.0.

Dataset always does eager analysis now. Thus, `spark.sql.eagerAnalysis` is not used any more. Thus, we need to remove it.

This PR also outputs the plan. Without the fix, the analysis error is like
```
cannot resolve '`k1`' given input columns: [k, v]; line 1 pos 12
```

After the fix, the analysis error becomes:
```
org.apache.spark.sql.AnalysisException: cannot resolve '`k1`' given input columns: [k, v]; line 1 pos 12;
'Project [unresolvedalias(CASE WHEN ('k1 = 2) THEN 22 WHEN ('k1 = 4) THEN 44 ELSE 0 END, None), v#6]
+- SubqueryAlias t
   +- Project [_1#2 AS k#5, _2#3 AS v#6]
      +- LocalRelation [_1#2, _2#3]
```

### How was this patch tested?
N/A

Author: gatorsmile <[email protected]>

Closes #15529 from gatorsmile/eagerAnalysis20.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
… if Existed in AnalysisException

### What changes were proposed in this pull request?
Dataset always does eager analysis now. Thus, `spark.sql.eagerAnalysis` is not used any more. Thus, we need to remove it.

This PR also outputs the plan. Without the fix, the analysis error is like
```
cannot resolve '`k1`' given input columns: [k, v]; line 1 pos 12
```

After the fix, the analysis error becomes:
```
org.apache.spark.sql.AnalysisException: cannot resolve '`k1`' given input columns: [k, v]; line 1 pos 12;
'Project [unresolvedalias(CASE WHEN ('k1 = 2) THEN 22 WHEN ('k1 = 4) THEN 44 ELSE 0 END, None), v#6]
+- SubqueryAlias t
   +- Project [_1#2 AS k#5, _2#3 AS v#6]
      +- LocalRelation [_1#2, _2#3]
```

### How was this patch tested?
N/A

Author: gatorsmile <[email protected]>

Closes apache#15316 from gatorsmile/eagerAnalysis.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
… if Existed in AnalysisException

### What changes were proposed in this pull request?
Dataset always does eager analysis now. Thus, `spark.sql.eagerAnalysis` is not used any more. Thus, we need to remove it.

This PR also outputs the plan. Without the fix, the analysis error is like
```
cannot resolve '`k1`' given input columns: [k, v]; line 1 pos 12
```

After the fix, the analysis error becomes:
```
org.apache.spark.sql.AnalysisException: cannot resolve '`k1`' given input columns: [k, v]; line 1 pos 12;
'Project [unresolvedalias(CASE WHEN ('k1 = 2) THEN 22 WHEN ('k1 = 4) THEN 44 ELSE 0 END, None), v#6]
+- SubqueryAlias t
   +- Project [_1#2 AS k#5, _2#3 AS v#6]
      +- LocalRelation [_1#2, _2#3]
```

### How was this patch tested?
N/A

Author: gatorsmile <[email protected]>

Closes apache#15316 from gatorsmile/eagerAnalysis.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants