[SPARK-11624][SPARK-11972][SQL]fix commands that need hive to exec #9589

adrian-wang · 2015-11-10T07:51:46Z

In SparkSQLCLI, we have already created a CliSessionState, but we then call SparkSQLEnv.init(), which starts another SessionState. This leads to an exception, because processCmd needs to get the CliSessionState instance by calling SessionState.get(), but the returned value is a plain SessionState instance. See the exception below.

spark-sql> !echo "test";
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hive.ql.session.SessionState cannot be cast to org.apache.hadoop.hive.cli.CliSessionState
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:112)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:301)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:242)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:691)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
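To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern. It does not use Hive at all: `Session`, `CliSession`, `Registry`, and `ReplacedSessionDemo` are hypothetical stand-ins for Hive's `SessionState`, `CliSessionState`, the thread-local lookup behind `SessionState.get()`, and the CLI driver. Starting a second, plain session replaces the CLI session in the thread-local slot, so a later downcast fails in the same way as the stack trace above.

```scala
// Hypothetical stand-ins for Hive's SessionState and CliSessionState (not the real classes).
class Session(val name: String)

class CliSession(n: String) extends Session(n) {
  // Stands in for CLI-only behavior that the CLI driver relies on.
  def printOut(msg: String): String = s"[$name] $msg"
}

// Models the thread-local "current session" slot behind SessionState.get()/start().
object Registry {
  private val current = new ThreadLocal[Session]
  def start(s: Session): Unit = current.set(s)
  def get(): Session = current.get()
}

object ReplacedSessionDemo {
  // Like CliDriver.processCmd, assumes the current session is the CLI subclass.
  def processCmd(cmd: String): String =
    Registry.get().asInstanceOf[CliSession].printOut(cmd)

  // Returns Left(exception name) on failure, Right(output) on success.
  def run(): Either[String, String] = {
    Registry.start(new CliSession("cli"))   // the CLI driver starts its own session first
    Registry.start(new Session("env-init")) // a later init starts a plain session, replacing it
    try Right(processCmd("!echo test"))
    catch { case e: ClassCastException => Left(e.getClass.getSimpleName) }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Running `ReplacedSessionDemo.main` reports the `ClassCastException`; if the second `Registry.start` call is removed, `processCmd` succeeds because the CLI session is still the current one.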

SparkQA · 2015-11-10T09:58:08Z

Test build #45508 has finished for PR 9589 at commit e1716e2.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-11-12T08:14:48Z

Test build #45710 has finished for PR 9589 at commit 6f4e0e8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

marmbrus · 2015-11-12T20:26:51Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala

-        } else {
-          logDebug(s"Hive Config: $k=$v")
+      val registeredState = SessionState.get
+      if (registeredState != null && registeredState.isInstanceOf[CliSessionState]) {


marmbrus · 2015-11-12

Please add a comment about why we need special handling here.

marmbrus · 2015-11-12

Note: it would be better if we didn't have to handle this specially. It makes the control flow even more confusing than it already is.

adrian-wang · 2015-11-13

I thought of adding flags to these functions, but that seems too complicated. @marmbrus do you have a better idea here?

marmbrus · 2015-12-02

I don't understand why we need special handling for CliSessionState. It complicates the dependencies and makes the already complex control flow even harder to follow.

zhichao-li · 2015-12-24

I guess the idea is to check whether it contains user settings (e.g. from .hiverc). If we create a new SessionState from scratch, that information would be lost.

marmbrus · 2016-01-04

I'm not arguing against the change to avoid replacing the SessionState if it already exists. I'm asking why it has to check whether it is a CliSessionState in order for this logic to work. Ideally we would not couple these components this closely. If we have to do so, then you need to explain why.

adrian-wang · 2016-01-05

When we have a CliSessionState, we are running the Spark SQL CLI, and in that case we never need a second SessionState here. Creating another SessionState breaks some cases: CliSessionState inherits from SessionState, and code such as CliDriver.processCmd casts SessionState.get() back to CliSessionState, which throws a ClassCastException once a plain SessionState has replaced it.

marmbrus · 2016-01-05

In what case does a plain SessionState exist, where you need to create a new one?
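The guard debated in this thread can be sketched as follows. It is a simplified model, not the actual ClientWrapper code: `BaseState`, `CliState`, `CurrentState`, and `ReuseCliState.obtainState` are hypothetical stand-ins for `SessionState`, `CliSessionState`, the thread-local lookup behind `SessionState.get()`, and the state-creation logic. If the session already registered on the thread is the CLI subclass, it is reused, so settings the CLI already loaded (for example from .hiverc) are kept; otherwise a fresh session is created and started.

```scala
// Hypothetical stand-ins for Hive's SessionState and CliSessionState (not the real classes).
class BaseState(val conf: Map[String, String])
class CliState(c: Map[String, String]) extends BaseState(c)

// Models the thread-local slot behind SessionState.get()/start().
object CurrentState {
  private val slot = new ThreadLocal[BaseState]
  def get(): BaseState = slot.get()
  def start(s: BaseState): Unit = slot.set(s)
}

object ReuseCliState {
  // Reuse an existing CLI session; otherwise create and start a plain one.
  def obtainState(defaults: Map[String, String]): BaseState =
    CurrentState.get() match {
      case cli: CliState => cli // keeps settings the CLI already loaded (e.g. from .hiverc)
      case _ =>                 // no session yet (null) or a plain one: start a fresh session
        val fresh = new BaseState(defaults)
        CurrentState.start(fresh)
        fresh
    }

  def main(args: Array[String]): Unit = {
    CurrentState.start(new CliState(Map("user.setting" -> "from-hiverc")))
    println(obtainState(Map.empty).conf) // the CLI session's settings survive
  }
}
```

Matching on the subclass is exactly what couples this code to the CLI class, which is the concern raised in the review; a null or plain session simply falls through to creating a new one.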

SparkQA · 2015-11-13T06:33:00Z

Test build #45834 has finished for PR 9589 at commit 6203bc7.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

adrian-wang · 2015-11-13T06:38:51Z

retest this please.

SparkQA · 2015-11-13T09:38:48Z

Test build #45839 has finished for PR 9589 at commit 6203bc7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jameszhouyi · 2015-11-25T06:20:34Z

Hi @adrian-wang ,
For SPARK-11972, the case now passes after applying the patch. Thanks!

jameszhouyi · 2015-12-02T08:09:53Z

This is a critical bug. I strongly hope it can be reviewed and merged in 1.6.0. Thanks!

marmbrus · 2015-12-02T18:17:08Z

Unfortunately RC1 has been cut already and changes to initialization are too likely to break things at this point.

jameszhouyi · 2015-12-04T02:37:45Z

Thanks @marmbrus for your response. This is a regression (not present in 1.5.x), so hopefully it can be fixed in 1.6.0.

SparkQA · 2016-02-05T10:37:30Z

Test build #50813 has finished for PR 9589 at commit 88eef1f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

adrian-wang · 2016-02-14T03:13:21Z

@marmbrus We have already instantiated and started an instance of CliSessionState, and when we init SparkSQLEnv, we create another SessionState.

SparkQA · 2016-02-16T13:49:46Z

Test build #51358 has finished for PR 9589 at commit dfe248d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

adrian-wang · 2016-02-18T03:12:55Z

@marmbrus

marmbrus · 2016-02-19T21:04:39Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala

-          logDebug(s"Hive Config: $k=$v")
+      // originState will be created if not exists, will never be null
+      val originalState = SessionState.get()
+      if (originalState.isInstanceOf[CliSessionState]) {


marmbrus · 2016-02-19

What happens if you don't special-case this? Why does this depend on the type of the session state and not just on the fact that a session has already been started?

adrian-wang · 2016-02-20

The SessionState.get() method would create an instance of SessionState if one does not exist.

yhuai · 2016-02-22T23:53:49Z

test this please

Author: Daoyuan Wang
Closes #9589 from adrian-wang/clicommand.
(cherry picked from commit 5d80fac)
Signed-off-by: Michael Armbrust
Conflicts: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala

marmbrus · 2016-02-23T02:27:03Z

Merged to master and 1.6

SparkQA · 2016-02-23T02:38:48Z

Test build #51702 has finished for PR 9589 at commit dfe248d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

The `hive` subproject currently depends on `hive-cli` in order to perform a check to see whether a `SessionState` is an instance of `org.apache.hadoop.hive.cli.CliSessionState` (see #9589). The introduction of this `hive-cli` dependency has caused problems for users whose Hive metastore JAR classpaths don't include the `hive-cli` classes (such as in #11495). This patch removes this dependency on `hive-cli` and replaces the `isInstanceOf` check by reflection. I added a Maven Enforcer rule to ban `hive-cli` from the `hive` subproject in order to make sure that this dependency is not accidentally reintroduced. /cc rxin yhuai adrian-wang preecet

Author: Josh Rosen
Closes #12551 from JoshRosen/remove-hive-cli-dep-from-hive-subproject.
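Checking the type without linking against `hive-cli` can be done by comparing class names at runtime. The sketch below assumes that approach and walks the superclass chain so that subclasses also match; it is not necessarily the exact code in #12551, and `ClassNameCheck`, `isInstanceOfByName`, `DemoBase`, and `DemoDerived` are hypothetical names. The Hive class appears only as a string, so nothing from `hive-cli` needs to be on the compile or runtime classpath.

```scala
object ClassNameCheck {
  // Fully qualified name of the Hive class, used only as a string (no compile-time dependency).
  val CliSessionStateName = "org.apache.hadoop.hive.cli.CliSessionState"

  // True if obj's runtime class, or any superclass of it, has the given name.
  // Interfaces are not checked; that is enough for a class such as CliSessionState.
  def isInstanceOfByName(obj: AnyRef, className: String): Boolean =
    obj != null &&
      Iterator
        .iterate[Class[_]](obj.getClass)(c => c.getSuperclass)
        .takeWhile(_ != null) // getSuperclass returns null above java.lang.Object
        .exists(_.getName == className)
}

// Hypothetical classes for the usage example.
class DemoBase
class DemoDerived extends DemoBase
```

In the Spark setting the call would look like `isInstanceOfByName(SessionState.get(), ClassNameCheck.CliSessionStateName)`. The trade-off versus `isInstanceOf` is that a typo in the name string fails silently (the check returns false) instead of failing at compile time.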

marmbrus reviewed Nov 12, 2015

adrian-wang changed the title ~~[SPARK-11624][SQL]fix commands that need hive to exec~~ [SPARK-11624][SPARK-11972][SQL]fix commands that need hive to exec Nov 25, 2015

adrian-wang force-pushed the clicommand branch from 6203bc7 to 88eef1f on February 5, 2016 08:04

fix commands that need hive to exec

e8f1846

adrian-wang force-pushed the clicommand branch from 88eef1f to e8f1846 on February 16, 2016 10:53

refine tests

dfe248d

marmbrus reviewed Feb 19, 2016

asfgit closed this in 5d80fac Feb 23, 2016

JoshRosen mentioned this pull request Apr 21, 2016

[SPARK-14786] Remove hive-cli dependency from hive subproject #12551

Closed

yhuai mentioned this pull request Apr 27, 2016

[SPARK-14783] [SPARK-14786] [BRANCH-1.6] Preserve full exception stacktrace in IsolatedClientLoader and Remove hive-cli dependency from hive subproject #12724

Closed


