Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11624][SPARK-11972][SQL]fix commands that need hive to exec #9589

Closed
wants to merge 2 commits into from

Conversation

adrian-wang
Copy link
Contributor

In SparkSQLCLI, we have created a CliSessionState, but then we call SparkSQLEnv.init(), which will start another SessionState. This would lead to exception because processCmd need to get the CliSessionState instance by calling SessionState.get(), but the return value would be a instance of SessionState. See the exception below.

spark-sql> !echo "test";
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hive.ql.session.SessionState cannot be cast to org.apache.hadoop.hive.cli.CliSessionState
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:112)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:301)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:242)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:691)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

@SparkQA
Copy link

SparkQA commented Nov 10, 2015

Test build #45508 has finished for PR 9589 at commit e1716e2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 12, 2015

Test build #45710 has finished for PR 9589 at commit 6f4e0e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
logDebug(s"Hive Config: $k=$v")
val registeredState = SessionState.get
if (registeredState != null && registeredState.isInstanceOf[CliSessionState]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment about why we need special handling here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, it would be better if we didn't have to handle this specially. It make the control flow even more confusing than it already is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of adding flags to these function, but that seems too complicate. @marmbrus do you have a better idea here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we need to have special handling for CLISessionState. It is complicating the dependencies and making the already complex control flow even harder to follow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the idea is to check if it contain user setting(i.e from .hiverc) or not. If we create new SessionState from scratch those info would be lost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not arguing against the change to avoid replacing the SessionState if it already exists. I'm asking why it has to check to see if it is a CliSessionState or not in order for this logic to work. Ideally we would not couple these components this closely. If we have to do so, then you need to explain why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we have a CliSessionState, we are using Spark SQL CLI, in this case we never need second SessionState here. Creating another SessionState would fail some cases since CliSessionState is inherited from SessionState, which could lead to ClassCastException.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case does a plain SessionState exist, where you need to create a new one?

@SparkQA
Copy link

SparkQA commented Nov 13, 2015

Test build #45834 has finished for PR 9589 at commit 6203bc7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@adrian-wang
Copy link
Contributor Author

retest this please.

@SparkQA
Copy link

SparkQA commented Nov 13, 2015

Test build #45839 has finished for PR 9589 at commit 6203bc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@adrian-wang adrian-wang changed the title [SPARK-11624][SQL]fix commands that need hive to exec [SPARK-11624][SPARK-11972][SQL]fix commands that need hive to exec Nov 25, 2015
@jameszhouyi
Copy link

Hi @adrian-wang ,
For SPARK-11972, the case passed now after applying the patch.Thanks !

@jameszhouyi
Copy link

This is critical bug. Strong hopefully it can be reviewed and merged in 1.6.0. Thanks !

@marmbrus
Copy link
Contributor

marmbrus commented Dec 2, 2015

Unfortunately RC1 has been cut already and changes to initialization are too likely to break things at this point.

@jameszhouyi
Copy link

Thanks @marmbrus for your response. This is regression bug(not found in 1.5.X) so hopefully it can be fixed in 1.6.0.

@SparkQA
Copy link

SparkQA commented Feb 5, 2016

Test build #50813 has finished for PR 9589 at commit 88eef1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@adrian-wang
Copy link
Contributor Author

@marmbrus We have instantiated and started a instance of CliSessionState, and when we init SparkSQLEnv, we will create a SessionState.

@SparkQA
Copy link

SparkQA commented Feb 16, 2016

Test build #51358 has finished for PR 9589 at commit dfe248d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@adrian-wang
Copy link
Contributor Author

@marmbrus

logDebug(s"Hive Config: $k=$v")
// originState will be created if not exists, will never be null
val originalState = SessionState.get()
if (originalState.isInstanceOf[CliSessionState]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you don't special case this? Why is this dependent on the type of the session state and not just on the fact that a session has already been started?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the SessionState.get() method would create a instance of SessionState if not exists.

@yhuai
Copy link
Contributor

yhuai commented Feb 22, 2016

test this please

@asfgit asfgit closed this in 5d80fac Feb 23, 2016
asfgit pushed a commit that referenced this pull request Feb 23, 2016
In SparkSQLCLI, we have created a `CliSessionState`, but then we call `SparkSQLEnv.init()`, which will start another `SessionState`. This would lead to exception because `processCmd` need to get the `CliSessionState` instance by calling `SessionState.get()`, but the return value would be a instance of `SessionState`. See the exception below.

spark-sql> !echo "test";
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.hive.ql.session.SessionState cannot be cast to org.apache.hadoop.hive.cli.CliSessionState
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:112)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:301)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:242)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:691)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Author: Daoyuan Wang <[email protected]>

Closes #9589 from adrian-wang/clicommand.

(cherry picked from commit 5d80fac)
Signed-off-by: Michael Armbrust <[email protected]>

Conflicts:
	sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala
@marmbrus
Copy link
Contributor

Merged to master and 1.6

@SparkQA
Copy link

SparkQA commented Feb 23, 2016

Test build #51702 has finished for PR 9589 at commit dfe248d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Apr 21, 2016
The `hive` subproject currently depends on `hive-cli` in order to perform a check to see whether a `SessionState` is an instance of `org.apache.hadoop.hive.cli.CliSessionState` (see #9589). The introduction of this `hive-cli` dependency has caused problems for users whose Hive metastore JAR classpaths don't include the `hive-cli` classes (such as in #11495).

This patch removes this dependency on `hive-cli` and replaces the `isInstanceOf` check by reflection. I added a Maven Enforcer rule to ban `hive-cli` from the `hive` subproject in order to make sure that this dependency is not accidentally reintroduced.

/cc rxin yhuai adrian-wang preecet

Author: Josh Rosen <[email protected]>

Closes #12551 from JoshRosen/remove-hive-cli-dep-from-hive-subproject.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants