[SPARK-3791][SQL] Provides Spark version and Hive version in HiveThriftServer2 #2843
Conversation
```scala
val sparkConf = new SparkConf()
  .setAppName(s"SparkSQL::${java.net.InetAddress.getLocalHost.getHostName}")
  .set("spark.sql.hive.version", "0.12.0-protobuf-2.5")
```
This needs to be generalized.
QA tests have started for PR 2843 at commit
QA tests have finished for PR 2843 at commit
Test FAILed.
```diff
@@ -37,35 +43,81 @@ import org.apache.spark.sql.catalyst.util.getTempFilePath

/**
 * Tests for the HiveThriftServer2 using JDBC.
 *
 * NOTE: SPARK_PREPEND_CLASSES is explicitly disabled in this test suite. Assembly jar must be
 * rebuilt after changing HiveThriftServer2 related code.
 */
```
This requirement should be OK for Jenkins, since Jenkins always builds the assembly jar before executing any test suites.
QA tests have started for PR 2843 at commit
QA tests have started for PR 2843 at commit
QA tests have finished for PR 2843 at commit
QA tests have started for PR 2843 at commit
Hm, 3 consecutive random build failures, embarrassing... For the first one, unit tests were not started at all; it seems the build process was somehow interrupted. The second failure is a bit weird: although we're already using a random port to avoid port conflicts, it still failed to open the listening port. I checked the TCP port range on the Jenkins master node, which should be valid, but I don't have access to the Jenkins slave node that executed this build. The cause of the third failure is a known bug fixed in the master branch; just rebased onto the most recent master.
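The "random port" technique mentioned above can be sketched as follows. This is a hedged illustration, not the test suite's actual code: binding to port 0 lets the OS pick a free ephemeral port, though a race remains if another process grabs the port between `close()` and the server's own bind.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePort {
    // Ask the OS for a free ephemeral port by binding to port 0.
    static int freePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = freePort();
        // A valid TCP port was assigned.
        System.out.println(port > 0 && port <= 65535);
    }
}
```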
Tests timed out for PR 2843 at commit
QA tests have finished for PR 2843 at commit
Test PASSed.
```scala
}

sql(s"SET ${testKey + testKey}=${testVal + testVal}")
assert(hiveconf.get(testKey + testKey, "") == testVal + testVal)
assertResult(Set(testKey -> testVal, (testKey + testKey) -> (testVal + testVal))) {
```
These lines are removed because they were originally for testing the deprecated `hql` call. At that time, `sql` and `hql` had different code paths. Later on, those `hql` calls were changed to `sql` to avoid compile time deprecation warnings, which made these tests exact duplicates.
Test build #22610 has started for PR 2843 at commit
Test build #22610 has finished for PR 2843 at commit
Test PASSed.
retest this please
@marmbrus This should be ready to go once Jenkins says OK. Simba ODBC driver needs this change for the
Test build #22661 has started for PR 2843 at commit
Test build #22661 has finished for PR 2843 at commit
Test FAILed.
Fixed the failed tests and rebased onto the most recent master (with full Hive 0.13.1 support).
Test build #22691 has started for PR 2843 at commit
Can you please rebase?
Done rebasing.
Test build #22728 has started for PR 2843 at commit
Test build #22728 has finished for PR 2843 at commit
Test FAILed.
retest this please
Test build #22748 has started for PR 2843 at commit
Test build #22748 has finished for PR 2843 at commit
Test FAILed.
retest this please
Test build #22759 has started for PR 2843 at commit
The previous test failures were caused by the flaky
Test build #22759 has finished for PR 2843 at commit
Test PASSed.
Thanks! Merged to master.
This PR backports apache#2843 to branch-1.1. The key difference is that this one doesn't support Hive 0.13.1 and thus always returns `0.12.0` when `spark.sql.hive.version` is queried.

6 other commits on which apache#2843 depends were also backported. They are:

- apache#2887, for `SessionState` lifecycle control
- apache#2675, apache#2823 & apache#3060, for major test suite refactoring and bug fixes
- apache#2164, for Parquet test suite updates
- apache#2493, for reading `spark.sql.*` configurations

Author: Cheng Lian <[email protected]>
Author: Cheng Lian <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes apache#3113 from liancheng/get-info-for-1.1 and squashes the following commits:

d354161 [Cheng Lian] Provides Spark and Hive version in HiveThriftServer2 for branch-1.1
0c2a244 [Michael Armbrust] [SPARK-3646][SQL] Copy SQL configuration from SparkConf when a SQLContext is created.
3202a36 [Michael Armbrust] [SQL] Decrease partitions when testing
7f395b7 [Cheng Lian] [SQL] Fixes race condition in CliSuite
0dd28ec [Cheng Lian] [SQL] Fixes the race condition that may cause test failure
5928b39 [Cheng Lian] [SPARK-3809][SQL] Fixes test suites in hive-thriftserver
faeca62 [Cheng Lian] [SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2
…perty

## What changes were proposed in this pull request?

At the beginning, #2843 added `spark.sql.hive.version` to reveal the underlying Hive version for JDBC connections. For some time afterwards, it was used as a version identifier for the execution Hive client. Actually, there is no Hive client for executions in Spark now, and there are no usages of `HIVE_EXECUTION_VERSION` found in the whole Spark project. `HIVE_EXECUTION_VERSION` is set by `spark.sql.hive.version`, which is still set internally in some places or by users; this may confuse developers and users with `HIVE_METASTORE_VERSION` (`spark.sql.hive.metastore.version`). It might be better to remove it.

## How was this patch tested?

Modified some existing unit tests.

cc cloud-fan gatorsmile

Author: Kent Yao <[email protected]>

Closes #19712 from yaooqinn/SPARK-22487.
This PR overrides the `GetInfo` Hive Thrift API to provide correct Spark version information. Another property `spark.sql.hive.version` is added to reveal the underlying Hive version. These are generally useful for Spark SQL ODBC driver providers. Also took the chance to remove the `SET -v` hack, which was a workaround for Simba ODBC driver connectivity.

TODO

Find a general way to figure out the Hive (or even any dependency) version.
This blog post suggests several methods to inspect application version. In the case of Spark, this can be tricky because the chosen method:

- must apply to both the Maven build and the SBT build. For Maven builds, we can retrieve the version information from the META-INF/maven directory within the assembly jar, but this doesn't work for SBT builds.
- must not rely on the original jars of dependencies to extract a specific dependency version, because Spark uses an assembly jar. This implies we can't read the Hive version from the Hive jar files, since the standard Spark distribution doesn't include them.
- should play well with SPARK_PREPEND_CLASSES to ease local testing during development. SPARK_PREPEND_CLASSES prevents classes from being loaded from the assembly jar, so we can't locate the jar file and read its manifest.

Given these constraints, maybe the only reliable method is to generate a source file containing version information at build time. @pwendell Do you have any suggestions from the perspective of the build process?
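The Maven-metadata option mentioned above can be sketched as follows. This is a hedged illustration (class and method names are my own, not from the PR): Maven-built jars embed `META-INF/maven/<groupId>/<artifactId>/pom.properties` recording the artifact version, and the lookup simply fails when that metadata is absent, which is exactly the SBT-build limitation noted above.

```java
import java.io.InputStream;
import java.util.Properties;

public class MavenVersion {
    // Look up a dependency's version from Maven's embedded pom.properties.
    // Returns null when the metadata is not on the classpath (e.g. SBT-built jars).
    static String mavenVersion(String groupId, String artifactId) throws Exception {
        String path = "META-INF/maven/" + groupId + "/" + artifactId + "/pom.properties";
        try (InputStream in =
                 MavenVersion.class.getClassLoader().getResourceAsStream(path)) {
            if (in == null) return null; // no Maven metadata available
            Properties props = new Properties();
            props.load(in);
            return props.getProperty("version");
        }
    }

    public static void main(String[] args) throws Exception {
        // A plain run has no such metadata on the classpath, so this prints null.
        System.out.println(mavenVersion("org.apache.hive", "hive-exec"));
    }
}
```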
Update: Hive version is now retrieved from the newly introduced `HiveShim` object.
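A minimal sketch of the shim idea referenced above, with hypothetical names (the real `HiveShim` is a Scala object compiled per Hive build profile): each profile carries its own version constant, so runtime code can report the Hive version without inspecting any jar metadata.

```java
public class HiveShimSketch {
    // Assumption: in a real build, a per-profile source file defines this constant
    // (e.g. "0.12.0" for the Hive 0.12 profile, "0.13.1" for the 0.13 profile).
    static final String HIVE_VERSION = "0.12.0";

    public static void main(String[] args) {
        // The Thrift server can expose this directly as spark.sql.hive.version.
        System.out.println("spark.sql.hive.version = " + HIVE_VERSION);
    }
}
```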