[SPARK-35102][SQL] Make spark.sql.hive.version read-only, not deprecated and meaningful

### What changes were proposed in this pull request?

Firstly let's take a look at the definition and comment.

```
// A fake config which is only here for backward compatibility reasons. This config has no effect
// to Spark, just for reporting the builtin Hive version of Spark to existing applications that
// already rely on this config.
val FAKE_HIVE_VERSION = buildConf("spark.sql.hive.version")
  .doc(s"deprecated, please use ${HIVE_METASTORE_VERSION.key} to get the Hive version in Spark.")
  .version("1.1.1")
  .fallbackConf(HIVE_METASTORE_VERSION)
```
It is used for reporting the built-in Hive version, but the current behavior is unsatisfactory because the value can be changed in many ways, e.g. via `--conf` or the `SET` syntax.

It was marked as deprecated long ago but has been kept until now. I guess it is hard for us to remove it, and doing so is not even necessary.

On second thought, it is actually good for us to keep it and have it work alongside `spark.sql.hive.metastore.version`. When `spark.sql.hive.metastore.version` is changed, this config can still statically report the compiled Hive version, which is useful when an error occurs in that case. So this parameter should be fixed to the compiled Hive version.
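
The read-only mechanism can be modeled with a minimal sketch (a simplified stand-in for Spark's `ConfigBuilder`/`checkValue` machinery with a hypothetical version string, not the real API):

```scala
// Sketch: a config entry whose validation check is pinned to the compiled-in
// value, so any attempt to set it to something else is rejected.
object ReadOnlyConfSketch {
  // Hypothetical stand-in for HiveVersionInfo.getVersion
  val builtinHiveVersion: String = "2.3.8"

  final case class ConfigEntry(key: String, default: String, check: String => Boolean) {
    def set(value: String): String = {
      // require throws IllegalArgumentException when the check fails
      require(check(value),
        "The builtin Hive version is read-only, please use spark.sql.hive.metastore.version")
      value
    }
  }

  val BUILTIN_HIVE_VERSION: ConfigEntry =
    ConfigEntry("spark.sql.hive.version", builtinHiveVersion, _ == builtinHiveVersion)

  def trySet(value: String): Either[String, String] =
    try Right(BUILTIN_HIVE_VERSION.set(value))
    catch { case e: IllegalArgumentException => Left(e.getMessage) }
}
```

Setting the key to its own default succeeds as a no-op; any other value fails validation, which is what makes the conf effectively read-only.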

### Why are the changes needed?

`spark.sql.hive.version` is useful in certain cases and should be read-only.

### Does this PR introduce _any_ user-facing change?

Yes. `spark.sql.hive.version` is now read-only.
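
An illustrative sketch (assumed version strings and simplified conf handling, not a real Spark session) of how the two keys behave differently after this change:

```scala
// Sketch: spark.sql.hive.version always reports the compiled-in Hive version,
// while spark.sql.hive.metastore.version selects the metastore client version
// and may legitimately differ from it.
object HiveVersionConfsSketch {
  val builtinHiveVersion: String = "2.3.8" // hypothetical compiled-in version
  private var metastoreVersion: String = builtinHiveVersion

  def set(key: String, value: String): Unit = key match {
    case "spark.sql.hive.metastore.version" =>
      metastoreVersion = value // freely configurable
    case "spark.sql.hive.version" =>
      // read-only: only the compiled-in value is accepted
      require(value == builtinHiveVersion,
        "The builtin Hive version is read-only, please use spark.sql.hive.metastore.version")
    case other =>
      sys.error(s"unknown key: $other")
  }

  def get(key: String): String = key match {
    case "spark.sql.hive.version"           => builtinHiveVersion
    case "spark.sql.hive.metastore.version" => metastoreVersion
  }
}
```

Changing the metastore version leaves the reported built-in version untouched, so error reports can always include the compiled Hive version.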

### How was this patch tested?

New test cases.

Closes #32200 from yaooqinn/SPARK-35102.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
yaooqinn authored and cloud-fan committed Apr 19, 2021
1 parent 425dc58 commit 2d161cb
Showing 5 changed files with 33 additions and 14 deletions.
```diff
@@ -49,7 +49,10 @@ private[hive] object SparkSQLEnv extends Logging {
       .set(SQLConf.DATETIME_JAVA8API_ENABLED, true)


-    val sparkSession = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()
+    val sparkSession = SparkSession.builder()
+      .config(sparkConf)
+      .config(HiveUtils.BUILTIN_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
+      .enableHiveSupport().getOrCreate()
     sparkContext = sparkSession.sparkContext
     sqlContext = sparkSession.sqlContext

@@ -63,7 +66,6 @@ private[hive] object SparkSQLEnv extends Logging {
       metadataHive.setOut(new PrintStream(System.out, true, UTF_8.name()))
       metadataHive.setInfo(new PrintStream(System.err, true, UTF_8.name()))
       metadataHive.setError(new PrintStream(System.err, true, UTF_8.name()))
-      sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
    }
  }
```
```diff
@@ -27,7 +27,6 @@ import org.apache.hive.service.server.HiveServer2

 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SQLContext
-import org.apache.spark.sql.hive.HiveUtils
 import org.apache.spark.sql.hive.thriftserver.ReflectionUtils._
 import org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager
 import org.apache.spark.sql.internal.SQLConf
@@ -64,7 +63,6 @@ private[hive] class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext:
     } else {
       sqlContext.newSession()
     }
-    ctx.setConf(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
     ctx.setConf(SQLConf.DATETIME_JAVA8API_ENABLED, true)
     val hiveSessionState = session.getSessionState
     setConfMap(ctx, hiveSessionState.getOverriddenConfigurations)
```
```diff
@@ -31,6 +31,7 @@ import org.scalatest.BeforeAndAfterAll

 import org.apache.spark.SparkFunSuite
 import org.apache.spark.internal.Logging
+import org.apache.spark.sql.hive.HiveUtils._
 import org.apache.spark.sql.hive.test.HiveTestJars
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
@@ -601,4 +602,13 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
       "SELECT 'SPARK-35086' AS c1, '--verbose' AS c2"
     )
   }
+
+  test("SPARK-35102: Make spark.sql.hive.version meaningful and not deprecated") {
+    runCliWithin(1.minute,
+      Seq("--conf", "spark.sql.hive.version=0.1"),
+      Seq(s"please use ${HIVE_METASTORE_VERSION.key}"))("" -> "")
+    runCliWithin(2.minute,
+      Seq("--conf", s"${BUILTIN_HIVE_VERSION.key}=$builtinHiveVersion"))(
+      s"set ${BUILTIN_HIVE_VERSION.key};" -> builtinHiveVersion, "SET -v;" -> builtinHiveVersion)
+  }
 }
```
```diff
@@ -547,7 +547,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftServer2Test {
        conf += resultSet.getString(1) -> resultSet.getString(2)
      }

-      assert(conf.get(HiveUtils.FAKE_HIVE_VERSION.key) === Some(HiveUtils.builtinHiveVersion))
+      assert(conf.get(HiveUtils.BUILTIN_HIVE_VERSION.key) === Some(HiveUtils.builtinHiveVersion))
    }
  }

@@ -560,7 +560,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftServer2Test {
        conf += resultSet.getString(1) -> resultSet.getString(2)
      }

-      assert(conf.get(HiveUtils.FAKE_HIVE_VERSION.key) === Some(HiveUtils.builtinHiveVersion))
+      assert(conf.get(HiveUtils.BUILTIN_HIVE_VERSION.key) === Some(HiveUtils.builtinHiveVersion))
    }
  }
```
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala (17 additions, 8 deletions)

```diff
@@ -24,6 +24,7 @@ import java.util.concurrent.TimeUnit

 import scala.collection.JavaConverters._
 import scala.collection.mutable.HashMap
+import scala.util.Try

 import org.apache.commons.lang3.{JavaVersion, SystemUtils}
 import org.apache.hadoop.conf.Configuration
@@ -53,22 +54,30 @@ private[spark] object HiveUtils extends Logging {
   /** The version of hive used internally by Spark SQL. */
   val builtinHiveVersion: String = HiveVersionInfo.getVersion

+  val BUILTIN_HIVE_VERSION = buildStaticConf("spark.sql.hive.version")
+    .doc("The compiled, a.k.a, builtin Hive version of the Spark distribution bundled with." +
+      " Note that, this a read-only conf and only used to report the built-in hive version." +
+      " If you want a different metastore client for Spark to call, please refer to" +
+      " spark.sql.hive.metastore.version.")
+    .version("1.1.1")
+    .stringConf
+    .checkValue(_ == builtinHiveVersion,
+      "The builtin Hive version is read-only, please use spark.sql.hive.metastore.version")
+    .createWithDefault(builtinHiveVersion)
+
+  private def isCompatibleHiveVersion(hiveVersionStr: String): Boolean = {
+    Try { IsolatedClientLoader.hiveVersion(hiveVersionStr) }.isSuccess
+  }
+
   val HIVE_METASTORE_VERSION = buildStaticConf("spark.sql.hive.metastore.version")
     .doc("Version of the Hive metastore. Available options are " +
       "<code>0.12.0</code> through <code>2.3.8</code> and " +
       "<code>3.0.0</code> through <code>3.1.2</code>.")
     .version("1.4.0")
     .stringConf
+    .checkValue(isCompatibleHiveVersion, "Unsupported Hive Metastore version")
     .createWithDefault(builtinHiveVersion)

-  // A fake config which is only here for backward compatibility reasons. This config has no effect
-  // to Spark, just for reporting the builtin Hive version of Spark to existing applications that
-  // already rely on this config.
-  val FAKE_HIVE_VERSION = buildConf("spark.sql.hive.version")
-    .doc(s"deprecated, please use ${HIVE_METASTORE_VERSION.key} to get the Hive version in Spark.")
-    .version("1.1.1")
-    .fallbackConf(HIVE_METASTORE_VERSION)
-
   val HIVE_METASTORE_JARS = buildStaticConf("spark.sql.hive.metastore.jars")
     .doc(s"""
       | Location of the jars that should be used to instantiate the HiveMetastoreClient.
```
