[SPARK-35102][SQL] Make spark.sql.hive.version read-only, not deprecated and meaningful #32200

Closed · wants to merge 4 commits
SparkSQLEnv.scala (sql/hive-thriftserver):

@@ -49,7 +49,10 @@ private[hive] object SparkSQLEnv extends Logging {
       .set(SQLConf.DATETIME_JAVA8API_ENABLED, true)

-    val sparkSession = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()
+    val sparkSession = SparkSession.builder()
+      .config(sparkConf)
+      .config(HiveUtils.BUILTIN_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
Contributor commented:
Do we need to set it here? It's read-only now.

Member Author replied:
This looks like another problem on the read side that we need to look into. This PR just changes the config to read-only, but we still need to explicitly set the default value as before (https://github.com/apache/spark/pull/32200/files#diff-d6fa3cebb959651e721093f193f2c3b7197d7d0d1c503e34860aa048164b6b0eL66 and https://github.com/apache/spark/pull/32200/files#diff-4ade9276f3d22d5c47a42e1cd96dd6125e8ecc7d38e3f5f6841b8109936463b4L67). Otherwise, the existing and newly added tests would fail to get it.

+      .enableHiveSupport().getOrCreate()
     sparkContext = sparkSession.sparkContext
     sqlContext = sparkSession.sqlContext

@@ -63,7 +66,6 @@ private[hive] object SparkSQLEnv extends Logging {
       metadataHive.setOut(new PrintStream(System.out, true, UTF_8.name()))
       metadataHive.setInfo(new PrintStream(System.err, true, UTF_8.name()))
       metadataHive.setError(new PrintStream(System.err, true, UTF_8.name()))
-      sparkSession.conf.set(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
     }
   }

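To make the exchange above concrete, here is a toy sketch (not Spark internals) of the read-side gap the author describes: a lookup that only consults explicitly-set entries misses a static conf's default, so the value has to be set eagerly, which is what the added .config(...) call does. The Map-based stand-in and the 2.3.8 version string are illustrative assumptions.

```scala
import scala.collection.mutable

// Toy stand-in for a session conf map that holds only explicitly-set values.
val explicitlySet = mutable.Map[String, String]()

// A read path with no fallback to config-entry defaults (the "read-side" gap).
def lookup(key: String): Option[String] = explicitlySet.get(key)

lookup("spark.sql.hive.version")                  // None: tests would "fail to get it"
explicitlySet("spark.sql.hive.version") = "2.3.8" // what the eager .config(...) call does
lookup("spark.sql.hive.version")                  // Some("2.3.8")
```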
SparkSQLSessionManager.scala (sql/hive-thriftserver):

@@ -27,7 +27,6 @@ import org.apache.hive.service.server.HiveServer2

 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SQLContext
-import org.apache.spark.sql.hive.HiveUtils
 import org.apache.spark.sql.hive.thriftserver.ReflectionUtils._
 import org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager
 import org.apache.spark.sql.internal.SQLConf
@@ -64,7 +63,6 @@ private[hive] class SparkSQLSessionManager(hiveServer: HiveServer2, sqlContext:
     } else {
       sqlContext.newSession()
     }
-    ctx.setConf(HiveUtils.FAKE_HIVE_VERSION.key, HiveUtils.builtinHiveVersion)
     ctx.setConf(SQLConf.DATETIME_JAVA8API_ENABLED, true)
     val hiveSessionState = session.getSessionState
     setConfMap(ctx, hiveSessionState.getOverriddenConfigurations)
CliSuite.scala (sql/hive-thriftserver):

@@ -31,6 +31,7 @@ import org.scalatest.BeforeAndAfterAll

 import org.apache.spark.SparkFunSuite
 import org.apache.spark.internal.Logging
+import org.apache.spark.sql.hive.HiveUtils._
 import org.apache.spark.sql.hive.test.HiveTestJars
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
@@ -601,4 +602,13 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
"SELECT 'SPARK-35086' AS c1, '--verbose' AS c2"
)
}

test("SPARK-35102: Make spark.sql.hive.version meaningful and not deprecated") {
runCliWithin(1.minute,
Seq("--conf", "spark.sql.hive.version=0.1"),
Seq(s"please use ${HIVE_METASTORE_VERSION.key}"))("" -> "")
runCliWithin(2.minute,
Seq("--conf", s"${BUILTIN_HIVE_VERSION.key}=$builtinHiveVersion"))(
s"set ${BUILTIN_HIVE_VERSION.key};" -> builtinHiveVersion, "SET -v;" -> builtinHiveVersion)
}
}
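For readers unfamiliar with the runCliWithin helper, the second run above boils down to the following interaction, sketched here in spark-shell form; the Hive-enabled session and the 2.3.8 value are assumptions for illustration, not part of the PR.

```scala
// Hypothetical spark-shell session against a Hive-enabled build:
spark.sql("SET spark.sql.hive.version").show(truncate = false)
// +----------------------+-----+
// |key                   |value|
// +----------------------+-----+
// |spark.sql.hive.version|2.3.8|
// +----------------------+-----+
// The reported value can only ever be the compiled-in Hive version.
```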
HiveThriftBinaryServerSuite.scala (sql/hive-thriftserver):

@@ -547,7 +547,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftServer2Test {
         conf += resultSet.getString(1) -> resultSet.getString(2)
       }

-      assert(conf.get(HiveUtils.FAKE_HIVE_VERSION.key) === Some(HiveUtils.builtinHiveVersion))
+      assert(conf.get(HiveUtils.BUILTIN_HIVE_VERSION.key) === Some(HiveUtils.builtinHiveVersion))
     }
   }

@@ -560,7 +560,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftServer2Test {
         conf += resultSet.getString(1) -> resultSet.getString(2)
       }

-      assert(conf.get(HiveUtils.FAKE_HIVE_VERSION.key) === Some(HiveUtils.builtinHiveVersion))
+      assert(conf.get(HiveUtils.BUILTIN_HIVE_VERSION.key) === Some(HiveUtils.builtinHiveVersion))
     }
   }

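For context, the two assertions above sit inside a loop over a `SET -v` result set. A self-contained sketch of that pattern over plain JDBC might look like this; the connection URL, user, and expected version are illustrative assumptions, and a Hive JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager
import scala.collection.mutable

// Connect to a running Spark Thrift server (URL and user are illustrative).
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
try {
  val rs = conn.createStatement().executeQuery("SET -v")
  val conf = mutable.Map.empty[String, String]
  while (rs.next()) {
    conf += rs.getString(1) -> rs.getString(2) // key -> value, as in the suite above
  }
  // The reported version can only ever be the compiled-in one.
  assert(conf.get("spark.sql.hive.version").contains("2.3.8")) // 2.3.8 is illustrative
} finally {
  conn.close()
}
```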
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala (17 additions & 8 deletions):

@@ -24,6 +24,7 @@ import java.util.concurrent.TimeUnit

 import scala.collection.JavaConverters._
 import scala.collection.mutable.HashMap
+import scala.util.Try

 import org.apache.commons.lang3.{JavaVersion, SystemUtils}
 import org.apache.hadoop.conf.Configuration
@@ -53,22 +54,30 @@ private[spark] object HiveUtils extends Logging {
   /** The version of hive used internally by Spark SQL. */
   val builtinHiveVersion: String = HiveVersionInfo.getVersion

+  val BUILTIN_HIVE_VERSION = buildStaticConf("spark.sql.hive.version")
+    .doc("The compiled, a.k.a. builtin, Hive version bundled with the Spark distribution." +
+      " Note that this is a read-only conf, used only to report the built-in Hive version." +
+      " If you want Spark to call a different metastore client, please refer to" +
+      " spark.sql.hive.metastore.version.")
+    .version("1.1.1")
+    .stringConf
+    .checkValue(_ == builtinHiveVersion,
+      "The builtin Hive version is read-only, please use spark.sql.hive.metastore.version")
+    .createWithDefault(builtinHiveVersion)

+  private def isCompatibleHiveVersion(hiveVersionStr: String): Boolean = {
+    Try { IsolatedClientLoader.hiveVersion(hiveVersionStr) }.isSuccess
+  }

   val HIVE_METASTORE_VERSION = buildStaticConf("spark.sql.hive.metastore.version")
     .doc("Version of the Hive metastore. Available options are " +
       "<code>0.12.0</code> through <code>2.3.8</code> and " +
       "<code>3.0.0</code> through <code>3.1.2</code>.")
     .version("1.4.0")
     .stringConf
+    .checkValue(isCompatibleHiveVersion, "Unsupported Hive Metastore version")
     .createWithDefault(builtinHiveVersion)

-  // A fake config which is only here for backward compatibility reasons. This config has no effect
-  // to Spark, just for reporting the builtin Hive version of Spark to existing applications that
-  // already rely on this config.
Member commented:
We don't need to keep this comment above?

Member Author (@yaooqinn) replied on Apr 19, 2021:

Yes, it's out of date, and the new documentation for the config is more detailed and specific.

-  val FAKE_HIVE_VERSION = buildConf("spark.sql.hive.version")
-    .doc(s"deprecated, please use ${HIVE_METASTORE_VERSION.key} to get the Hive version in Spark.")
-    .version("1.1.1")
-    .fallbackConf(HIVE_METASTORE_VERSION)

   val HIVE_METASTORE_JARS = buildStaticConf("spark.sql.hive.metastore.jars")
     .doc(s"""
       | Location of the jars that should be used to instantiate the HiveMetastoreClient.
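The net effect of the BUILTIN_HIVE_VERSION definition above is a conf whose every write is validated against the compiled-in value. Below is a self-contained sketch of that checkValue pattern; the tiny ConfEntry class is a stand-in for Spark's internal ConfigBuilder machinery, not its real API, and 2.3.8 is again an illustrative version.

```scala
// Minimal stand-in for a config entry with a validation hook.
final case class ConfEntry(key: String, default: String, check: String => Boolean, errMsg: String) {
  def set(value: String): String =
    if (check(value)) value
    else throw new IllegalArgumentException(s"'$value' in $key is invalid. $errMsg")
}

val builtinHiveVersion = "2.3.8" // illustrative; Spark reads it from HiveVersionInfo

val builtinHiveVersionConf = ConfEntry(
  key = "spark.sql.hive.version",
  default = builtinHiveVersion,
  check = _ == builtinHiveVersion,
  errMsg = "The builtin Hive version is read-only, please use spark.sql.hive.metastore.version")

builtinHiveVersionConf.set(builtinHiveVersion) // ok: matches the compiled-in version
builtinHiveVersionConf.set("0.1")              // throws: the message the new CLI test expects
```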