[SPARK-19129] [SQL] SessionCatalog: Disallow empty part col values in partition spec #16583
```diff
@@ -471,6 +471,7 @@ private[hive] class HiveClientImpl(
     // do the check at first and collect all the matching partitions
     val matchingParts =
       specs.flatMap { s =>
+        assert(s.values.forall(_.nonEmpty), s"partition spec '$s' is invalid")
         // The provided spec here can be a partial spec, i.e. it will match all partitions
         // whose specs are supersets of this partial spec. E.g. If a table has partitions
         // (b='1', c='1') and (b='1', c='2'), a partial spec of (b='1') will match both.
```
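To illustrate the partial-spec matching described in the comment above, and why an empty column value is dangerous, here is a minimal, self-contained sketch. It is plain Scala with no Spark dependency; `matches` and `hiveStyleMatches` are hypothetical helpers, and the "empty value matches everything" behavior is a simplified model of the metastore behavior this PR guards against (an empty value effectively drops the constraint, which is how a `DROP PARTITION` with an empty string could drop the whole table):

```scala
object PartialSpecDemo {
  type PartSpec = Map[String, String]

  // Mirrors the example in the code comment: a table with partitions
  // (b='1', c='1') and (b='1', c='2').
  val partitions: Seq[PartSpec] = Seq(
    Map("b" -> "1", "c" -> "1"),
    Map("b" -> "1", "c" -> "2"))

  // Exact partial-spec matching: every (col, value) pair in the spec
  // must appear in the partition.
  def matches(partition: PartSpec, spec: PartSpec): Boolean =
    spec.forall { case (col, v) => partition.get(col).contains(v) }

  // Simplified model of the behavior reported in SPARK-19129: an empty
  // value effectively places no constraint on that column at all.
  def hiveStyleMatches(partition: PartSpec, spec: PartSpec): Boolean =
    spec.forall { case (col, v) => v.isEmpty || partition.get(col).contains(v) }

  def main(args: Array[String]): Unit = {
    // A partial spec of (b='1') matches both partitions, as the comment says.
    println(partitions.count(matches(_, Map("b" -> "1"))))
    // An empty value, under the modeled metastore semantics, also matches
    // both partitions, i.e. the whole table.
    println(partitions.count(hiveStyleMatches(_, Map("b" -> ""))))
  }
}
```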
```diff
@@ -545,6 +546,7 @@ private[hive] class HiveClientImpl(
         // -1 for result limit means "no limit/return all"
         client.getPartitionNames(table.database, table.identifier.table, -1)
       case Some(s) =>
+        assert(s.values.forall(_.nonEmpty), s"partition spec '$s' is invalid")
         client.getPartitionNames(table.database, table.identifier.table, s.asJava, -1)
     }
     hivePartitionNames.asScala.sorted
```
```diff
@@ -568,7 +570,9 @@ private[hive] class HiveClientImpl(
     val hiveTable = toHiveTable(table)
     val parts = spec match {
       case None => shim.getAllPartitions(client, hiveTable).map(fromHivePartition)
-      case Some(s) => client.getPartitions(hiveTable, s.asJava).asScala.map(fromHivePartition)
+      case Some(s) =>
+        assert(s.values.forall(_.nonEmpty), s"partition spec '$s' is invalid")
+        client.getPartitions(hiveTable, s.asJava).asScala.map(fromHivePartition)
     }
     HiveCatalogMetrics.incrementFetchedPartitions(parts.length)
     parts
```

Review discussion on this hunk:

> shall we also add the assert in

> Yeah, it has the same issue.
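The guard added in each of the three hunks above is the same one-line `assert`. In isolation it behaves as follows; this is a standalone sketch (plain Scala, no Spark needed, with a hypothetical `validate` wrapper around the exact expression from the diff):

```scala
object SpecGuardDemo {
  // The check the PR adds before each metastore call: every value in the
  // partition spec must be non-empty, otherwise fail fast with AssertionError.
  def validate(s: Map[String, String]): Unit =
    assert(s.values.forall(_.nonEmpty), s"partition spec '$s' is invalid")

  def main(args: Array[String]): Unit = {
    // A fully-specified, non-empty spec passes the guard.
    validate(Map("b" -> "1"))

    // A spec containing an empty value is rejected.
    val rejected =
      try { validate(Map("partCol1" -> "")); false }
      catch { case _: AssertionError => true }
    println(rejected)
  }
}
```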
```diff
@@ -247,6 +247,16 @@ class HiveDDLSuite
     }
   }

+  test("SPARK-19129: drop partition with a empty string will drop the whole table") {
+    val df = spark.createDataFrame(Seq((0, "a"), (1, "b"))).toDF("partCol1", "name")
+    df.write.mode("overwrite").partitionBy("partCol1").saveAsTable("partitionedTable")
+    val e = intercept[AnalysisException] {
+      spark.sql("alter table partitionedTable drop partition(partCol1='')")
+    }.getMessage
+    assert(e.contains("Partition spec is invalid. The spec ([partCol1=]) contains an empty " +
+      "partition column value"))
+  }
+
   test("add/drop partitions - external table") {
     val catalog = spark.sessionState.catalog
     withTempDir { tmpDir =>
```

Review discussion on this test:

> what's the behavior of Hive? does it also throw an exception?

> Hive (v2.1.1) does not throw an exception / error message here.

> Given that creating / inserting / querying partitions with an empty string is not allowed, letting DROP PARTITIONS go through seems like inconsistent behavior to me. It might have made sense for supporting regexes, but per the Hive language specification a partition spec has to be a plain string. If there is no way to create partitions with an empty partition column name, allowing DROP seems weird. +1 for throwing an exception, unless the general consensus about Hive compatibility is to match its behavior exactly (including such weirdness).

> @tejasapatil Thank you for your research. So far, we are not completely following Hive in the partition-related DDL commands. Thus, this PR follows the same approach to block the invalid inputs: throwing an exception when the input partition spec is not valid.
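The test above asserts on a specific error-message format. As a rough sketch of how such a message could be assembled from a partition spec, here is a small self-contained example; the helper names (`specString`, `errorMessage`) and the rendering logic are assumptions for illustration, not Spark's actual `SessionCatalog` implementation:

```scala
object SpecErrorDemo {
  // Hypothetical formatter: renders a spec as "[col1=v1, col2=v2]".
  // An empty value shows up as a trailing "=", e.g. "[partCol1=]".
  def specString(spec: Map[String, String]): String =
    spec.map { case (k, v) => s"$k=$v" }.mkString("[", ", ", "]")

  // Builds error text in the shape the test above expects.
  def errorMessage(spec: Map[String, String]): String =
    s"Partition spec is invalid. The spec (${specString(spec)}) contains an empty " +
      "partition column value"

  def main(args: Array[String]): Unit =
    println(errorMessage(Map("partCol1" -> "")))
}
```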
Review discussion on test organization:

> shall we merge this test with https://github.com/apache/spark/pull/16583/files#diff-68b981fa0a91ef20dc032d93ad0fdc52R949 ?

> The above one is verifying `catalog.listPartitionNames` and this one is verifying `catalog.listPartitions`. Should we keep them separate?