-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-17324] [SQL] Remove Direct Usage of HiveClient in InsertIntoHiveTable #14888
[SPARK-17324] [SQL] Remove Direct Usage of HiveClient in InsertIntoHiveTable #14888
Conversation
Test build #64694 has finished for PR 14888 at commit
|
cc @cloud-fan @yhuai |
overwrite, | ||
numDynamicPartitions, | ||
holdDDLTime = holdDDLTime, | ||
listBucketingEnabled = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is isSkewedStoreAsSubdir
parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last parameter of loadDynamicPartitions
is listBucketingEnabled
. loadDynamicPartitions
does not have a parameter for isSkewedStoreAsSubdir
. Here, we fix a bug. : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's listBucketingEnabled
used for? Will we need it in the future? If so, we also need to add a TODO here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will answer the difference between listBucketingEnabled
and isSkewedStoreAsSubdir
tomorrrow. Need more time to do the investigation. : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh... isSkewedStoreAsSubdir
and listBucketingEnabled
are the same!!! loadDynamicPartitions
is using listBucketingEnabled
, but loadPartition
is using isSkewedStoreAsSubdir
.
loadDynamicPartitions
eventually passes the value of listBucketingEnabled
to isSkewedStoreAsSubdir
when calling loadPartition
.
To differ list bucketing tables from normal skewed tables, they use an optional parameter store as DIRECTORIES
. See the JIRA HIVE-3649.
Actually, we can remove TODO for either isSkewedStoreAsSubdir
or loadDynamicPartitions
, because we always ignore this feature in getTableOption
. We can set it in HiveClientImpl
, and then we do not need to pollute the externalCatalog APIs
// inheritTableSpecs is set to true. It should be set to false for an IMPORT query | ||
// which is currently considered as a Hive native command. | ||
val inheritTableSpecs = true | ||
// TODO: Correctly set isSkewedStoreAsSubdir. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO is removed. We set it in HiveClientImpl
.
cc @cloud-fan @yhuai
Test build #64884 has finished for PR 14888 at commit
|
inheritTableSpecs: Boolean, | ||
isSkewedStoreAsSubdir: Boolean): Unit = withHiveState { | ||
inheritTableSpecs: Boolean): Unit = withHiveState { | ||
val hiveTable = client.getTable(dbName, tableName, true /* throw exception */) | ||
shim.loadPartition( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the table is not exist, will we throw exception in shim.loadPartition
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Actually, we throw the exception earlier. Before insert or load data, we always ensure that the target table exists at first; otherwise, we will issue an exception.
LGTM, merging to master! |
What changes were proposed in this pull request?
This is another step to get rid of HiveClient from
HiveSessionState
. All the metastore interactions should be throughExternalCatalog
interface. However, the existing implementation ofInsertIntoHiveTable
still requires Hive clients. This PR is to remove HiveClient by moving the metastore interactions intoExternalCatalog
.How was this patch tested?
Existing test cases