Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17324] [SQL] Remove Direct Usage of HiveClient in InsertIntoHiveTable #14888

Closed

Conversation

gatorsmile
Copy link
Member

What changes were proposed in this pull request?

This is another step to get rid of HiveClient from HiveSessionState. All the metastore interactions should be through ExternalCatalog interface. However, the existing implementation of InsertIntoHiveTable still requires Hive clients. This PR is to remove HiveClient by moving the metastore interactions into ExternalCatalog.

How was this patch tested?

Existing test cases

@SparkQA
Copy link

SparkQA commented Aug 31, 2016

Test build #64694 has finished for PR 14888 at commit d03e65d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

cc @cloud-fan @yhuai

overwrite,
numDynamicPartitions,
holdDDLTime = holdDDLTime,
listBucketingEnabled = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is isSkewedStoreAsSubdir parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last parameter of loadDynamicPartitions is listBucketingEnabled. loadDynamicPartitions does not have a parameter for isSkewedStoreAsSubdir. Here, we fix a bug. : )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's listBucketingEnabled used for? Will we need it in the future? If so, we also need to add a TODO here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will answer the difference between listBucketingEnabled and isSkewedStoreAsSubdir tomorrrow. Need more time to do the investigation. : )

Copy link
Member Author

@gatorsmile gatorsmile Sep 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh... isSkewedStoreAsSubdir and listBucketingEnabled are the same!!! loadDynamicPartitions is using listBucketingEnabled, but loadPartition is using isSkewedStoreAsSubdir.

loadDynamicPartitions eventually passes the value of listBucketingEnabled to isSkewedStoreAsSubdir when calling loadPartition.

To differ list bucketing tables from normal skewed tables, they use an optional parameter store as DIRECTORIES. See the JIRA HIVE-3649.

Actually, we can remove TODO for either isSkewedStoreAsSubdir or loadDynamicPartitions, because we always ignore this feature in getTableOption. We can set it in HiveClientImpl, and then we do not need to pollute the externalCatalog APIs

// inheritTableSpecs is set to true. It should be set to false for an IMPORT query
// which is currently considered as a Hive native command.
val inheritTableSpecs = true
// TODO: Correctly set isSkewedStoreAsSubdir.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO is removed. We set it in HiveClientImpl.
cc @cloud-fan @yhuai

@SparkQA
Copy link

SparkQA commented Sep 3, 2016

Test build #64884 has finished for PR 14888 at commit 40c8cc1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

inheritTableSpecs: Boolean,
isSkewedStoreAsSubdir: Boolean): Unit = withHiveState {
inheritTableSpecs: Boolean): Unit = withHiveState {
val hiveTable = client.getTable(dbName, tableName, true /* throw exception */)
shim.loadPartition(
Copy link
Contributor

@cloud-fan cloud-fan Sep 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the table is not exist, will we throw exception in shim.loadPartition?

Copy link
Member Author

@gatorsmile gatorsmile Sep 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Actually, we throw the exception earlier. Before insert or load data, we always ensure that the target table exists at first; otherwise, we will issue an exception.

@cloud-fan
Copy link
Contributor

LGTM, merging to master!

@asfgit asfgit closed this in 6b156e2 Sep 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants