[SPARK-19359][SQL] Revert Clear useless path after rename a partition with upper-case by HiveExternalCatalog #16728

gatorsmile · 2017-01-28T19:33:08Z

What changes were proposed in this pull request?

This PR reverts the changes made in #16700. Those changes could cause data loss after a partition rename, because there is a bug in the file renaming.

Not all operating systems behave the same way. For example, on macOS, renaming a path from .../tbl/a=5/b=6 to .../tbl/A=5/B=6 produces .../tbl/a=5/B=6, whereas the expected result is .../tbl/A=5/B=6. Thus, renaming on macOS is not recursive. However, the systems used in Jenkins do not have this issue. Although #16700 is not the root cause, it exposes an existing issue in the code tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
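The behavior described above can be modeled with a small, self-contained sketch. This is only a toy model, not Hadoop's or macOS's actual implementation: it assumes that a case-insensitive, case-preserving file system reuses an existing parent directory under its stored spelling and applies the requested name only to the final path component. The class and method names (`RenameCaseSketch`, `renameCaseInsensitive`) are hypothetical and do not appear in Spark.

```java
import java.util.ArrayList;
import java.util.List;

class RenameCaseSketch {
    /**
     * Toy model (an assumption, not a real file-system API): parents that
     * already exist under a different case keep their stored spelling, and
     * only the last path component takes the requested name.
     */
    static String renameCaseInsensitive(String existingPath, String requestedPath) {
        String[] existing = existingPath.split("/");
        String[] requested = requestedPath.split("/");
        if (existing.length != requested.length) {
            throw new IllegalArgumentException("sketch only handles paths of equal depth");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < requested.length; i++) {
            boolean isLeaf = (i == requested.length - 1);
            boolean sameIgnoringCase = existing[i].equalsIgnoreCase(requested[i]);
            // A parent directory that matches ignoring case is reused as stored.
            result.add(!isLeaf && sameIgnoringCase ? existing[i] : requested[i]);
        }
        return String.join("/", result);
    }

    public static void main(String[] args) {
        // Prints tbl/a=5/B=6, matching the macOS result reported above.
        System.out.println(renameCaseInsensitive("tbl/a=5/b=6", "tbl/A=5/B=6"));
    }
}
```

Under this model, only the leaf directory changes case, which is why a single rename call cannot be relied on to fix the case of every partition level.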

---

Hive metastore is not case-preserving and keeps partition columns with lower-case names.

If Spark SQL creates a table with an upper-case partition name using HiveExternalCatalog, then when we rename a partition, it first calls HiveClient's renamePartition, which creates a new lower-case partition path; Spark SQL then renames the lower-case path to the upper-case one.

However, if the renamed partition has more than one level, e.g. A=1/B=2, Hive's renamePartition changes it to a=1/b=2 and Spark SQL then renames it to A=1/B=2, but a=1 still exists in the filesystem, so we should also delete it.
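To see why deleting the leftover a=1 directory is safe on some systems and destructive on others, here is a minimal sketch over a toy model in which the file system is just a set of directory paths. The two starting states are assumptions derived from the descriptions in this PR (an empty lower-case leftover on a case-sensitive system, and data still living under the lower-case parent on a case-insensitive one). The names `LeftoverCleanupSketch` and `deleteRecursively` are hypothetical and are not Spark or Hadoop APIs.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class LeftoverCleanupSketch {
    /** Removes `dir` and everything below it from a toy set of directory paths. */
    static Set<String> deleteRecursively(Set<String> dirs, String dir) {
        Set<String> remaining = new TreeSet<>();
        for (String d : dirs) {
            if (!d.equals(dir) && !d.startsWith(dir + "/")) {
                remaining.add(d);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Assumed state on a case-sensitive file system: a=1 is an empty leftover.
        Set<String> caseSensitive =
            new TreeSet<>(Arrays.asList("tbl/A=1", "tbl/A=1/B=2", "tbl/a=1"));
        // Assumed state on a case-insensitive file system: the data sits under a=1.
        Set<String> caseInsensitive =
            new TreeSet<>(Arrays.asList("tbl/a=1", "tbl/a=1/B=2"));

        // The partition directory survives the cleanup.
        System.out.println(deleteRecursively(caseSensitive, "tbl/a=1"));
        // The partition directory is deleted together with the "leftover".
        System.out.println(deleteRecursively(caseInsensitive, "tbl/a=1"));
    }
}
```

In the second case the cleanup removes the only copy of the partition directory, which is consistent with the data loss this revert is meant to prevent.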

How was this patch tested?

N/A

SparkQA · 2017-01-28T21:16:36Z

Test build #72110 has finished for PR 16728 at commit a398036.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-01-28T21:32:59Z

Merging to master.

Closes apache#16728 from gatorsmile/revert-pr-16700.

revert

a398036

asfgit closed this in cfcfc92 Jan 28, 2017
