[SPARK-19359][SQL] Revert Clear useless path after rename a partition with upper-case by HiveExternalCatalog #16728
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This PR is to revert the changes made in #16700. It could cause the data loss after partition rename, because we have a bug in the file renaming.
Not all the OSs have the same behaviors. For example, on mac OS, if we renaming a path from
.../tbl/a=5/b=6
to.../tbl/A=5/B=6
. The result is.../tbl/a=5/B=6
. The expected result is.../tbl/A=5/B=6
. Thus, renaming on mac OS is not recursive. However, the systems used in Jenkin does not have such an issue. Although this PR is not the root cause, it exposes an existing issue on the codetablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
Hive metastore is not case preserving and keep partition columns with lower case names.
If SparkSQL create a table with upper-case partion name use HiveExternalCatalog, when we rename partition, it first call the HiveClient to renamePartition, which will create a new lower case partition path, then SparkSql rename the lower case path to the upper-case.
while if the renamed partition contains more than one depth partition ,e.g. A=1/B=2, hive renamePartition change to a=1/b=2, then SparkSql rename it to A=1/B=2, but the a=1 still exists in the filesystem, we should also delete it.
How was this patch tested?
N/A