-
Notifications
You must be signed in to change notification settings - Fork 28.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-48584][SQL] Perf improvement for unescapePathName
### What changes were proposed in this pull request? This PR improves perf for unescapePathName with algorithms briefly described as: - If a path contains no '%' or contains '%' at `position > path.length-2`, we return the original identity instead of creating a new StringBuilder to append char by char - Otherwise, we loop with 2 indices, `plaintextStartIdx` which starts from 0 and then points to the next char after resolving `%xx`, and `plaintextEndIdx` which points to the next `'%'`. `plaintextStartIdx` moves to `plaintextEndIdx + 3` if `%xx` is valid, or moves to `plaintextEndIdx + 1` if `%xx` is invalid. - Instead of using Integer.parseInt with error capture, we identify the high and low characters manually. ### Why are the changes needed? performance improvement for hotspots ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? - new tests in ExternalCatalogUtilsSuite - Benchmark results (9-11x faster) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46938 from yaooqinn/SPARK-48584. Authored-by: Kent Yao <[email protected]> Signed-off-by: Kent Yao <[email protected]>
- Loading branch information
Showing
5 changed files
with
135 additions
and
27 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters