Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-34794][SQL] Fix lambda variable name issues in nested DataFram…
…e functions ### What changes were proposed in this pull request? To fix lambda variable name issues in nested DataFrame functions, this PR modifies code to use a global counter for `LambdaVariables` names created by higher order functions. This is the rework of apache#31887. Closes apache#31887. ### Why are the changes needed? This moves away from the current hard-coded variable names which break on nested function calls. There is currently a bug where nested transforms in particular fail (the inner variable shadows the outer variable) For this query: ``` val df = Seq( (Seq(1,2,3), Seq("a", "b", "c")) ).toDF("numbers", "letters") df.select( f.flatten( f.transform( $"numbers", (number: Column) => { f.transform( $"letters", (letter: Column) => { f.struct( number.as("number"), letter.as("letter") ) } ) } ) ).as("zipped") ).show(10, false) ``` This is the current (incorrect) output: ``` +------------------------------------------------------------------------+ |zipped | +------------------------------------------------------------------------+ |[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]| +------------------------------------------------------------------------+ ``` And this is the correct output after fix: ``` +------------------------------------------------------------------------+ |zipped | +------------------------------------------------------------------------+ |[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]| +------------------------------------------------------------------------+ ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added the new test in `DataFrameFunctionsSuite`. Closes apache#32424 from maropu/pr31887. Lead-authored-by: dsolow <[email protected]> Co-authored-by: Takeshi Yamamuro <[email protected]> Co-authored-by: dmsolow <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]> (cherry picked from commit f550e03) Signed-off-by: Takeshi Yamamuro <[email protected]> (cherry picked from commit 6df4ec0) Signed-off-by: Dongjoon Hyun <[email protected]>
- Loading branch information