-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3891][SQL] Add array support to percentile, percentile_approx and constant inspectors support #2802
Closed
Closed
[SPARK-3891][SQL] Add array support to percentile, percentile_approx and constant inspectors support #2802
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
7f94aff
Added foldable support to CreateArray
gvramana cb7c61e
Supported ConstantInspector for UDAF
gvramana 47f6365
fixed test
gvramana f37fd69
Fixed review comments
gvramana 4d39105
Unified inspector creation, style check fixes
gvramana c46db0f
Removed TestHive reset
gvramana a18f917
avoid constant udf expression re-evaluation
gvramana a0182e5
fixed review comment
gvramana File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably unnecessary, as constant folding rule in
Optimizer
will replace thefoldable
expression withLiteral
. Please correct me if there is exceptional case. (and above propertyconstantreturnValue
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constant check and returning value is required for two reasons:
But if the same are called with "function.evaluate" they will not return the same constant value type. There will be mismatch in the datatype expected by constantReturnInspector datatype vs datatype returned by function.evaluate.(Ex: org.apache.hadoop.io.Text vs String). This fails unwrap.
So if return Inspector is constant we don't need to call "function.evaluate" as the expression is already evaluated and return value is already present in constant iterator.
This I have uncovered, when I made CreateArrayExression as foldable, then test fails. So modified as part of current defect fix only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, I guess there is probably a bug in master as you described, can you paste a query to reproduce the failure? (Text V.S. String).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In HiveQuerySuite, "constant array" testcase was failing
SELECT sort_array(
sort_array(
array("hadoop distributed file system",
"enterprise databases", "hadoop map-reduce")))
FROM src LIMIT 1
[info] - constant array *** FAILED *** (596 milliseconds)
[info] Failed to execute query using catalyst:
[info] Error: java.lang.String cannot be cast to org.apache.hadoop.io.Text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenghao-intel, I think you understand this code better than I do. Are you satisfied with the explanation? Does this approach seem reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this failure is due to the bug of nested constant Expression <-> ObjectInspector in
HiveInspectors
, and I will fix that in #3429.@gvramana , how about revert the changes in
CreateArray
andHiveGenericUDF
? I think we can merge the others first. And you can create a new PR forCreateArray.foldable
which depends on #3429, since it currently doesn't break anything.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenghao-intel This PR cannot be separately merged without CreateArray, as percentile_approx accepts only constant array iterator and fails otherwise. I think we can go ahead and merge all these changes as they don't break build or tests, and are not directly dependent on #3429 in order of merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the #3429, I think this PR can be more simpler after #3429 merged. Besides, invoking the
foldable
wineval
probably too heavy, which is supposed to be eliminated inOptimizer
.