-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23911][SQL] Add aggregate function. #21982
Conversation
Isn't this PR related to the Jira ticket SPARK-23911? |
Oops, yes, I wrote a wrong jira-id. Fixed. Thanks! |
Test build #94118 has finished for PR 21982 at commit
|
Jenkins, retest this please. |
Test build #94130 has finished for PR 21982 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
retest this please |
Test build #94169 has finished for PR 21982 at commit
|
retest this please |
@ueshin You need to address the conflicts again. :) |
Test build #94205 has finished for PR 21982 at commit
|
Test build #94207 has finished for PR 21982 at commit
|
Thanks! merging to master. |
""", | ||
examples = """ | ||
Examples: | ||
> SELECT _FUNC_(array(1, 2, 3), (acc, x) -> acc + x); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ueshin, would you mind if I ask to kindly double check if the example works? seems not in my local:
spark-sql> SELECT aggregate(array(1, 2, 3), (acc, x) -> acc + x);
2018-08-08 16:08:25 ERROR SparkSQLDriver:91 - Failed in [SELECT aggregate(array(1, 2, 3), (acc, x) -> acc + x)]
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'acc
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.dataType(arithmetic.scala:119)
at org.apache.spark.sql.catalyst.expressions.LambdaFunction.dataType(higherOrderFunctions.scala:72)
at org.apache.spark.sql.hive.HiveSessionCatalog$$anonfun$1.apply(HiveSessionCatalog.scala:122)
at org.apache.spark.sql.hive.HiveSessionCatalog$$anonfun$1.apply(HiveSessionCatalog.scala:121)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry, we need the second argument as an initial value.
SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x);
I'll submit a follow-up pr soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitted #22035. Thanks!
## What changes were proposed in this pull request? This pr is a follow-up pr of #21982 and fixes the examples. ## How was this patch tested? Existing tests. Closes #22035 from ueshin/issues/SPARK-23911/fup1. Authored-by: Takuya UESHIN <[email protected]> Signed-off-by: Takuya UESHIN <[email protected]>
What changes were proposed in this pull request?
This pr adds
aggregate
function which applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.How was this patch tested?
Added tests.