[SPARK-23911][SQL] Add aggregate function. #21982

ueshin · 2018-08-03T08:16:56Z

What changes were proposed in this pull request?

This PR adds the aggregate function, which applies a binary operator to an initial state and all elements in the array, reducing them to a single state. The final state is then converted into the final result by applying a finish function.

> SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x);
 6
> SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10);
 60
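
The reduction described above can be sketched in plain Python to make the roles of the arguments explicit. This is only an illustration of the semantics, not Spark's implementation: the parameter names `zero`, `merge`, and `finish` are assumptions chosen for this sketch, and it does not model SQL typing or null handling.

```python
from functools import reduce
from typing import Any, Callable, Iterable, Optional


def aggregate(
    elements: Iterable[Any],
    zero: Any,
    merge: Callable[[Any, Any], Any],
    finish: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Fold `elements` into one state, then optionally convert that state.

    `zero`, `merge`, and `finish` are hypothetical names for the initial
    state, the binary operator, and the finish function.
    """
    # Start from the initial state and apply the binary operator
    # to the running state and each element in order.
    state = reduce(merge, elements, zero)
    # Without a finish function the final state is the result.
    return state if finish is None else finish(state)


# Sum the elements, starting from 0.
print(aggregate([1, 2, 3], 0, lambda acc, x: acc + x))  # 6
# Same reduction, then multiply the final state by 10.
print(aggregate([1, 2, 3], 0, lambda acc, x: acc + x, lambda acc: acc * 10))  # 60
```

In this sketch an empty input simply returns the initial state, which is one reason the initial value is a separate argument rather than being taken from the array.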

How was this patch tested?

Added tests.

ueshin · 2018-08-03T08:17:30Z

cc @hvanhovell @gatorsmile @cloud-fan

mn-mikke · 2018-08-03T08:33:51Z

Isn't this PR related to the Jira ticket SPARK-23911?

ueshin · 2018-08-03T08:36:41Z

Oops, yes, I wrote the wrong JIRA ID. Fixed. Thanks!

SparkQA · 2018-08-03T10:36:40Z

Test build #94118 has finished for PR 21982 at commit 26bf379.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class ArrayAggregate(

ueshin · 2018-08-03T11:27:51Z

Jenkins, retest this please.

SparkQA · 2018-08-03T14:54:19Z

Test build #94130 has finished for PR 21982 at commit 26bf379.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class ArrayAggregate(

hvanhovell

LGTM

hvanhovell · 2018-08-03T19:52:44Z

retest this please

SparkQA · 2018-08-03T23:27:24Z

Test build #94169 has finished for PR 21982 at commit 26bf379.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class ArrayAggregate(

gatorsmile · 2018-08-04T07:09:17Z

retest this please

gatorsmile · 2018-08-04T07:09:44Z

@ueshin You need to address the conflicts again. :)

SparkQA · 2018-08-04T11:03:51Z

Test build #94205 has finished for PR 21982 at commit 26bf379.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds the following public classes (experimental):
case class ArrayAggregate(

SparkQA · 2018-08-04T11:22:56Z

Test build #94207 has finished for PR 21982 at commit 4290f55.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class ArrayFilter(

ueshin · 2018-08-04T23:57:55Z

Thanks! Merging to master.

HyukjinKwon · 2018-08-08T08:09:07Z

...catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala

+    """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), (acc, x) -> acc + x);


@ueshin, would you mind double-checking that the example works? It doesn't seem to work in my local environment:

spark-sql> SELECT aggregate(array(1, 2, 3), (acc, x) -> acc + x);
2018-08-08 16:08:25 ERROR SparkSQLDriver:91 - Failed in [SELECT aggregate(array(1, 2, 3), (acc, x) -> acc + x)]
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'acc
    at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
    at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.dataType(arithmetic.scala:119)
    at org.apache.spark.sql.catalyst.expressions.LambdaFunction.dataType(higherOrderFunctions.scala:72)
    at org.apache.spark.sql.hive.HiveSessionCatalog$$anonfun$1.apply(HiveSessionCatalog.scala:122)
    at org.apache.spark.sql.hive.HiveSessionCatalog$$anonfun$1.apply(HiveSessionCatalog.scala:121)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)

ueshin · Aug 8, 2018

Oops, sorry, we need the second argument as an initial value.

SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x);

I'll submit a follow-up PR soon.

ueshin · Aug 8, 2018

Submitted #22035. Thanks!

## What changes were proposed in this pull request?

This pr is a follow-up pr of #21982 and fixes the examples.

## How was this patch tested?

Existing tests.

Closes #22035 from ueshin/issues/SPARK-23911/fup1.

Authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Takuya UESHIN <[email protected]>

Add ArrayAggregate.

26bf379

ueshin changed the title ~~[SPARK-23909][SQL] Add aggregate function.~~ [SPARK-23911][SQL] Add aggregate function. Aug 3, 2018

hvanhovell approved these changes Aug 3, 2018

Merge branch 'master' into issues/SPARK-23911/aggregate

4290f55

asfgit closed this in 327bb30 Aug 5, 2018

ueshin mentioned this pull request Aug 7, 2018

[SPARK-23908][SQL] Add transform function. #21954

Closed

HyukjinKwon reviewed Aug 8, 2018

ueshin mentioned this pull request Aug 8, 2018

[SPARK-23911][SQL][FOLLOW-UP] Fix examples of aggregate function. #22035

Closed
