Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23913][SQL] Add array_intersect function #21102

Closed
wants to merge 21 commits into from

Conversation

kiszk
Copy link
Member

@kiszk kiszk commented Apr 18, 2018

What changes were proposed in this pull request?

The PR adds the SQL function array_intersect. The behavior of the function is based on Presto's one.

This function returns returns an array of the elements in the intersection of array1 and array2.

Note: The order of elements in the result is not defined.

How was this patch tested?

Added UTs

@SparkQA
Copy link

SparkQA commented Apr 18, 2018

Test build #89517 has finished for PR 21102 at commit 548a4b8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class ArraySetUtils extends BinaryExpression with ExpectsInputTypes
  • case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetUtils

@SparkQA
Copy link

SparkQA commented Apr 18, 2018

Test build #89527 has finished for PR 21102 at commit 2602f8e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2018

Test build #89538 has finished for PR 21102 at commit 038e98c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

cc @ueshin

@SparkQA
Copy link

SparkQA commented Apr 21, 2018

Test build #89671 has finished for PR 21102 at commit cd56b7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class ArraySetUtils extends BinaryExpression with ExpectsInputTypes

@SparkQA
Copy link

SparkQA commented Jun 26, 2018

Test build #92323 has finished for PR 21102 at commit cd56b7d.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class ArraySetUtils extends BinaryExpression with ExpectsInputTypes

@SparkQA
Copy link

SparkQA commented Jul 10, 2018

Test build #92813 has finished for PR 21102 at commit 59ea8e2.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast
  • case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike

@SparkQA
Copy link

SparkQA commented Jul 10, 2018

Test build #92814 has finished for PR 21102 at commit 346274d.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 10, 2018

Test build #92818 has finished for PR 21102 at commit cf7a27c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 11, 2018

Test build #92833 has finished for PR 21102 at commit 905c31a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 12, 2018

Test build #92943 has finished for PR 21102 at commit fdc2f6c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 12, 2018

Test build #92947 has finished for PR 21102 at commit 7d789e2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 13, 2018

Test build #92970 has finished for PR 21102 at commit fce9eb0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 14, 2018

Test build #92998 has finished for PR 21102 at commit 5492572.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Jul 14, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Jul 14, 2018

Test build #93002 has finished for PR 21102 at commit 5492572.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Jul 16, 2018

cc @ueshin

assert(!set.contains(1132))
assert(!set.contains(10000))

set.remove(999)
Copy link
Member

@ueshin ueshin Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we remove 999 before 1132? Can we still find 1132?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, I addressed this.

@SparkQA
Copy link

SparkQA commented Jul 16, 2018

Test build #93097 has finished for PR 21102 at commit 05a612b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike

@SparkQA
Copy link

SparkQA commented Jul 16, 2018

Test build #93129 has finished for PR 21102 at commit 28e0c45.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Jul 17, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Jul 17, 2018

Test build #93146 has finished for PR 21102 at commit 28e0c45.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Jul 17, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Jul 17, 2018

Test build #93155 has finished for PR 21102 at commit 28e0c45.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member Author

kiszk commented Jul 17, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Jul 17, 2018

Test build #93157 has finished for PR 21102 at commit 28e0c45.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 4, 2018

Test build #94213 has finished for PR 21102 at commit 6fba1ee.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike

@kiszk
Copy link
Member Author

kiszk commented Aug 4, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Aug 4, 2018

Test build #94221 has finished for PR 21102 at commit 6fba1ee.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike

@kiszk
Copy link
Member Author

kiszk commented Aug 5, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Aug 5, 2018

Test build #94231 has finished for PR 21102 at commit 6fba1ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ArrayIntersect(left: Expression, right: Expression) extends ArraySetLike

@kiszk
Copy link
Member Author

kiszk commented Aug 5, 2018

cc @ueshin @cloud-fan

checkAnswer(df5.selectExpr("array_intersect(a, b)"), ans5)

val df6 = Seq((null, null)).toDF("a", "b")
intercept[AnalysisException] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also check the error message?

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for comments and #21102 (comment) and #21102 (comment).

@transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = {
if (elementTypeSupportEquals) {
(array1, array2) =>
val hs = new OpenHashSet[Any]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about shortcutting to return an empty array when we find one of the two is empty?

| $writeArray2ToHashSet
|}
|$arrayBuilderClass $builder =
| ($arrayBuilderClass)$arrayBuilder.make($arrayBuilderClassTag);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new $arrayBuilderClass() should work?

val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
val hashSet = ctx.freshName("hashSet")
val hashSetResult = ctx.freshName("hashSetResult")
val arrayBuilder = "scala.collection.mutable.ArrayBuilder"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: classOf[mutable.ArrayBuilder[_]].getName?

}
new GenericArrayData(arrayBuffer)
} else {
new GenericArrayData(Seq.empty)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Array.empty or Array.emptyObjectArray?

@SparkQA
Copy link

SparkQA commented Aug 6, 2018

Test build #94266 has finished for PR 21102 at commit 33781b6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 6, 2018

Test build #94264 has finished for PR 21102 at commit ce755e2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Aug 6, 2018

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 6, 2018

Test build #94267 has finished for PR 21102 at commit 33781b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin
Copy link
Member

ueshin commented Aug 6, 2018

Thanks! merging to master.

@asfgit asfgit closed this in 1a5e460 Aug 6, 2018
}

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val arrayData = classOf[ArrayData].getName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ArrayData is imported by default in codegen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will address at #21937

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants