-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23913][SQL] Add array_intersect function #21102
Conversation
Test build #89517 has finished for PR 21102 at commit
|
Test build #89527 has finished for PR 21102 at commit
|
Test build #89538 has finished for PR 21102 at commit
|
cc @ueshin |
Test build #89671 has finished for PR 21102 at commit
|
Test build #92323 has finished for PR 21102 at commit
|
Test build #92813 has finished for PR 21102 at commit
|
Test build #92814 has finished for PR 21102 at commit
|
Test build #92818 has finished for PR 21102 at commit
|
Test build #92833 has finished for PR 21102 at commit
|
Test build #92943 has finished for PR 21102 at commit
|
Test build #92947 has finished for PR 21102 at commit
|
Test build #92970 has finished for PR 21102 at commit
|
Test build #92998 has finished for PR 21102 at commit
|
retest this please |
Test build #93002 has finished for PR 21102 at commit
|
cc @ueshin |
assert(!set.contains(1132)) | ||
assert(!set.contains(10000)) | ||
|
||
set.remove(999) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we remove 999
before 1132
? Can we still find 1132
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, I addressed this.
Test build #93097 has finished for PR 21102 at commit
|
Test build #93129 has finished for PR 21102 at commit
|
retest this please |
Test build #93146 has finished for PR 21102 at commit
|
retest this please |
Test build #93155 has finished for PR 21102 at commit
|
retest this please |
Test build #93157 has finished for PR 21102 at commit
|
Test build #94213 has finished for PR 21102 at commit
|
retest this please |
Test build #94221 has finished for PR 21102 at commit
|
retest this please |
Test build #94231 has finished for PR 21102 at commit
|
checkAnswer(df5.selectExpr("array_intersect(a, b)"), ans5) | ||
|
||
val df6 = Seq((null, null)).toDF("a", "b") | ||
intercept[AnalysisException] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also check the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for comments and #21102 (comment) and #21102 (comment).
@transient lazy val evalIntersect: (ArrayData, ArrayData) => ArrayData = { | ||
if (elementTypeSupportEquals) { | ||
(array1, array2) => | ||
val hs = new OpenHashSet[Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about shortcutting to return an empty array when we find one of the two is empty?
| $writeArray2ToHashSet | ||
|} | ||
|$arrayBuilderClass $builder = | ||
| ($arrayBuilderClass)$arrayBuilder.make($arrayBuilderClassTag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: new $arrayBuilderClass()
should work?
val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()" | ||
val hashSet = ctx.freshName("hashSet") | ||
val hashSetResult = ctx.freshName("hashSetResult") | ||
val arrayBuilder = "scala.collection.mutable.ArrayBuilder" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: classOf[mutable.ArrayBuilder[_]].getName
?
} | ||
new GenericArrayData(arrayBuffer) | ||
} else { | ||
new GenericArrayData(Seq.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Array.empty
or Array.emptyObjectArray
?
Test build #94266 has finished for PR 21102 at commit
|
Test build #94264 has finished for PR 21102 at commit
|
Jenkins, retest this please. |
Test build #94267 has finished for PR 21102 at commit
|
Thanks! merging to master. |
} | ||
|
||
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { | ||
val arrayData = classOf[ArrayData].getName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ArrayData
is imported by default in codegen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will address at #21937
What changes were proposed in this pull request?
The PR adds the SQL function
array_intersect
. The behavior of the function is based on Presto's one.This function returns returns an array of the elements in the intersection of array1 and array2.
Note: The order of elements in the result is not defined.
How was this patch tested?
Added UTs