-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-13996][SQL] Add more not null attributes for Filter codegen #11810
Conversation
Test build #53498 has finished for PR 11810 at commit
|
Test build #53511 has finished for PR 11810 at commit
|
Test build #53523 has finished for PR 11810 at commit
|
… attributes from conditions.
case _ => false | ||
} | ||
|
||
// The columns that will filtered out by `IsNotNull` could be considered as not nullable. | ||
private val notNullAttributes = notNullPreds.flatMap(_.references) | ||
private val notNullAttributes = notNullPreds.flatMap(_.references).distinct.map(_.exprId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attributes from conditions can have different qualifiers than child.output, which makes later indexOf
call failed. So here use exprId instead.
Test build #53592 has finished for PR 11810 at commit
|
cc @davies |
ping @davies Please take a look at this to see if it is good for you. Thanks. |
@@ -79,16 +79,16 @@ case class Filter(condition: Expression, child: SparkPlan) | |||
|
|||
// Split out all the IsNotNulls from condition. | |||
private val (notNullPreds, otherPreds) = splitConjunctivePredicates(condition).partition { | |||
case IsNotNull(a) if child.output.contains(a) => true | |||
case IsNotNull(a) if a.references.subsetOf(child.outputSet) => true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct, for example, IsNotNull(IsNull(a))
does not means a
can not be null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, you are right. With the NullIntolerant
trait introduced in #11809, we can solve this easily. Wait for that PR to be ready then we can re-visit this. Thanks!
…-attrs Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala
Test build #54767 has finished for PR 11810 at commit
|
retest this please. |
Test build #54771 has finished for PR 11810 at commit
|
retest this please. |
Looks like |
Test build #54774 has finished for PR 11810 at commit
|
retest this please. |
Test build #54786 has finished for PR 11810 at commit
|
LGTM |
Merging this into master, thanks! |
Thanks for reviewing this. |
What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-13996
Filter codegen finds the attributes not null by checking IsNotNull(a) expression with a condition if child.output.contains(a). However, the current approach to checking it is not comprehensive. We can improve it.
E.g., for this plan:
The code snippet generated without this patch:
With this patch:
How was this patch tested?
Existing tests.