-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL #10844
Conversation
Test build #49770 has finished for PR 10844 at commit
|
f7251dd
to
04ff99a
Compare
Test build #49775 has finished for PR 10844 at commit
|
Could you add the id of the JIRA ticket to the titlle? Could you also add a description, explaining why we want this? Seems cool though! |
@hvanhovell added, thanks! |
@sameeragarwal I just saw it. Thank you! I will do the code changes after this is merged. |
Test build #50083 has finished for PR 10844 at commit
|
Test build #50161 has finished for PR 10844 at commit
|
self: PlanType => | ||
|
||
def output: Seq[Attribute] | ||
|
||
/** | ||
* Extracts the output property from a given child. | ||
*/ | ||
def extractConstraintsFromChild(child: QueryPlan[PlanType]): Set[Expression] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected
?
Also I'm not sure I get the scala doc. Maybe getReleventContraints
is a better name? It is taking the constraints and removing those that don't apply anymore because we removed columns right?
Test build #50371 has finished for PR 10844 at commit
|
Thanks @marmbrus, all comments addressed! |
Test build #50404 has finished for PR 10844 at commit
|
test this please |
* operator. For example, if the output of this operator is column `a`, an example `constraints` | ||
* can be `Set(a > 10, a < 20)`. | ||
*/ | ||
lazy val constraints: Set[Expression] = validConstraints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider making this getRelevantConstraints(validConstraints)
so that each implementor of validContraints
does't have to remember to do the filter / canonicalization. They can just focus on augmenting or passing through constraints from children based on the operators semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, added.
Test build #50417 has finished for PR 10844 at commit
|
Thanks @marmbrus, all comments addressed! |
Test build #50429 has finished for PR 10844 at commit
|
} | ||
} | ||
|
||
def selfJoinResolved: Boolean = left.outputSet.intersect(right.outputSet).isEmpty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was a merging mistake as its duplicated with the method below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, fixed!
@marmbrus comments addressed! |
test this please |
Test build #50518 has finished for PR 10844 at commit
|
@@ -301,10 +301,12 @@ abstract class LeafNode extends LogicalPlan { | |||
/** | |||
* A logical plan node with single child. | |||
*/ | |||
abstract class UnaryNode extends LogicalPlan { | |||
abstract class UnaryNode extends LogicalPlan with PredicateHelper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we do not need PredicateHelper
at here? Maybe it is better to just with PredicateHelper
for Filter
and Join
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, added this only for filter and join that needed splitConjunctivePredicates
c1a0128
to
2bd2735
Compare
comments addressed! |
Test build #50603 has finished for PR 10844 at commit
|
Thanks! Merging to master. |
Based on the semantics of a query, we can derive a number of data constraints on output of each (logical or physical) operator. For instance, if a filter defines
‘a > 10
, we know that the output data of this filter satisfies 2 constraints:‘a > 10
isNotNull(‘a)
This PR proposes a possible way of keeping track of these constraints and propagating them in the logical plan, which can then help us build more advanced optimizations (such as pruning redundant filters, optimizing joins, among others). We define constraints as a set of (implicitly conjunctive) expressions. For e.g., if a filter operator has constraints =
Set(‘a > 10, ‘b < 100)
, it’s implied that the outputs satisfy both individual constraints (i.e.,‘a > 10
AND‘b < 100
).Design Document: https://docs.google.com/a/databricks.com/document/d/1WQRgDurUBV9Y6CWOBS75PQIqJwT-6WftVa18xzm7nCo/edit?usp=sharing