[SPARK-21351][SQL] Update nullability based on children's output #18576
Changes from all commits
@@ -153,7 +153,9 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
       RewritePredicateSubquery,
       ColumnPruning,
       CollapseProject,
-      RemoveRedundantProject)
+      RemoveRedundantProject) :+
+    Batch("UpdateAttributeReferences", Once,
+      UpdateNullabilityInAttributeReferences)
   }

   /**
@@ -1309,3 +1311,18 @@ object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan] {
     }
   }
 }
+
+/**
+ * Updates nullability in [[AttributeReference]]s if nullability is different between
+ * non-leaf plan's expressions and the children output.
+ */
+object UpdateNullabilityInAttributeReferences extends Rule[LogicalPlan] {
Inline review thread on this line:
- this rule itself is useful without the filter optimization. In general, I don't think it's useful to optimize filter using the …
- ok, I'll make this pr simpler (I'll drop some changes for filters).
- I dropped the changes of … I basically agree that the …
- If we think more about how these …
- ok, I'll drop it in this pr. just a sec.
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case p if !p.isInstanceOf[LeafNode] =>
+      val nullabilityMap = AttributeMap(p.children.flatMap(_.output).map { x => x -> x.nullable })
+      p transformExpressions {
+        case ar: AttributeReference if nullabilityMap.contains(ar) =>
+          ar.withNullability(nullabilityMap(ar))
+      }
+  }
+}
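To make the doc comment above concrete, here is a hedged sketch (not part of the PR) that reproduces the nullability mismatch by hand with the catalyst test DSL also used by the suite below, and then repairs it with a single pass of the new rule:

```scala
// Hedged sketch (not from the PR): build a plan whose child output becomes
// non-nullable only after an optimizer rewrite, then refresh the stale references.
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions.{CreateArray, GetArrayItem}
import org.apache.spark.sql.catalyst.optimizer.{SimplifyExtractValueOps, UpdateNullabilityInAttributeReferences}
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation

val rel = LocalRelation('a.long.notNull)
// `array(a, a + 1)[0]` is a nullable expression, so after analysis the Aggregate
// holds a nullable AttributeReference to `b`.
val analyzed = rel
  .select(GetArrayItem(CreateArray(Seq('a, 'a + 1L)), 0) as "b")
  .groupBy($"b")($"b")
  .analyze
// Simulate an earlier optimizer rewrite: fold the array round-trip back to `a`,
// which makes the Project's output `b` non-nullable ...
val simplified = SimplifyExtractValueOps(analyzed)
// ... and one bottom-up pass of the new rule brings the Aggregate's reference
// back in sync with its child's output.
val refreshed = UpdateNullabilityInAttributeReferences(simplified)
assert(!refreshed.output.head.nullable)
```

Because the rule only re-reads the children's output, it has to run after the batches that actually change expression nullability, which is why the diff appends it as the last optimizer batch.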
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.{CreateArray, GetArrayItem}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+
+class UpdateNullabilityInAttributeReferencesSuite extends PlanTest {
+
+  object Optimizer extends RuleExecutor[LogicalPlan] {
+    val batches =
+      Batch("Constant Folding", FixedPoint(10),
+        NullPropagation,
+        ConstantFolding,
+        BooleanSimplification,
+        SimplifyConditionals,
+        SimplifyBinaryComparison,
+        SimplifyExtractValueOps) ::
+      Batch("UpdateAttributeReferences", Once,
+        UpdateNullabilityInAttributeReferences) :: Nil
+  }
+
+  test("update nullability in AttributeReference") {
+    val rel = LocalRelation('a.long.notNull)
+    // In the 'original' plans below, the Aggregate node produced by groupBy() has a
+    // nullable AttributeReference to `b`, because both array indexing and map lookup are
+    // nullable expressions. After optimization, the same attribute is now non-nullable,
+    // but the AttributeReference is not updated to reflect this. So, we need to update nullability
+    // by the `UpdateNullabilityInAttributeReferences` rule.
+    val original = rel
+      .select(GetArrayItem(CreateArray(Seq('a, 'a + 1L)), 0) as "b")
+      .groupBy($"b")("1")
+    val expected = rel.select('a as "b").groupBy($"b")("1").analyze
+    val optimized = Optimizer.execute(original.analyze)
+    comparePlans(optimized, expected)
+  }
+}
Review conversation:

- I noticed that this rule is very similar to FixNullability. Do we have to keep both of them? cc @maropu @gatorsmile

- UpdateNullabilityInAttributeReferences is an optimizer rule that was added after we introduced the analyzer rule FixNullability. Do we have any end-to-end test case showing the benefit of this optimizer rule?

- yea, I have the same opinion as @gatorsmile; FixNullability is related to correctness, but UpdateNullabilityInAttributeReferences is not (it is just an optimization). @gatorsmile I think we haven't checked the actual performance benefits (e.g., wall time) of the rule yet. I just assume that this rule could help the code generator produce better code.

- But these two rules are almost the same, except that FixNullability skips resolved plans.

- Yea, I see... But I have no idea how to resolve the two issues (the correctness issue and the optimization issue) simultaneously in a single place. One greedy & temporary solution would be to merge the current two rules into one, and then put it in two places: the analyzer and the optimizer.
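For what it's worth, a hypothetical end-to-end check of the kind asked about in this thread (not part of the PR; the expected values reflect the behavior described above at the time of this PR): with the rule in place, the optimized plan should report the derived column as non-nullable, and non-nullable attributes are generally what allows generated code to skip per-row null checks.

```scala
// Hypothetical end-to-end illustration (not part of the PR). Assumes an active
// SparkSession `spark`; the `id` column produced by range() is non-nullable.
val df = spark.range(10)
  .selectExpr("array(id, id + 1)[0] AS b")
  .groupBy("b").count()

// At the time of this PR, array indexing is a nullable expression, so the analyzed
// plan reports `b` as nullable ...
df.queryExecution.analyzed.schema("b").nullable        // expected: true
// ... while in the optimized plan the extraction has been folded back to `id` and
// UpdateNullabilityInAttributeReferences has refreshed the stale references.
df.queryExecution.optimizedPlan.schema("b").nullable   // expected: false
```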