[SPARK-38030][SQL] Canonicalization should not remove nullability of AttributeReference dataType

### What changes were proposed in this pull request?
Canonicalization of AttributeReference should not remove nullability information of its dataType.

### Why are the changes needed?
SPARK-38030 describes a case where canonicalizing a Cast produced an unresolved expression and caused the query to fail. During canonicalization, the child AttributeReference's dataType was converted to its nullable form, so the Cast's `checkInputDataTypes` failed. Although the exact repro listed in SPARK-38030 no longer reproduces on master because of an unrelated change (details in the JIRA), other code paths that depend on canonicalized representations can still trigger the same issue.
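
To make the failure mode concrete, here is a minimal, dependency-free sketch. It does not use Spark's real classes: `DataType`, `StructType`, `StructField`, `Attr`, and `canCast` below are toy stand-ins whose names mirror Catalyst only for readability. It assumes that a struct cast is rejected when a nullable source field maps to a non-nullable target field. `canonicalizedOld` and `canonicalizedNew` are hypothetical helpers modelling the two versions of `preCanonicalized` in this commit.

```scala
object NullabilityDemo {
  // Toy stand-ins for Catalyst types; NOT Spark's real classes.
  sealed trait DataType { def asNullable: DataType }
  case object StringType extends DataType { def asNullable: DataType = this }
  final case class StructField(name: String, dataType: DataType, nullable: Boolean = true)
  final case class StructType(fields: Seq[StructField]) extends DataType {
    // Marks every field (recursively) as nullable.
    def asNullable: StructType =
      StructType(fields.map(f => f.copy(dataType = f.dataType.asNullable, nullable = true)))
  }

  // Simplified, assumed cast rule: struct fields are compared pairwise, and a
  // nullable source field may not be cast to a non-nullable target field.
  def canCast(from: DataType, to: DataType): Boolean = (from, to) match {
    case (StructType(fs), StructType(ts)) =>
      fs.length == ts.length && fs.zip(ts).forall { case (f, t) =>
        canCast(f.dataType, t.dataType) && (!f.nullable || t.nullable)
      }
    case (a, b) => a == b
  }

  // Toy attribute modelling the two canonicalization strategies.
  final case class Attr(name: String, dataType: DataType) {
    def canonicalizedOld: Attr = Attr("none", dataType.asNullable) // before this patch
    def canonicalizedNew: Attr = Attr("none", dataType)            // after this patch
  }

  def main(args: Array[String]): Unit = {
    val structType = StructType(Seq(StructField("name", StringType, nullable = false)))
    val attr = Attr("col", structType)
    println(canCast(attr.dataType, structType))                  // true
    println(canCast(attr.canonicalizedOld.dataType, structType)) // false
    println(canCast(attr.canonicalizedNew.dataType, structType)) // true
  }
}
```

In this toy model the cast is valid on the original attribute, becomes invalid once the canonicalized child carries a nullable field, and stays valid when canonicalization leaves the dataType untouched.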

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a unit test to ensure that canonicalization preserves the nullability of an AttributeReference's dataType and does not turn a resolved Cast into an unresolved one.

Closes #35332 from shardulm94/SPARK-38030.

Authored-by: Shardul Mahadik
Signed-off-by: Wenchen Fan
shardulm94 authored and cloud-fan committed Feb 8, 2022
1 parent d4f275b commit 2e703ae
Showing 2 changed files with 15 additions and 2 deletions.
@@ -294,7 +294,7 @@ case class AttributeReference(
   }

   override lazy val preCanonicalized: Expression = {
-    AttributeReference("none", dataType.asNullable)(exprId)
+    AttributeReference("none", dataType)(exprId)
   }

   override def newInstance(): AttributeReference =
@@ -23,7 +23,7 @@ import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.plans.logical.Range
-import org.apache.spark.sql.types.{IntegerType, LongType, StructField, StructType}
+import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructField, StructType}

 class CanonicalizeSuite extends SparkFunSuite {

@@ -177,4 +177,17 @@ class CanonicalizeSuite extends SparkFunSuite {
     assert(expr.semanticEquals(attr))
     assert(attr.semanticEquals(expr))
   }
+
+  test("SPARK-38030: Canonicalization should not remove nullability of AttributeReference" +
+    " dataType") {
+    val structType = StructType(Seq(StructField("name", StringType, nullable = false)))
+    val attr = AttributeReference("col", structType)()
+    // AttributeReference dataType should not be converted to nullable
+    assert(attr.canonicalized.dataType === structType)
+
+    val cast = Cast(attr, structType)
+    assert(cast.resolved)
+    // canonicalization should not convert a resolved cast to unresolved
+    assert(cast.canonicalized.resolved)
+  }
 }
