[SPARK-5649][SQL] added a rule to check datatypes cast #4425

Closed
wants to merge 6 commits
@@ -69,13 +69,28 @@ class Analyzer(catalog: Catalog,
      typeCoercionRules ++
      extendedRules : _*),
    Batch("Check Analysis", Once,
+     CheckCast ::
Contributor:

I think we want to check casts after we check resolution. Otherwise casts of invalid attributes will still give confusing TreeNodeExceptions.

scala> sql("SELECT CAST(x AS STRING) FROM src").collect()
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'x
    at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:48)

Contributor Author:

If we check resolution first:
1. CheckCast becomes a no-op, since an illegitimate cast expression is necessarily unresolved, so CheckResolution would reject it first (with the wrong error).
2. select cast(key as binary) from src will give a scala.NotImplementedError.

How about keeping the order, but catching the exception from child.dataType so that such a cast passes this rule? Then your case would fall through to CheckResolution.
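
For concreteness, a minimal sketch of that suggestion (an illustration, not the code merged here): instead of literally catching the UnresolvedException from child.dataType, the rule can guard on child.resolved, which has the same effect of letting casts of unresolved attributes fall through to CheckResolution. The object name is hypothetical, and the package declaration matters because resolve becomes private[sql] in this PR.

package org.apache.spark.sql.catalyst.analysis

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical variant of CheckCast: validate a cast only once its child is
// resolved, so unresolved attributes are reported by CheckResolution instead.
object CheckCastOnResolvedChildren extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case q: LogicalPlan =>
      q transformExpressions {
        case cast @ Cast(child, dataType)
            if child.resolved && !cast.resolve(child.dataType, dataType) =>
          throw new AnalysisException(s"cannot cast from ${child.dataType} to $dataType!")
      }
  }
}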

Contributor Author:

The scala.NotImplementedError should be a bug; I filed PR #4552 for it.

      CheckResolution ::
      CheckAggregation ::
      Nil: _*),
    Batch("AnalysisOperators", fixedPoint,
      EliminateAnalysisOperators)
  )
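
For context: Catalyst rule batches run in the order they are listed, and a Once batch makes a single pass over its rules in sequence, so as written CheckCast fires before CheckResolution and CheckAggregation, which is exactly the ordering debated in the thread above.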

+  /**
+   * Makes sure a datatype cast is legitimate; if not, throws an exception.
+   */
+  object CheckCast extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan.transform {
+      case q: LogicalPlan =>
+        q transformExpressions {
+          case cast @ Cast(child, dataType) if !cast.resolve(child.dataType, dataType) =>
+            throw new AnalysisException(s"cannot cast from ${child.dataType} to $dataType!")
+          case p => p
+        }
+    }
+  }
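
With the rule in place, an illegitimate cast should fail analysis with a readable message instead of surfacing later as a TreeNodeException or NotImplementedError. Roughly, assuming key is an int column as in the Hive src fixture (session output paraphrased, not captured verbatim):

scala> sql("select cast(key as binary) from src").collect()
org.apache.spark.sql.AnalysisException: cannot cast from IntegerType to BinaryType!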

  /**
   * Makes sure all attributes and logical plans have been resolved.
   */
@@ -51,7 +51,7 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w

  private[this] def resolvableNullability(from: Boolean, to: Boolean) = !from || to

-  private[this] def resolve(from: DataType, to: DataType): Boolean = {
+  private[sql] def resolve(from: DataType, to: DataType): Boolean = {
    (from, to) match {
      case (from, to) if from == to => true
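
The visibility widening above, from private[this] to private[sql], is presumably what allows the new CheckCast rule in the analyzer to call cast.resolve(child.dataType, dataType) from outside the Cast class while keeping the method hidden from user code.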

@@ -27,7 +27,7 @@ import scala.util.Try
import org.apache.hadoop.hive.conf.HiveConf.ConfVars

import org.apache.spark.{SparkFiles, SparkException}
-import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.{AnalysisException, DataFrame, Row}
import org.apache.spark.sql.catalyst.plans.logical.Project
import org.apache.spark.sql.Dsl._
import org.apache.spark.sql.hive._
@@ -66,6 +66,12 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
    }
  }

test("SPARK-5649: added a rule to check datatypes cast") {
intercept[AnalysisException] {
sql("select cast(key as binary) from src").collect()
}
}
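
For contrast, a hypothetical companion test (not part of this PR) could pin down that legitimate casts still analyze cleanly; key is an int column in the src fixture, and an int-to-string cast is legal:

test("SPARK-5649: legitimate casts still pass analysis") {
  // IntegerType -> StringType is a valid cast, so no AnalysisException is expected.
  sql("select cast(key as string) from src").collect()
}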

  createQueryTest("! operator",
    """
      |SELECT a FROM (