Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7562][SPARK-6444][SQL] Improve error reporting for expression data type mismatch #6405

Closed
wants to merge 17 commits into from

Conversation

cloud-fan
Copy link
Contributor

It seems hard to find a common pattern of checking types in Expression. Sometimes we know what input types we need(like And, we know we need two booleans), sometimes we just have some rules(like Add, we need 2 numeric types which are equal). So I defined a general interface checkInputDataTypes in Expression which returns a TypeCheckResult. TypeCheckResult can tell whether this expression passes the type checking or what the type mismatch is.

This PR mainly works on apply input types checking for arithmetic and predicate expressions.

TODO: apply type checking interface to more expressions.

@cloud-fan cloud-fan changed the title [SPARK-7562][SPARK-6444][SQL] Improve error reporting for expression data type mismatch [WIP][SPARK-7562][SPARK-6444][SQL] Improve error reporting for expression data type mismatch May 26, 2015
@cloud-fan
Copy link
Contributor Author

cc @rxin @marmbrus @liancheng

@SparkQA
Copy link

SparkQA commented May 26, 2015

Test build #33510 has finished for PR 6405 at commit 6141b1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TypeConstraint
    • trait ExpectsInputTypes extends TypeConstraint
    • trait TypeEqualConstraint extends TypeConstraint
    • abstract class UnaryNumeric extends UnaryExpression with NumericHolder with TypeConstraint
    • case class UnaryMinus(child: Expression) extends UnaryNumeric
    • case class Sqrt(child: Expression) extends UnaryNumeric
    • case class Abs(child: Expression) extends UnaryNumeric
    • abstract class NumericBinaryArithmetic extends BinaryArithmetic
    • case class Add(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • case class Subtract(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • case class Multiply(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • case class Divide(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • case class Remainder(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • abstract class BitwiseBinaryArithmetic extends BinaryArithmetic
    • case class BitwiseAnd(left: Expression, right: Expression) extends BitwiseBinaryArithmetic
    • case class BitwiseOr(left: Expression, right: Expression) extends BitwiseBinaryArithmetic
    • case class BitwiseXor(left: Expression, right: Expression) extends BitwiseBinaryArithmetic
    • case class BitwiseNot(child: Expression) extends UnaryExpression with TypeConstraint
    • abstract class OrderingBinaryArithmetic extends BinaryArithmetic with OrderingHolder
    • case class MaxOf(left: Expression, right: Expression) extends OrderingBinaryArithmetic
    • case class MinOf(left: Expression, right: Expression) extends OrderingBinaryArithmetic
    • abstract class BinaryMathExpression(f: (Double, Double) => Double, name: String)
    • case class And(left: Expression, right: Expression) extends BinaryExpression
    • case class Or(left: Expression, right: Expression) extends BinaryExpression
    • case class LessThan(left: Expression, right: Expression) extends OrderingBinaryComparison
    • case class LessThanOrEqual(left: Expression, right: Expression) extends OrderingBinaryComparison
    • case class GreaterThan(left: Expression, right: Expression) extends OrderingBinaryComparison
    • case class GreaterThanOrEqual(left: Expression, right: Expression)
    • trait CaseWhenLike extends Expression with TypeConstraint
    • trait NumericHolder
    • trait OrderingHolder

@SparkQA
Copy link

SparkQA commented May 26, 2015

Test build #33512 has finished for PR 6405 at commit fcb63a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TypeConstraint
    • trait ExpectsInputTypes extends TypeConstraint
    • trait TypeEqualConstraint extends TypeConstraint
    • abstract class UnaryNumeric extends UnaryExpression with NumericHolder with TypeConstraint
    • case class UnaryMinus(child: Expression) extends UnaryNumeric
    • case class Sqrt(child: Expression) extends UnaryNumeric
    • case class Abs(child: Expression) extends UnaryNumeric
    • abstract class NumericBinaryArithmetic extends BinaryArithmetic
    • case class Add(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • case class Subtract(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • case class Multiply(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • case class Divide(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • case class Remainder(left: Expression, right: Expression) extends NumericBinaryArithmetic
    • abstract class BitwiseBinaryArithmetic extends BinaryArithmetic
    • case class BitwiseAnd(left: Expression, right: Expression) extends BitwiseBinaryArithmetic
    • case class BitwiseOr(left: Expression, right: Expression) extends BitwiseBinaryArithmetic
    • case class BitwiseXor(left: Expression, right: Expression) extends BitwiseBinaryArithmetic
    • case class BitwiseNot(child: Expression) extends UnaryExpression with TypeConstraint
    • abstract class OrderingBinaryArithmetic extends BinaryArithmetic with OrderingHolder
    • case class MaxOf(left: Expression, right: Expression) extends OrderingBinaryArithmetic
    • case class MinOf(left: Expression, right: Expression) extends OrderingBinaryArithmetic
    • abstract class BinaryMathExpression(f: (Double, Double) => Double, name: String)
    • case class And(left: Expression, right: Expression) extends BinaryExpression
    • case class Or(left: Expression, right: Expression) extends BinaryExpression
    • case class LessThan(left: Expression, right: Expression) extends OrderingBinaryComparison
    • case class LessThanOrEqual(left: Expression, right: Expression) extends OrderingBinaryComparison
    • case class GreaterThan(left: Expression, right: Expression) extends OrderingBinaryComparison
    • case class GreaterThanOrEqual(left: Expression, right: Expression)
    • trait CaseWhenLike extends Expression with TypeConstraint
    • trait NumericHolder
    • trait OrderingHolder

@@ -62,6 +62,10 @@ trait CheckAnalysis {
val from = operator.inputSet.map(_.name).mkString(", ")
a.failAnalysis(s"cannot resolve '${a.prettyString}' given input columns $from")

case t: TypeConstraint if !t.isTypeMatch =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this, we can remove the binary expression check two cases down right?

@rxin
Copy link
Contributor

rxin commented May 27, 2015

@cloud-fan my main feedback here is to simplify the type hierarchy. I think we are introducing too many concepts that are hard to understand here.

Additionally, we should add some unit tests for error reporting.

def errorMessage: String = throw new UnsupportedOperationException
}

trait TypeEqualConstraint extends TypeConstraint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this and just put it in the 3 places that use this right now. It is not that big of deal to duplicate this tiny amount of code.

@rxin
Copy link
Contributor

rxin commented May 27, 2015

BTW for error messages - we can look into what postgres shows in these cases. I think postgres has pretty good messages in general.

@SparkQA
Copy link

SparkQA commented May 27, 2015

Test build #33592 has finished for PR 6405 at commit 029c5b6.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryArithmetic extends UnaryExpression
    • case class UnaryMinus(child: Expression) extends UnaryArithmetic
    • case class Sqrt(child: Expression) extends UnaryArithmetic
    • case class Abs(child: Expression) extends UnaryArithmetic
    • case class BitwiseNot(child: Expression) extends UnaryArithmetic
    • case class MaxOf(left: Expression, right: Expression) extends BinaryExpression
    • case class MinOf(left: Expression, right: Expression) extends BinaryExpression

@cloud-fan
Copy link
Contributor Author

Retest it please.

@@ -408,7 +408,7 @@ trait HiveTypeCoercion {
Union(newLeft, newRight)

// fix decimal precision for expressions
case q => q.transformExpressions {
case q => q.transformExpressionsUp {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we separate type checking logic from resolved, we need to ensure all children have valid input types before we do type coercion(think about Add(Add(decimal1, decimal2), decimal3)). Here I just use transformExpressionsUp as a quick fix(and only for DecimalPrecision rule as it breaks a test...), should we skip not only !childrenResolved but also !children.forall(_.validInputTypes) during all type coercion, or still checking types in resolved? @rxin @marmbrus

@cloud-fan
Copy link
Contributor Author

Hi @rxin, now we define how to check input types at concrete expressions and define how to do type coercion at HiveTypeCoercion, is this the right logic?

Hi @marmbrus , according to this line of code, is BinaryExpression designed to have 2 equal type children? IMO BinaryExpression should not have natural type constraint.

@@ -106,6 +110,10 @@ abstract class LeafExpression extends Expression with trees.LeafNode[Expression]

abstract class UnaryExpression extends Expression with trees.UnaryNode[Expression] {
self: Product =>

override def foldable: Boolean = child.foldable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not always correct, is it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for nulalble

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just revert this change, and push it into the individual expressions or their base classes. Otherwise it is too error prone. If we add an expression that has different behavior, we will likely forget to change foldable/nullable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also remove the foldable and nullable methods from BinaryExpressions?

@rxin
Copy link
Contributor

rxin commented May 28, 2015

Regarding one of your todo: "When detect data types mismatch, we may need to do type casting(ExpectsInputTypes defines the casting behavior). Should we generalize it?"

I think it is okay to just have expression-specific type casting rules for these stuff, similar to what we do with BinaryExpression. The only "general" rule is ExpectedDataTypes.

override def dataType: DataType = child.dataType
override def toString: String = s"-$child"

override def typeMismatchErrorMessage: Option[String] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can push all typeMismatchErrorMessage for UnaryArithmetic into UnaryArithmetic?

@SparkQA
Copy link

SparkQA commented May 28, 2015

Test build #33648 has finished for PR 6405 at commit 4b7d082.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryArithmetic extends UnaryExpression
    • case class UnaryMinus(child: Expression) extends UnaryArithmetic
    • case class Sqrt(child: Expression) extends UnaryArithmetic
    • case class Abs(child: Expression) extends UnaryArithmetic
    • case class BitwiseNot(child: Expression) extends UnaryArithmetic
    • case class MaxOf(left: Expression, right: Expression) extends BinaryArithmetic
    • case class MinOf(left: Expression, right: Expression) extends BinaryArithmetic
    • case class Atan2(left: Expression, right: Expression)
    • case class Hypot(left: Expression, right: Expression)
    • case class EqualTo(left: Expression, right: Expression) extends BinaryComparison

@SparkQA
Copy link

SparkQA commented Jun 1, 2015

Test build #33882 has finished for PR 6405 at commit 364248f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TypeCheckResult
    • case class TypeCheckFailure(message: String) extends TypeCheckResult
    • abstract class UnaryArithmetic extends UnaryExpression
    • case class UnaryMinus(child: Expression) extends UnaryArithmetic
    • case class Sqrt(child: Expression) extends UnaryArithmetic
    • case class Abs(child: Expression) extends UnaryArithmetic
    • case class BitwiseNot(child: Expression) extends UnaryArithmetic
    • case class MaxOf(left: Expression, right: Expression) extends BinaryArithmetic
    • case class MinOf(left: Expression, right: Expression) extends BinaryArithmetic
    • case class Atan2(left: Expression, right: Expression)
    • case class Hypot(left: Expression, right: Expression)
    • case class EqualTo(left: Expression, right: Expression) extends BinaryComparison

@SparkQA
Copy link

SparkQA commented Jun 1, 2015

Test build #33883 has finished for PR 6405 at commit b5ff31b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait TypeCheckResult
    • case class TypeCheckFailure(message: String) extends TypeCheckResult
    • abstract class UnaryArithmetic extends UnaryExpression
    • case class UnaryMinus(child: Expression) extends UnaryArithmetic
    • case class Sqrt(child: Expression) extends UnaryArithmetic
    • case class Abs(child: Expression) extends UnaryArithmetic
    • case class BitwiseNot(child: Expression) extends UnaryArithmetic
    • case class MaxOf(left: Expression, right: Expression) extends BinaryArithmetic
    • case class MinOf(left: Expression, right: Expression) extends BinaryArithmetic
    • case class Atan2(left: Expression, right: Expression)
    • case class Hypot(left: Expression, right: Expression)
    • case class EqualTo(left: Expression, right: Expression) extends BinaryComparison

@liancheng
Copy link
Contributor

LGTM

@rxin
Copy link
Contributor

rxin commented Jun 3, 2015

Thanks - I've merged this. Are there any other expressions that we should add the type checking to?

@asfgit asfgit closed this in d38cf21 Jun 3, 2015
adrian-wang added a commit to adrian-wang/spark that referenced this pull request Jun 8, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
…data type mismatch

It seems hard to find a common pattern of checking types in `Expression`. Sometimes we know what input types we need(like `And`, we know we need two booleans), sometimes we just have some rules(like `Add`, we need 2 numeric types which are equal). So I defined a general interface `checkInputDataTypes` in `Expression` which returns a `TypeCheckResult`. `TypeCheckResult` can tell whether this expression passes the type checking or what the type mismatch is.

This PR mainly works on apply input types checking for arithmetic and predicate expressions.

TODO: apply type checking interface to more expressions.

Author: Wenchen Fan <[email protected]>

Closes apache#6405 from cloud-fan/6444 and squashes the following commits:

b5ff31b [Wenchen Fan] address comments
b917275 [Wenchen Fan] rebase
39929d9 [Wenchen Fan] add todo
0808fd2 [Wenchen Fan] make constrcutor of TypeCheckResult private
3bee157 [Wenchen Fan] and decimal type coercion rule for binary comparison
8883025 [Wenchen Fan] apply type check interface to CaseWhen
cffb67c [Wenchen Fan] to have resolved call the data type check function
6eaadff [Wenchen Fan] add equal type constraint to EqualTo
3affbd8 [Wenchen Fan] more fixes
654d46a [Wenchen Fan] improve tests
e0a3628 [Wenchen Fan] improve error message
1524ff6 [Wenchen Fan] fix style
69ca3fe [Wenchen Fan] add error message and tests
c71d02c [Wenchen Fan] fix hive tests
6491721 [Wenchen Fan] use value class TypeCheckResult
7ae76b9 [Wenchen Fan] address comments
cb77e4f [Wenchen Fan] Improve error reporting for expression data type mismatch
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
…data type mismatch

It seems hard to find a common pattern of checking types in `Expression`. Sometimes we know what input types we need(like `And`, we know we need two booleans), sometimes we just have some rules(like `Add`, we need 2 numeric types which are equal). So I defined a general interface `checkInputDataTypes` in `Expression` which returns a `TypeCheckResult`. `TypeCheckResult` can tell whether this expression passes the type checking or what the type mismatch is.

This PR mainly works on apply input types checking for arithmetic and predicate expressions.

TODO: apply type checking interface to more expressions.

Author: Wenchen Fan <[email protected]>

Closes apache#6405 from cloud-fan/6444 and squashes the following commits:

b5ff31b [Wenchen Fan] address comments
b917275 [Wenchen Fan] rebase
39929d9 [Wenchen Fan] add todo
0808fd2 [Wenchen Fan] make constrcutor of TypeCheckResult private
3bee157 [Wenchen Fan] and decimal type coercion rule for binary comparison
8883025 [Wenchen Fan] apply type check interface to CaseWhen
cffb67c [Wenchen Fan] to have resolved call the data type check function
6eaadff [Wenchen Fan] add equal type constraint to EqualTo
3affbd8 [Wenchen Fan] more fixes
654d46a [Wenchen Fan] improve tests
e0a3628 [Wenchen Fan] improve error message
1524ff6 [Wenchen Fan] fix style
69ca3fe [Wenchen Fan] add error message and tests
c71d02c [Wenchen Fan] fix hive tests
6491721 [Wenchen Fan] use value class TypeCheckResult
7ae76b9 [Wenchen Fan] address comments
cb77e4f [Wenchen Fan] Improve error reporting for expression data type mismatch
asfgit pushed a commit that referenced this pull request Jun 24, 2015
a follow up of #6405.
Note: It's not a big change, a lot of changing is due to I swap some code in `aggregates.scala` to make aggregate functions right below its corresponding aggregate expressions.

Author: Wenchen Fan <[email protected]>

Closes #6723 from cloud-fan/type-check and squashes the following commits:

2124301 [Wenchen Fan] fix tests
5a658bb [Wenchen Fan] add tests
287d3bb [Wenchen Fan] apply type check interface to more expressions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants