-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7562][SPARK-6444][SQL] Improve error reporting for expression data type mismatch #6405
Conversation
Test build #33510 has finished for PR 6405 at commit
|
Test build #33512 has finished for PR 6405 at commit
|
@@ -62,6 +62,10 @@ trait CheckAnalysis { | |||
val from = operator.inputSet.map(_.name).mkString(", ") | |||
a.failAnalysis(s"cannot resolve '${a.prettyString}' given input columns $from") | |||
|
|||
case t: TypeConstraint if !t.isTypeMatch => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this, we can remove the binary expression check two cases down right?
@cloud-fan my main feedback here is to simplify the type hierarchy. I think we are introducing too many concepts that are hard to understand here. Additionally, we should add some unit tests for error reporting. |
def errorMessage: String = throw new UnsupportedOperationException | ||
} | ||
|
||
trait TypeEqualConstraint extends TypeConstraint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove this and just put it in the 3 places that use this right now. It is not that big of deal to duplicate this tiny amount of code.
BTW for error messages - we can look into what postgres shows in these cases. I think postgres has pretty good messages in general. |
Test build #33592 has finished for PR 6405 at commit
|
Retest it please. |
@@ -408,7 +408,7 @@ trait HiveTypeCoercion { | |||
Union(newLeft, newRight) | |||
|
|||
// fix decimal precision for expressions | |||
case q => q.transformExpressions { | |||
case q => q.transformExpressionsUp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we separate type checking logic from resolved
, we need to ensure all children have valid input types before we do type coercion(think about Add(Add(decimal1, decimal2), decimal3)
). Here I just use transformExpressionsUp
as a quick fix(and only for DecimalPrecision
rule as it breaks a test...), should we skip not only !childrenResolved
but also !children.forall(_.validInputTypes)
during all type coercion, or still checking types in resolved
? @rxin @marmbrus
Hi @rxin, now we define how to check input types at concrete expressions and define how to do type coercion at Hi @marmbrus , according to this line of code, is |
@@ -106,6 +110,10 @@ abstract class LeafExpression extends Expression with trees.LeafNode[Expression] | |||
|
|||
abstract class UnaryExpression extends Expression with trees.UnaryNode[Expression] { | |||
self: Product => | |||
|
|||
override def foldable: Boolean = child.foldable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not always correct, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for nulalble
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just revert this change, and push it into the individual expressions or their base classes. Otherwise it is too error prone. If we add an expression that has different behavior, we will likely forget to change foldable/nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also remove the foldable
and nullable
methods from BinaryExpressions
?
Regarding one of your todo: "When detect data types mismatch, we may need to do type casting(ExpectsInputTypes defines the casting behavior). Should we generalize it?" I think it is okay to just have expression-specific type casting rules for these stuff, similar to what we do with BinaryExpression. The only "general" rule is ExpectedDataTypes. |
override def dataType: DataType = child.dataType | ||
override def toString: String = s"-$child" | ||
|
||
override def typeMismatchErrorMessage: Option[String] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can push all typeMismatchErrorMessage for UnaryArithmetic into UnaryArithmetic?
Test build #33648 has finished for PR 6405 at commit
|
Test build #33882 has finished for PR 6405 at commit
|
Test build #33883 has finished for PR 6405 at commit
|
LGTM |
Thanks - I've merged this. Are there any other expressions that we should add the type checking to? |
…data type mismatch It seems hard to find a common pattern of checking types in `Expression`. Sometimes we know what input types we need(like `And`, we know we need two booleans), sometimes we just have some rules(like `Add`, we need 2 numeric types which are equal). So I defined a general interface `checkInputDataTypes` in `Expression` which returns a `TypeCheckResult`. `TypeCheckResult` can tell whether this expression passes the type checking or what the type mismatch is. This PR mainly works on apply input types checking for arithmetic and predicate expressions. TODO: apply type checking interface to more expressions. Author: Wenchen Fan <[email protected]> Closes apache#6405 from cloud-fan/6444 and squashes the following commits: b5ff31b [Wenchen Fan] address comments b917275 [Wenchen Fan] rebase 39929d9 [Wenchen Fan] add todo 0808fd2 [Wenchen Fan] make constrcutor of TypeCheckResult private 3bee157 [Wenchen Fan] and decimal type coercion rule for binary comparison 8883025 [Wenchen Fan] apply type check interface to CaseWhen cffb67c [Wenchen Fan] to have resolved call the data type check function 6eaadff [Wenchen Fan] add equal type constraint to EqualTo 3affbd8 [Wenchen Fan] more fixes 654d46a [Wenchen Fan] improve tests e0a3628 [Wenchen Fan] improve error message 1524ff6 [Wenchen Fan] fix style 69ca3fe [Wenchen Fan] add error message and tests c71d02c [Wenchen Fan] fix hive tests 6491721 [Wenchen Fan] use value class TypeCheckResult 7ae76b9 [Wenchen Fan] address comments cb77e4f [Wenchen Fan] Improve error reporting for expression data type mismatch
…data type mismatch It seems hard to find a common pattern of checking types in `Expression`. Sometimes we know what input types we need(like `And`, we know we need two booleans), sometimes we just have some rules(like `Add`, we need 2 numeric types which are equal). So I defined a general interface `checkInputDataTypes` in `Expression` which returns a `TypeCheckResult`. `TypeCheckResult` can tell whether this expression passes the type checking or what the type mismatch is. This PR mainly works on apply input types checking for arithmetic and predicate expressions. TODO: apply type checking interface to more expressions. Author: Wenchen Fan <[email protected]> Closes apache#6405 from cloud-fan/6444 and squashes the following commits: b5ff31b [Wenchen Fan] address comments b917275 [Wenchen Fan] rebase 39929d9 [Wenchen Fan] add todo 0808fd2 [Wenchen Fan] make constrcutor of TypeCheckResult private 3bee157 [Wenchen Fan] and decimal type coercion rule for binary comparison 8883025 [Wenchen Fan] apply type check interface to CaseWhen cffb67c [Wenchen Fan] to have resolved call the data type check function 6eaadff [Wenchen Fan] add equal type constraint to EqualTo 3affbd8 [Wenchen Fan] more fixes 654d46a [Wenchen Fan] improve tests e0a3628 [Wenchen Fan] improve error message 1524ff6 [Wenchen Fan] fix style 69ca3fe [Wenchen Fan] add error message and tests c71d02c [Wenchen Fan] fix hive tests 6491721 [Wenchen Fan] use value class TypeCheckResult 7ae76b9 [Wenchen Fan] address comments cb77e4f [Wenchen Fan] Improve error reporting for expression data type mismatch
a follow up of #6405. Note: It's not a big change, a lot of changing is due to I swap some code in `aggregates.scala` to make aggregate functions right below its corresponding aggregate expressions. Author: Wenchen Fan <[email protected]> Closes #6723 from cloud-fan/type-check and squashes the following commits: 2124301 [Wenchen Fan] fix tests 5a658bb [Wenchen Fan] add tests 287d3bb [Wenchen Fan] apply type check interface to more expressions
It seems hard to find a common pattern of checking types in
Expression
. Sometimes we know what input types we need(likeAnd
, we know we need two booleans), sometimes we just have some rules(likeAdd
, we need 2 numeric types which are equal). So I defined a general interfacecheckInputDataTypes
inExpression
which returns aTypeCheckResult
.TypeCheckResult
can tell whether this expression passes the type checking or what the type mismatch is.This PR mainly works on apply input types checking for arithmetic and predicate expressions.
TODO: apply type checking interface to more expressions.