-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-5649][SQL] added a rule to check datatypes cast #4425
Conversation
Test build #26908 has started for PR 4425 at commit
|
Test build #26908 has finished for PR 4425 at commit
|
Test FAILed. |
Test build #26910 has started for PR 4425 at commit
|
Test build #26911 has started for PR 4425 at commit
|
Test build #26910 has finished for PR 4425 at commit
|
Test PASSed. |
Test build #26911 has finished for PR 4425 at commit
|
Test PASSed. |
It should be either a bug of the Analyzer or the bad semantic of the user. As I think all of the children should be resolved before the checking, can you also give an example or unit test that will throw exception? |
unit test added |
Test build #26917 has started for PR 4425 at commit
|
Test build #26917 has finished for PR 4425 at commit
|
Test PASSed. |
we can throw an |
Test build #27260 has started for PR 4425 at commit
|
Test build #27260 has finished for PR 4425 at commit
|
Test FAILed. |
Retest this please |
Test build #27266 has started for PR 4425 at commit
|
Test build #27266 has finished for PR 4425 at commit
|
Test FAILed. |
retest this please |
Test build #27272 has started for PR 4425 at commit
|
Test build #27272 has finished for PR 4425 at commit
|
Test PASSed. |
Updated, @marmbrus any comment here? |
@@ -69,6 +69,7 @@ class Analyzer(catalog: Catalog, | |||
typeCoercionRules ++ | |||
extendedRules : _*), | |||
Batch("Check Analysis", Once, | |||
CheckCast :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to check casts after we check resolution. Otherwise casts of invalid attributes will still give confusing TreeNodeExceptions.
scala> sql("SELECT CAST(x AS STRING) FROM src").collect()
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'x
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:48)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we check resolution first,
1 check cast will be invalid since legitimate cast expression must be not resolved.
2 select cast(key as binary) from src
will give a scala.NotImplementedError
how about not change the order but catch exception of child.dataType to make it pass this rule? then your case will go into check resolution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scala.NotImplementedError should be a bug, filed PR #4552 for this
Thanks for bringing this up! I rolled the fix into #4558. |
Throw an exception when datatypes cast is illegitimate, this patch info user the cast issue in the sqls.
Before this patch, we only get an attribute unresolved exception which is really hard to understand the problems in users' sql