-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4856] [SQL] NullType instead of StringType when sampling against empty string or nul... #3708
Conversation
Test FAILed. |
retest this please |
Test build #24481 has started for PR 3708 at commit
|
Test build #24481 has finished for PR 3708 at commit
|
Test PASSed. |
@@ -263,6 +263,8 @@ private[sql] object JsonRDD extends Logging { | |||
val elementType = typeOfArray(array) | |||
buildKeyPathForInnerStructs(array, elementType) :+ (key, elementType) | |||
} | |||
// we couldn't tell what the type is if the value is null or empty string | |||
case (key: String, value) if value == "" || value == null => (key, NullType) :: Nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The null case makes sense to me, but why ""
as well? That seems to be unequivocally a String
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the unit test, which probably make more sense.
In some cases (as shown in the unit test), ""
is equivalent to null
for struct type, so we'd better not to say "it's MUST be StringType
if we meet an empty string".
In the meantime, the NullType
is the minimum
data type, and it can be promoted to any other data type in JsonRDD
(e.g. promote to StructType), however, it's impossible to promote a StringType
to StructType
.
It's safe to make it as NullType
here, as we can promote it as StringType
in the last promote rules, see https://github.com/chenghao-intel/spark/blob/json/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L231
Test build #24519 has started for PR 3708 at commit
|
Test build #24519 has finished for PR 3708 at commit
|
Test PASSed. |
Thanks! Merged to master. |
As empty string (the "headers") will be considered as String in the beginning (in line 2 and 3), it ignores the real nested data type (struct type "headers" in line 1), and also take the line 1 (the "headers") as String Type, which is not our expected.