-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23773][SQL] JacksonGenerator does not include keys that have null value for StructTypes #20884
Conversation
…ull value for StructTypes
ok to test |
Test build #4143 has finished for PR 20884 at commit
|
@@ -56,7 +56,7 @@ class JacksonGeneratorSuite extends SparkFunSuite { | |||
val gen = new JacksonGenerator(dataType, writer, option) | |||
gen.write(input) | |||
gen.flush() | |||
assert(writer.toString === """[{}]""") | |||
assert(writer.toString === """[{"a":null}]""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think previous result was a valid test case ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon actually, it looks like the result should be [null]
rather than [{}]
.
Look at the following repro from spark-shell (downloaded binaries):
scala> val df = sqlContext.sql(""" select array(cast(null as struct<k:string>)) as my_array""")
df: org.apache.spark.sql.DataFrame = [my_array: array<struct<k:string>>]
scala> df.printSchema
root
|-- my_array: array (nullable = false)
| |-- element: struct (containsNull = true)
| | |-- k: string (nullable = true)
scala> df.toJSON.collect().foreach(println)
{"my_array":[null]}
scala> df.select(to_json($"my_array")).collect().foreach(x => println(x(0)))
[null]
In older version of JacksonGenerator
, we had a filter by element value, and if it was null
, gen.writeNull()
was called no matter what the type was (old implementation). But currently, we're calling gen.writeStartObject()...gen.writeEndObject()
no matter if the value is null.
I couldn't repro this with a query, but when StructsToJson
is called from this unit test, it goes through JacksonGenerator.arrElementWriter
which has lines
case st: StructType =>
(arr: SpecializedGetters, i: Int) => {
writeObject(writeFields(arr.getStruct(i, st.length), st, rootFieldWriters))
}
that makes it print json object even there is null
.
I'll look into this later and will try to find the easy workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should compare this:
scala> sql(""" select array(cast(null as struct<k:string>)) as my_array""").toJSON.collect().foreach(println)
{"my_array":[null]}
scala> sql(""" select array(struct(cast(null as string))) as my_array""").toJSON.collect().foreach(println)
{"my_array":[{}]}
Shall we add a configuration or an option to control its behaviour if this is something we need to support? |
@@ -1229,7 +1229,7 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData { | |||
val df2 = df1.toDF | |||
val result = df2.toJSON.collect() | |||
// scalastyle:off | |||
assert(result(0) === "{\"f1\":1,\"f2\":\"A1\",\"f3\":true,\"f4\":[\"1\",\" A1\",\" true\",\" null\"]}") | |||
assert(result(0) === "{\"f1\":1,\"f2\":\"A1\",\"f3\":true,\"f4\":[\"1\",\" A1\",\" true\",\" null\"],\"f5\":null}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go the current way, it'd write out every null
with every field:
{"a":null,"b":null,"c":null}
{"a":null,"b":null,"c":1}
{"a":1,"b":null,"c":1}
{"a":1,"b":2,"c":3}
which I think's quite inefficient. Does that fix actually use case to be clear?
Can one of the admins verify this patch? |
@sameeragarwal, do you see some values on this? FWIW, for me don't have preference. If you see some values on this, probably we could go with a configuration .. otherwise I would just like to suggest to close it if you feel in the same way with me. |
@HyukjinKwon I think it should be okay to close this at least for now. Just to add a little context behind this change, Facebook relies on the |
Thank you so much for sharing it. |
Can we reopen this issue or let me open a new one? |
Ah, I just saw this. Okay, thanks for fixing it. |
What changes were proposed in this pull request?
As stated in Jira, when
toJSON
is called on a dataset, the result JSON string will not have keys displayed forStructType
s that have null value. This PR fixes the issue and writes field with "null" value.How was this patch tested?
Added a unit test to
JsonSuite.scala