-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29444] Add configuration to support JacksonGenrator to keep fields with null values #26098
Changes from all commits
0b90344
4f3c83f
143b9e7
f57bea2
eb123bf
f8aea25
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1153,6 +1153,12 @@ object SQLConf { | |
.booleanConf | ||
.createWithDefault(true) | ||
|
||
val JSON_GENERATOR_IGNORE_NULL_FIELDS = | ||
buildConf("spark.sql.jsonGenerator.ignoreNullFields") | ||
.doc("If false, JacksonGenerator will generate null for null fields in Struct.") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I were a user, I would have no idea what There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe I can describe this in a better way. |
||
.stringConf | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is it a string conf? shouldn't it be a boolean conf? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is only used in JasonOptions, where we need string value, and use toBoolean to get boolean conf |
||
.createWithDefault("true") | ||
|
||
val FILE_SINK_LOG_DELETION = buildConf("spark.sql.streaming.fileSink.log.deletion") | ||
.internal() | ||
.doc("Whether to delete the expired log files in file stream sink.") | ||
|
@@ -2323,6 +2329,8 @@ class SQLConf extends Serializable with Logging { | |
|
||
def sessionLocalTimeZone: String = getConf(SQLConf.SESSION_LOCAL_TIMEZONE) | ||
|
||
def jsonGeneratorIgnoreNullFields: String = getConf(SQLConf.JSON_GENERATOR_IGNORE_NULL_FIELDS) | ||
|
||
def parallelFileListingInStatsComputation: Boolean = | ||
getConf(SQLConf.PARALLEL_FILE_LISTING_IN_STATS_COMPUTATION) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,6 +39,33 @@ class JacksonGeneratorSuite extends SparkFunSuite { | |
assert(writer.toString === """{"a":1}""") | ||
} | ||
|
||
test("SPARK-29444: initial with StructType and write out an empty row " + | ||
"with ignoreNullFields=false") { | ||
val dataType = StructType(StructField("a", IntegerType) :: Nil) | ||
val input = InternalRow(null) | ||
val writer = new CharArrayWriter() | ||
val allowNullOption = | ||
new JSONOptions(Map("ignoreNullFields" -> "false"), gmtId) | ||
val gen = new JacksonGenerator(dataType, writer, allowNullOption) | ||
gen.write(input) | ||
gen.flush() | ||
assert(writer.toString === """{"a":null}""") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we also test null inner field? e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, I have added a test for this |
||
} | ||
|
||
test("SPARK-29444: initial with StructType field and write out a row " + | ||
"with ignoreNullFields=false and struct inner null") { | ||
val fieldType = StructType(StructField("b", IntegerType) :: Nil) | ||
val dataType = StructType(StructField("a", fieldType) :: Nil) | ||
val input = InternalRow(InternalRow(null)) | ||
val writer = new CharArrayWriter() | ||
val allowNullOption = | ||
new JSONOptions(Map("ignoreNullFields" -> "false"), gmtId) | ||
val gen = new JacksonGenerator(dataType, writer, allowNullOption) | ||
gen.write(input) | ||
gen.flush() | ||
assert(writer.toString === """{"a":{"b":null}}""") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit but I would call |
||
} | ||
|
||
test("initial with StructType and write out rows") { | ||
val dataType = StructType(StructField("a", IntegerType) :: Nil) | ||
val input = new GenericArrayData(InternalRow(1) :: InternalRow(2) :: Nil) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, you should document this in
DataFrameWrtier
,DataStreamWrtier
,readwriter.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okey, I'll try to follow up this PR, and add this to document