[SPARK-29444][FOLLOWUP] add doc and python parameter for ignoreNullFields in json generating #26227
ok to test
Test build #112543 has finished for PR 26227 at commit
.doc("If false, JacksonGenerator will generate null for null fields in Struct.")
.stringConf
.createWithDefault("true")
.doc("Whether to ignore null fields in column/struct during json generating. " +
I would just write like this:
Whether to ignore null fields when generating JSON objects in JSON data source and
JSON functions such as to_json.
If false, it generates null for null fields in JSON objects.
Good, it's better.
@@ -687,6 +687,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
* <li>`encoding` (by default it is not set): specifies encoding (charset) of saved json
* files. If it is not set, the UTF-8 charset will be used. </li>
* <li>`lineSep` (default `\n`): defines the line separator that should be used for writing.</li>
* <li>`ignoreNullFields` (default `true`): whether to ignore null fields in column/struct
* during json generating. </li>
here too
Looks good otherwise.
Test build #112600 has finished for PR 26227 at commit
+1, LGTM. Merged to master.
* [SPARK-29444] Add configuration to support JacksonGenerator to keep fields with null values

  As mentioned in JIRA, we sometimes need to retain null columns when writing JSON. For example, sparkmagic (widely used in Jupyter with Livy) generates SQL query results from Dataset.toJSON and parses the JSON into a pandas DataFrame for display. If a column is null, it is easy for that column to go missing or even for the query result to be empty. Losing the null column in the first row may cause parsing exceptions or the loss of the entire column's data.

  Example in spark-shell:

  scala> spark.sql("select null as a, 1 as b").toJSON.collect.foreach(println)
  {"b":1}
  scala> spark.sql("set spark.sql.jsonGenerator.struct.ignore.null=false")
  res2: org.apache.spark.sql.DataFrame = [key: string, value: string]
  scala> spark.sql("select null as a, 1 as b").toJSON.collect.foreach(println)
  {"a":null,"b":1}

  Add new test to JacksonGeneratorSuite

  Lead-authored-by: stczwd <[email protected]>
  Co-authored-by: Jackey Lee <[email protected]>
  Signed-off-by: Wenchen Fan <[email protected]>
  (cherry picked from commit with id 78b0cbe)

* [SPARK-29444][FOLLOWUP] add doc and python parameter for ignoreNullFields in json generating

  # What changes were proposed in this pull request?
  Add description for ignoreNullFields, which was committed in apache#26098, in DataFrameWriter and readwriter.py. Enable users to use ignoreNullFields in PySpark.

  ### Does this PR introduce any user-facing change?
  No

  ### How was this patch tested?
  run unit tests

  Closes apache#26227 from stczwd/json-generator-doc.

  Authored-by: stczwd <[email protected]>
  Signed-off-by: Dongjoon Hyun <[email protected]>
  Co-authored-by: stczwd <[email protected]>
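The column-loss problem described in the SPARK-29444 commit message above can be illustrated without Spark. What follows is a minimal sketch, assuming a hypothetical consumer (`columns_from_first_row`, which is not sparkmagic's actual code) that infers its column list from the first JSON line only. The first line of each sample mirrors the spark-shell output above; the second line is made-up illustrative data.

```python
import json


def columns_from_first_row(json_lines):
    """Hypothetical consumer: infer column names from the first JSON line only."""
    first = json.loads(json_lines[0])
    return list(first.keys())


# Shape of the output when null fields are dropped (the default shown above).
dropped = ['{"b":1}', '{"a":2,"b":3}']
# Shape of the output when null fields are kept (ignore.null=false above).
kept = ['{"a":null,"b":1}', '{"a":2,"b":3}']

print(columns_from_first_row(dropped))  # column "a" is missing
print(columns_from_first_row(kept))     # both columns are present
```

A consumer that behaves this way never learns about column `a` when the first row omits it, which is the kind of failure the commit message attributes to dropped null fields.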
What changes were proposed in this pull request?
Add a description for ignoreNullFields, which was committed in #26098, to DataFrameWriter and readwriter.py.
Enable users to use ignoreNullFields in PySpark.
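To make the documented semantics of `ignoreNullFields` concrete, here is a minimal pure-Python sketch. It is an illustrative stand-in, not Spark's JacksonGenerator: the helper name `to_json_object` and its `ignore_null_fields` argument are hypothetical, and treating nested dicts like nested structs is an assumption of the sketch.

```python
import json


def to_json_object(row, ignore_null_fields=True):
    """Serialize a dict, dropping or keeping None-valued fields.

    Illustrative only: models the option as described in this PR,
    not Spark's actual implementation.
    """
    def prune(value):
        # Assumption: nested dicts are handled like nested structs.
        if isinstance(value, dict):
            return {
                key: prune(inner)
                for key, inner in value.items()
                if not (ignore_null_fields and inner is None)
            }
        return value

    # Compact separators match the spacing of the spark-shell output.
    return json.dumps(prune(row), separators=(",", ":"))


row = {"a": None, "b": 1}
print(to_json_object(row))                            # {"b":1}
print(to_json_object(row, ignore_null_fields=False))  # {"a":null,"b":1}
```

With the Python parameter this PR adds, the corresponding PySpark call would presumably look like `df.write.json(path, ignoreNullFields=False)`; check the `DataFrameWriter.json` signature in your Spark version before relying on it.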
Does this PR introduce any user-facing change?
No
How was this patch tested?
Ran unit tests.