[SPARK-29444][FOLLOWUP] add doc and python parameter for ignoreNullFields in json generating #26227

jackylee-ch · 2019-10-23T12:28:38Z

What changes were proposed in this pull request?

Add a description of ignoreNullFields, which was committed in #26098, to DataFrameWriter and readwriter.py.
Enable users to set ignoreNullFields in PySpark.

Does this PR introduce any user-facing change?

No

How was this patch tested?

run unit tests

jackylee-ch · 2019-10-23T12:30:29Z

cc @HyukjinKwon @cloud-fan

HyukjinKwon · 2019-10-23T12:39:58Z

ok to test

SparkQA · 2019-10-23T16:29:21Z

Test build #112543 has finished for PR 26227 at commit 41384d4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

python/pyspark/sql/readwriter.py

HyukjinKwon · 2019-10-24T06:47:32Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

-      .doc("If false, JacksonGenerator will generate null for null fields in Struct.")
-      .stringConf
-      .createWithDefault("true")
+      .doc("Whether to ignore null fields in column/struct during json generating. " +


I would just write it like this:

Whether to ignore null fields when generating JSON objects in JSON data source and JSON functions such as to_json. If false, it generates null for null fields in JSON objects.

Good, it's better.
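
To make the reworded doc concrete, here is a minimal plain-Python sketch of the two behaviors it describes. It only imitates the semantics with the standard `json` module and is not Spark's JacksonGenerator code. The helper name `to_json_like` and its `ignore_null_fields` argument are hypothetical, and treating nested dicts like nested structs is an assumption.

```python
import json


def to_json_like(row, ignore_null_fields=True):
    """Serialize a dict as the option is described: drop or keep None fields.

    Hypothetical helper for illustration only; not part of Spark or PySpark.
    """

    def prune(value):
        # Only dicts stand in for structs here; other values pass through as-is.
        if isinstance(value, dict):
            return {
                key: prune(child)
                for key, child in value.items()
                if not (ignore_null_fields and child is None)
            }
        return value

    return json.dumps(prune(row), separators=(",", ":"))


row = {"a": None, "b": 1}
print(to_json_like(row))                            # {"b":1}
print(to_json_like(row, ignore_null_fields=False))  # {"a":null,"b":1}
```

The two printed lines match the spark-shell output quoted in the original SPARK-29444 commit message for the same row.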

HyukjinKwon · 2019-10-24T06:47:46Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

@@ -687,6 +687,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
   * <li>`encoding` (by default it is not set): specifies encoding (charset) of saved json
   * files. If it is not set, the UTF-8 charset will be used. </li>
   * <li>`lineSep` (default `\n`): defines the line separator that should be used for writing.</li>
+   * <li>`ignoreNullFields` (default `true`): whether to ignore null fields in column/struct
+   * during json generating. </li>


HyukjinKwon

Looks good otherwise.

SparkQA · 2019-10-24T12:44:05Z

Test build #112600 has finished for PR 26227 at commit 40bb515.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun

+1, LGTM. Merged to master.

* [SPARK-29444] Add configuration to support JacksonGenerator to keep fields with null values

  As mentioned in jira, sometimes we need to be able to support the retention of null columns when writing JSON. For example, sparkmagic (used widely in jupyter with livy) will generate sql query results based on DataSet.toJSON and parse JSON to pandas DataFrame to display. If there is a null column, it is easy to have some column missing or even the query result is empty. The loss of the null column in the first row may cause parsing exceptions or loss of entire column data.

  Example in spark-shell.

      scala> spark.sql("select null as a, 1 as b").toJSON.collect.foreach(println)
      {"b":1}

      scala> spark.sql("set spark.sql.jsonGenerator.struct.ignore.null=false")
      res2: org.apache.spark.sql.DataFrame = [key: string, value: string]

      scala> spark.sql("select null as a, 1 as b").toJSON.collect.foreach(println)
      {"a":null,"b":1}

  Add new test to JacksonGeneratorSuite

  Lead-authored-by: stczwd <[email protected]>
  Co-authored-by: Jackey Lee <[email protected]>
  Signed-off-by: Wenchen Fan <[email protected]>
  (cherry picked from commit with id 78b0cbe)

* [SPARK-29444][FOLLOWUP] add doc and python parameter for ignoreNullFields in json generating

  # What changes were proposed in this pull request?
  Add description for ignoreNullFields, which is committed in apache#26098, in DataFrameWriter and readwriter.py.
  Enable user to use ignoreNullFields in pyspark.

  ### Does this PR introduce any user-facing change?
  No

  ### How was this patch tested?
  run unit tests

  Closes apache#26227 from stczwd/json-generator-doc.

  Authored-by: stczwd <[email protected]>
  Signed-off-by: Dongjoon Hyun <[email protected]>

  Co-authored-by: stczwd <[email protected]>
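
The column-loss problem described in the first commit message can be sketched in plain Python. This is only an illustration under stated assumptions: `columns_from_first_record` is a hypothetical naive consumer that takes its column set from the first JSON line, and it is not how sparkmagic or pandas actually parse results.

```python
import json


def columns_from_first_record(json_lines):
    """Hypothetical naive consumer: infer columns from the first JSON line only."""
    records = [json.loads(line) for line in json_lines]
    if not records:
        return [], []
    columns = list(records[0].keys())
    # Fields missing from a record are filled with None.
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


# Null fields dropped: "a" is absent from the first line, so the column is lost.
dropped = ['{"b":1}', '{"a":5,"b":2}']
# Null fields kept: every line carries the same keys.
kept = ['{"a":null,"b":1}', '{"a":5,"b":2}']

print(columns_from_first_record(dropped))  # (['b'], [[1], [2]])
print(columns_from_first_record(kept))     # (['a', 'b'], [[None, 1], [5, 2]])
```

With null fields written out, a consumer like this sees a stable set of keys on every line, which is the situation the commit message motivates.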

[SPARK-29444][FOLLOWUP] add doc for ignoreNullFields in json generating

41384d4

dongjoon-hyun added the SQL label Oct 23, 2019

dongjoon-hyun reviewed Oct 24, 2019

python/pyspark/sql/readwriter.py

jackylee-ch changed the title from "[SPARK-29444][FOLLOWUP] add doc for ignoreNullFields in json generating" to "[SPARK-29444][FOLLOWUP] add doc and python parameter for ignoreNullFields in json generating" on Oct 24, 2019

HyukjinKwon reviewed Oct 24, 2019

HyukjinKwon approved these changes Oct 24, 2019

[SPARK-29444] change ignoreNullFields doc

40bb515

dongjoon-hyun approved these changes Oct 24, 2019

dongjoon-hyun closed this in dcf5eaf Oct 24, 2019

jackylee-ch deleted the json-generator-doc branch October 25, 2019 00:21

zero323 mentioned this pull request Jan 7, 2020

Sync with changes merged after 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 zero323/pyspark-stubs#230

Closed

47 tasks
