[SPARK-29444][FOLLOWUP] add doc and python parameter for ignoreNullFields in json generating #26227

Closed
wants to merge 2 commits into from
Changes from 1 commit
7 changes: 5 additions & 2 deletions python/pyspark/sql/readwriter.py
@@ -788,7 +788,7 @@ def saveAsTable(self, name, format=None, mode=None, partitionBy=None, **options)

@since(1.4)
def json(self, path, mode=None, compression=None, dateFormat=None, timestampFormat=None,
- lineSep=None, encoding=None):
+ lineSep=None, encoding=None, ignoreNullFields=None):
"""Saves the content of the :class:`DataFrame` in JSON format
(`JSON Lines text format or newline-delimited JSON <http://jsonlines.org/>`_) at the
specified path.
@@ -817,13 +817,16 @@ def json(self, path, mode=None, compression=None, dateFormat=None, timestampForm
the default UTF-8 charset will be used.
:param lineSep: defines the line separator that should be used for writing. If None is
set, it uses the default value, ``\\n``.
+ :param ignoreNullFields: whether to ignore null fields in column/struct
+ during json generating. If None is set,
+ it uses the default value, ``true``.

>>> df.write.json(os.path.join(tempfile.mkdtemp(), 'data'))
"""
self.mode(mode)
self._set_opts(
compression=compression, dateFormat=dateFormat, timestampFormat=timestampFormat,
- lineSep=lineSep, encoding=encoding)
+ lineSep=lineSep, encoding=encoding, ignoreNullFields=ignoreNullFields)
self._jwrite.json(path)

@since(1.4)
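
As an illustration of what the new `ignoreNullFields` parameter controls, here is a minimal pure-Python sketch of the two output shapes. It does not use Spark and is not how Spark's JSON generator is implemented: the helper names `drop_null_fields` and `to_json_line` are hypothetical, dicts stand in for structs, and the handling of arrays and maps in Spark is not modeled.

```python
import json
from typing import Any


def drop_null_fields(value: Any) -> Any:
    """Recursively remove dict entries whose value is None.

    Dicts stand in for structs. Lists are traversed, but their None
    elements are kept, because only struct fields are modeled here.
    """
    if isinstance(value, dict):
        return {k: drop_null_fields(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_null_fields(v) for v in value]
    return value


def to_json_line(record: dict, ignore_null_fields: bool = True) -> str:
    """Serialize one record as a compact JSON line (hypothetical helper)."""
    payload = drop_null_fields(record) if ignore_null_fields else record
    return json.dumps(payload, separators=(",", ":"))


row = {"name": "a", "age": None, "address": {"city": None, "zip": "123"}}
print(to_json_line(row, ignore_null_fields=True))
# {"name":"a","address":{"zip":"123"}}
print(to_json_line(row, ignore_null_fields=False))
# {"name":"a","age":null,"address":{"city":null,"zip":"123"}}
```

With the parameter added in this PR, the corresponding PySpark call would pass `ignoreNullFields=False` to `df.write.json(...)` to keep the null fields in the output.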
@@ -78,8 +78,8 @@ private[sql] class JSONOptions(
val dropFieldIfAllNull = parameters.get("dropFieldIfAllNull").map(_.toBoolean).getOrElse(false)

// Whether to ignore null fields during json generating
- val ignoreNullFields = parameters.getOrElse("ignoreNullFields",
- SQLConf.get.jsonGeneratorIgnoreNullFields).toBoolean
+ val ignoreNullFields = parameters.get("ignoreNullFields").map(_.toBoolean)
+ .getOrElse(SQLConf.get.jsonGeneratorIgnoreNullFields)

// A language tag in IETF BCP 47 format
val locale: Locale = parameters.get("locale").map(Locale.forLanguageTag).getOrElse(Locale.US)
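
The `ignoreNullFields` lookup in this hunk resolves in two steps: a per-write option, if present, is parsed as a boolean; otherwise the session-level default is used. A rough Python sketch of that precedence follows. The names `parse_boolean` and `resolve_ignore_null_fields` are hypothetical, the parsing rule (case-insensitive `true`/`false`, error otherwise) is meant to mirror Scala's `toBoolean`, and any case-insensitive key handling in the real options map is not modeled.

```python
from typing import Mapping


def parse_boolean(text: str) -> bool:
    """Accept 'true' or 'false' in any letter case; reject everything else."""
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def resolve_ignore_null_fields(parameters: Mapping[str, str], session_default: bool) -> bool:
    """Per-write option wins; the session default applies only when it is absent."""
    raw = parameters.get("ignoreNullFields")
    if raw is None:
        return session_default
    return parse_boolean(raw)


print(resolve_ignore_null_fields({}, session_default=True))
print(resolve_ignore_null_fields({"ignoreNullFields": "False"}, session_default=True))
```

Because the session default is already a boolean in this sketch, only the user-supplied string needs parsing, which is the same shape as the rewritten Scala expression.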
@@ -1189,9 +1189,10 @@ object SQLConf {

val JSON_GENERATOR_IGNORE_NULL_FIELDS =
buildConf("spark.sql.jsonGenerator.ignoreNullFields")
- .doc("If false, JacksonGenerator will generate null for null fields in Struct.")
- .stringConf
- .createWithDefault("true")
+ .doc("Whether to ignore null fields in column/struct during json generating. " +
Member

I would just write it like this:

Whether to ignore null fields when generating JSON objects in JSON data source and
JSON functions such as to_json.
If false, it generates null for null fields in JSON objects.

Contributor Author

Good, it's better.

+ "If false, json generator will generate null in Column/Struct.")
+ .booleanConf
+ .createWithDefault(true)

val FILE_SINK_LOG_DELETION = buildConf("spark.sql.streaming.fileSink.log.deletion")
.internal()
@@ -2385,7 +2386,7 @@ class SQLConf extends Serializable with Logging {

def sessionLocalTimeZone: String = getConf(SQLConf.SESSION_LOCAL_TIMEZONE)

- def jsonGeneratorIgnoreNullFields: String = getConf(SQLConf.JSON_GENERATOR_IGNORE_NULL_FIELDS)
+ def jsonGeneratorIgnoreNullFields: Boolean = getConf(SQLConf.JSON_GENERATOR_IGNORE_NULL_FIELDS)

def parallelFileListingInStatsComputation: Boolean =
getConf(SQLConf.PARALLEL_FILE_LISTING_IN_STATS_COMPUTATION)
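
The SQLConf hunks change the entry from a string-typed to a boolean-typed configuration, so the getter returns `Boolean` directly. The following Python sketch shows the general idea of a typed config entry. It is a generic illustration, not Spark's `ConfigBuilder`: `ConfEntry`, `Conf`, and `to_bool` are hypothetical names, and validating the value when it is set is a design choice of this sketch, not a behavior confirmed by the PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


def to_bool(text: str) -> bool:
    """Parse 'true'/'false' (any letter case); raise ValueError otherwise."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"not a boolean: {text!r}")


@dataclass(frozen=True)
class ConfEntry(Generic[T]):
    """A config key with a typed default and a string-to-value converter."""
    key: str
    default: T
    converter: Callable[[str], T]


class Conf:
    """Stores raw strings, but hands typed values back to callers."""

    def __init__(self) -> None:
        self._raw: Dict[str, str] = {}

    def set(self, entry: ConfEntry, value: str) -> None:
        entry.converter(value)  # reject malformed values before storing them
        self._raw[entry.key] = value

    def get(self, entry: ConfEntry):
        raw = self._raw.get(entry.key)
        return entry.default if raw is None else entry.converter(raw)


IGNORE_NULL_FIELDS = ConfEntry("spark.sql.jsonGenerator.ignoreNullFields", True, to_bool)

conf = Conf()
print(conf.get(IGNORE_NULL_FIELDS))   # default value
conf.set(IGNORE_NULL_FIELDS, "false")
print(conf.get(IGNORE_NULL_FIELDS))   # value after the override
```

With a typed entry, callers no longer convert the value themselves, which matches the removal of the trailing `.toBoolean` in the `JSONOptions` change.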
@@ -687,6 +687,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
* <li>`encoding` (by default it is not set): specifies encoding (charset) of saved json
* files. If it is not set, the UTF-8 charset will be used. </li>
* <li>`lineSep` (default `\n`): defines the line separator that should be used for writing.</li>
+ * <li>`ignoreNullFields` (default `true`): whether to ignore null fields in column/struct
+ * during json generating. </li>
Member

The same wording suggestion applies here too.

* </ul>
*
* @since 1.4.0