
[SPARK-13509][SPARK-13507][SQL] Support for writing CSV with a single function call #11389

Closed. Wants to merge 9 commits.
29 changes: 29 additions & 0 deletions sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -453,6 +453,12 @@ final class DataFrameWriter private[sql](df: DataFrame) {
* format("json").save(path)
* }}}
*
* You can set the following JSON-specific options for writing JSON files:
* <li>`compression` or `codec` (default `null`): compression codec to use when saving to file.
Contributor:
just say compression, and don't mention codec.

Contributor:
actually i'd remove codec support from the underlying source code, and only keep it for csv as an undocumented option for backward compatibility.

* This should be the fully qualified name of a class implementing
* [[org.apache.hadoop.io.compress.CompressionCodec]] or one of the known case-insensitive
Contributor:
I'd remove the mention of org.apache.hadoop.io.compress.CompressionCodec

The hadoop API is so difficult to use that I don't think it deserves to be user-facing here. It adds too much complexity to this end-user api.

 * short names (`bzip2`, `gzip`, `lz4`, and `snappy`). </li>
*
* @since 1.4.0
*/
def json(path: String): Unit = format("json").save(path)
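The `compression` option documented above can be exercised as follows. This is a minimal sketch, not part of the PR: it assumes Spark 2.x is on the classpath, uses the `SparkSession` entry point (the PR-era tests use `sqlContext`), and the output path `/tmp/people-json` is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch; assumes Spark 2.x on the classpath.
// The output path is hypothetical.
object JsonWriteDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("json-write-demo")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

    // Single-call form with a gzip-compressed output;
    // "overwrite" avoids failing if the directory already exists.
    df.write
      .mode("overwrite")
      .option("compression", "gzip")
      .json("/tmp/people-json")

    spark.stop()
  }
}
```

With `compression` set to `gzip`, the part files under the output directory get a `.gz` extension and are read back transparently by `spark.read.json`.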
@@ -492,10 +498,33 @@ final class DataFrameWriter private[sql](df: DataFrame) {
* df.write().text("/path/to/output")
* }}}
*
* You can set the following options for writing text files:
* <li>`compression` or `codec` (default `null`): compression codec to use when saving to file.
* This should be the fully qualified name of a class implementing
* [[org.apache.hadoop.io.compress.CompressionCodec]] or one of the known case-insensitive
 * short names (`bzip2`, `gzip`, `lz4`, and `snappy`). </li>
*
* @since 1.6.0
*/
def text(path: String): Unit = format("text").save(path)

/**
* Saves the content of the [[DataFrame]] in CSV format at the specified path.
* This is equivalent to:
* {{{
* format("csv").save(path)
* }}}
*
* You can set the following CSV-specific options for writing CSV files:
* <li>`compression` or `codec` (default `null`): compression codec to use when saving to file.
* This should be the fully qualified name of a class implementing
* [[org.apache.hadoop.io.compress.CompressionCodec]] or one of the known case-insensitive
 * short names (`bzip2`, `gzip`, `lz4`, and `snappy`). </li>
*
* @since 2.0.0
*/
def csv(path: String): Unit = format("csv").save(path)
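The point of this PR (SPARK-13509) is that writing CSV becomes a single function call instead of `format("csv").save(path)`. A minimal sketch of both forms, assuming Spark 2.x on the classpath; the `SparkSession` entry point, the sample data, and the `/tmp/cars-csv` path are illustrative assumptions, not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch; assumes Spark 2.x on the classpath.
// The output directory is hypothetical.
object CsvWriteDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("csv-write-demo")
      .getOrCreate()
    import spark.implicits._

    val cars = Seq(("Ford", 1997), ("Chevy", 1994)).toDF("make", "year")

    // New single-call form added by this PR:
    cars.write
      .mode("overwrite")
      .option("header", "true")
      .csv("/tmp/cars-csv")

    // Equivalent long form it replaces:
    // cars.write.format("csv").option("header", "true").save("/tmp/cars-csv")

    spark.stop()
  }
}
```

Both forms route through the same `DataFrameWriter` machinery; `csv(path)` is simply `format("csv").save(path)`, as the Scaladoc above states.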

///////////////////////////////////////////////////////////////////////////////////////
// Builder pattern config options
///////////////////////////////////////////////////////////////////////////////////////
@@ -268,9 +268,8 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
.load(testFile(carsFile))

    cars.coalesce(1).write
-      .format("csv")
      .option("header", "true")
-      .save(csvDir)
+      .csv(csvDir)

val carsCopy = sqlContext.read
.format("csv")