
[SPARK-13509][SPARK-13507][SQL] Support for writing CSV with a single function call #11389

Closed
HyukjinKwon wants to merge 9 commits into apache:master from HyukjinKwon:SPARK-13507-13509

Conversation

HyukjinKwon
Member

https://issues.apache.org/jira/browse/SPARK-13507
https://issues.apache.org/jira/browse/SPARK-13509

What changes were proposed in this pull request?

This PR adds support for writing CSV data directly to a given path with a single function call.

Several unit tests were added for each piece of functionality.
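
For readers skimming the thread, this is roughly what the new single-call write path looks like. This sketch is not part of the PR description itself; it assumes spark-shell, where `sqlContext` is predefined, and the paths and column name are illustrative:

```scala
// Minimal sketch of the single-call CSV write added by this PR.
// Assumes spark-shell, where `sqlContext` is already defined.
val df = sqlContext.range(10).toDF("id")

// New in this PR: write CSV directly with one call.
df.write.csv("/tmp/spark-13509-csv")

// Equivalent to the pre-existing generic form:
df.write.format("csv").save("/tmp/spark-13509-csv-generic")
```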

How was this patch tested?

This was tested with unit tests and with `dev/run_tests` for coding style.

@HyukjinKwon
Member Author

@rxin I opened this PR because it looks like writing csv() should be added anyway.
If I got the documentation changes here wrong, I will move them back.

@SparkQA

SparkQA commented Feb 26, 2016

Test build #52042 has finished for PR 11389 at commit a97a0a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -464,6 +464,12 @@ final class DataFrameWriter private[sql](df: DataFrame) {
* format("parquet").save(path)
* }}}
*
* You can set the following JSON-specific options for writing JSON files:
Member

This looks like it's in the wrong place?

@SparkQA

SparkQA commented Feb 26, 2016

Test build #52047 has finished for PR 11389 at commit f82a2f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -453,6 +453,12 @@ final class DataFrameWriter private[sql](df: DataFrame) {
* format("json").save(path)
* }}}
*
* You can set the following JSON-specific options for writing JSON files:
* <li>`compression` or `codec` (default `null`): compression codec to use when saving to file.
Contributor

just say compression, and don't mention codec.

Contributor

actually i'd remove codec support from the underlying source code, and only keep it for csv as an undocumented option for backward compatibility.
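
For context, a hedged sketch of what passing the documented compression option to the new CSV writer would look like. The option name follows the doc line under review above; `"gzip"` is just one illustrative codec value and is not specified in this thread:

```scala
// Sketch: writing CSV with the documented `compression` option.
// Assumes an existing DataFrame `df` (as in the earlier sketch).
// "gzip" is an illustrative codec name; the exact set of supported values
// comes from the underlying compression codec helper, not this thread.
df.write
  .option("compression", "gzip")
  .csv("/tmp/spark-13509-csv-gzip")
```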

@HyukjinKwon
Member Author

@rxin Actually, do you think we need the compression option for Parquet and ORC as well? (I am not going to deal with them in this PR even if we do.)

@rxin
Contributor

rxin commented Feb 29, 2016

It'd be great to fix that in a future PR.

For this one, let's also fix Python?

@HyukjinKwon
Member Author

@rxin Sure.

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52151 has finished for PR 11389 at commit 9fe8fca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Feb 29, 2016

LGTM pending tests

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52158 has finished for PR 11389 at commit 8fbad40.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52154 has finished for PR 11389 at commit 8efb0e3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Hm... it looks a bit weird. In ParquetHadoopFsRelationSuite, the test about types (test all data types - ByteType) keeps failing. This also happens sometimes on other PRs I made.

I was going to submit a hot-fix, but I found it actually works fine locally.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52160 has finished for PR 11389 at commit 9ca920b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

As this test passes sometimes (e.g. in #11016), I will restart the build.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52161 has finished for PR 11389 at commit 9ca920b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52163 has finished for PR 11389 at commit 9ca920b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52167 has finished for PR 11389 at commit 9ca920b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52175 has finished for PR 11389 at commit fea9df8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52176 has finished for PR 11389 at commit cec8442.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

I see, that's a problem in the new vectorized reader. I missed the exception message. Looking into it more deeply.

@HyukjinKwon
Member Author

@rxin Anyway, would you merge this if it looks good?

@rxin
Contributor

rxin commented Feb 29, 2016

Thanks - merging this in master.

asfgit closed this in 02aa499 on Feb 29, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request on Mar 22, 2016
[SPARK-13509][SPARK-13507][SQL] Support for writing CSV with a single function call

https://issues.apache.org/jira/browse/SPARK-13507
https://issues.apache.org/jira/browse/SPARK-13509

## What changes were proposed in this pull request?
This PR adds support for writing CSV data directly to a given path with a single function call.

Several unit tests were added for each piece of functionality.
## How was this patch tested?

This was tested with unittests and with `dev/run_tests` for coding style

Author: hyukjinkwon <[email protected]>
Author: Hyukjin Kwon <[email protected]>

Closes apache#11389 from HyukjinKwon/SPARK-13507-13509.
HyukjinKwon deleted the SPARK-13507-13509 branch on September 23, 2016 at 18:28