[SPARK-11691][SQL] Support setting hadoop compression codecs in DataFrameWriter#option #11324

Closed
maropu wants to merge 12 commits

maropu
Member

@maropu maropu commented Feb 23, 2016

What changes were proposed in this pull request?

This PR adds support for Hadoop compression codecs when saving a DataFrame to disk.
It is a rework of #9657, which has gone stale.

closes #9657

@SparkQA

SparkQA commented Feb 23, 2016

Test build #51775 has finished for PR 11324 at commit b9efe53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Feb 24, 2016

@rxin Could you check this?

@@ -58,7 +59,8 @@ import org.apache.spark.util.Utils
 private[sql] case class InsertIntoHadoopFsRelation(
     @transient relation: HadoopFsRelation,
     @transient query: LogicalPlan,
-    mode: SaveMode)
+    mode: SaveMode,
+    codec: Option[Class[_ <: CompressionCodec]] = None)
Contributor

codec -> compressionCodec, to make it clearer

@rxin
Contributor

rxin commented Feb 24, 2016

@maropu can you add "closes #9657" to your pull request description?

@maropu
Member Author

maropu commented Feb 24, 2016

Okay, I'm fixing it now.

@SparkQA

SparkQA commented Feb 24, 2016

Test build #51867 has finished for PR 11324 at commit a31e562.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 24, 2016

Test build #51872 has finished for PR 11324 at commit 7292aa7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Feb 24, 2016

Jenkins, retest this please.

@SparkQA

SparkQA commented Feb 24, 2016

Test build #51875 has finished for PR 11324 at commit 7292aa7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Feb 24, 2016

ISTM the failure is not related to this PR...

[info] HiveCompatibilitySuite:
[info] - lateral_view *** FAILED *** (933 milliseconds)
[info]   Failed to execute query using catalyst:
[info]   Error: Job aborted due to stage failure: Task 0 in stage 8347.0 failed 1 times, most recent failure: 
Lost task 0.0 in stage 8347.0 (TID 120049, localhost): java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.expressions.JoinedRow cannot be cast to 
org.apache.spark.sql.catalyst.expressions.UnsafeRow

@rxin
Contributor

rxin commented Feb 24, 2016

Take a look at the end of ParquetRelation. It would be great if we could consolidate the two and have a single place where we specify the short compression codec names.
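
For readers following along, here is a minimal sketch of what "a single place" for short codec names could look like. It is an illustration under stated assumptions, not the code from this PR: the object name `CodecNames` and the method `resolve` are hypothetical, and the fallback of treating an unknown name as a fully qualified class name is an assumed policy. Only the Hadoop codec class names are real. Class names are kept as strings so the sketch runs without Hadoop on the classpath.

```scala
// Hypothetical helper (not from this PR): one shared table of short codec
// names that every data source could consult instead of keeping its own copy.
object CodecNames {
  // Short name -> fully qualified Hadoop codec class name.
  private val shortNames: Map[String, String] = Map(
    "bzip2"  -> "org.apache.hadoop.io.compress.BZip2Codec",
    "gzip"   -> "org.apache.hadoop.io.compress.GzipCodec",
    "lz4"    -> "org.apache.hadoop.io.compress.Lz4Codec",
    "snappy" -> "org.apache.hadoop.io.compress.SnappyCodec")

  // Look the name up case-insensitively. Assumption: anything that is not a
  // known short name is already a fully qualified class name and is returned
  // unchanged.
  def resolve(name: String): String =
    shortNames.getOrElse(name.toLowerCase(java.util.Locale.ROOT), name)
}
```

With a table like this, a writer option such as `gzip` and a full class name would both go through `CodecNames.resolve`, so each data source would accept the same spellings.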

@rxin
Contributor

rxin commented Feb 24, 2016

Yea I reverted a patch that broke the test.

@maropu
Member Author

maropu commented Feb 25, 2016

Okay, I'll try.

@maropu maropu force-pushed the pr9657 branch 2 times, most recently from 14ef39a to 86c8c0c Compare February 25, 2016 05:53
@SparkQA

SparkQA commented Feb 25, 2016

Test build #51934 has finished for PR 11324 at commit 14ef39a.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Feb 25, 2016

Jenkins, retest this please.

@SparkQA

SparkQA commented Feb 25, 2016

Test build #51935 has finished for PR 11324 at commit 86c8c0c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 25, 2016

Test build #51941 has finished for PR 11324 at commit 86c8c0c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Feb 25, 2016

Jenkins, retest this please.

@SparkQA

SparkQA commented Feb 25, 2016

Test build #51949 has finished for PR 11324 at commit 8cfbaea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member Author

maropu commented Feb 25, 2016

@rxin okay, ready to review again.

@HyukjinKwon
Member

@maropu @zjffdu @rxin I apologize for carelessly opening a duplicate issue and submitting a PR for it. This is fixed in #11384.

@maropu
Member Author

maropu commented Feb 26, 2016

@rxin @HyukjinKwon No problem ;) Is it okay if I rework my changes on top of @HyukjinKwon's commit?

@rxin
Contributor

rxin commented Feb 26, 2016

Yes - please.

@maropu
Member Author

maropu commented Feb 27, 2016

@HyukjinKwon Is it okay to include documentation for the codec options in this PR?

@maropu
Member Author

maropu commented Feb 27, 2016

I'm closing this PR and moving to a new one.

@maropu maropu closed this Feb 27, 2016
asfgit pushed a commit that referenced this pull request Mar 2, 2016
…ent in ParquetRelation

## What changes were proposed in this pull request?
This PR makes the short names of compression codecs in `ParquetRelation` consistent with the other ones. It comes from #11324.

## How was this patch tested?
Add more tests in `TextSuite`.

Author: Takeshi YAMAMURO <[email protected]>

Closes #11408 from maropu/SPARK-13528.
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
@maropu maropu deleted the pr9657 branch July 5, 2017 11:43
5 participants