[SPARK-7567] [SQL] [follow-up] Use a new flag to set output committer based on mapreduce apis #6130

Closed
wants to merge 1 commit into from

@yhuai (Contributor) commented May 13, 2015

@@ -294,17 +294,16 @@ private[sql] abstract class BaseWriterContainer(

   private def newOutputCommitter(context: TaskAttemptContext): OutputCommitter = {
     val committerClass = context.getConfiguration.getClass(
-      "mapred.output.committer.class", null, classOf[OutputCommitter])
+      "mapreduce.output.committer.class", null, classOf[OutputCommitter])
Contributor Author

Because we are using APIs from the mapreduce package, we cannot use mapred.output.committer.class. I am creating another conf flag called mapreduce.output.committer.class (it is not defined in Hadoop).

Contributor

Since this is a Spark SQL-only property, how about renaming it to spark.sql.mapreduce.outputCommitterClass? The current name looks like a genuine Hadoop property name, which it is not.
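
For reference, the lookup discussed in this thread can be sketched as below. This is a minimal illustration rather than the code in the patch: it assumes the Hadoop client libraries are on the classpath, and `CommitterLookup`, `committerClass`, and `roundTrip` are hypothetical names. It relies on `Configuration.getClass` returning the supplied default (here `null`) when the key is unset, which is why the result is wrapped in `Option`.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.OutputCommitter
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

// Hypothetical helper for illustration; not part of the patch.
object CommitterLookup {
  // Returns the committer class registered under `key`, or None if the key is unset.
  // The key is a parameter because its final name was still under discussion.
  def committerClass(conf: Configuration, key: String): Option[Class[_ <: OutputCommitter]] =
    Option(conf.getClass(key, null, classOf[OutputCommitter]))

  // Registers FileOutputCommitter under `key`, then reads it back.
  def roundTrip(key: String): Option[Class[_ <: OutputCommitter]] = {
    val conf = new Configuration(false) // false: do not load the default resources
    conf.setClass(key, classOf[FileOutputCommitter], classOf[OutputCommitter])
    committerClass(conf, key)
  }
}
```

Whatever name the property ends up with, the lookup itself stays the same; only the key string changes.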

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 13, 2015

Test build #32647 has started for PR 6130 at commit e8254b0.

@SparkQA

SparkQA commented May 13, 2015

Test build #32647 has finished for PR 6130 at commit e8254b0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32647/
Test PASSed.

@yhuai changed the title from "[SPARK-7567] [SQL] [follow-up] Add an option to FSBasedRelation to indicate if it supports writing data to S3 directly" to "[SPARK-7567] [SQL] [follow-up] Add an option to FSBasedRelation to indicate if it supports custom output committer" on May 13, 2015
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 13, 2015

Test build #32656 has started for PR 6130 at commit e0cb523.

@yhuai closed this May 13, 2015
@yhuai reopened this May 13, 2015
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 14, 2015

Test build #32658 has started for PR 6130 at commit cdb0aba.

@SparkQA

SparkQA commented May 14, 2015

Test build #32656 has finished for PR 6130 at commit e0cb523.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32656/
Test FAILed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 14, 2015

Test build #32661 has started for PR 6130 at commit 8870f5e.

@SparkQA

SparkQA commented May 14, 2015

Test build #32658 has finished for PR 6130 at commit cdb0aba.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32658/
Test FAILed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 14, 2015

Test build #32661 has finished for PR 6130 at commit 8870f5e.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32661/
Test FAILed.

@AmplabJenkins

Merged build started.

@yhuai changed the title from "[SPARK-7567] [SQL] [follow-up] Add an option to FSBasedRelation to indicate if it supports custom output committer" to "[SPARK-7567] [SQL] [follow-up] Use a new flag to set output committer based on mapreduce apis" on May 14, 2015
@SparkQA

SparkQA commented May 14, 2015

Test build #32668 has started for PR 6130 at commit dc9910d.


Option(committerClass).map { clazz =>
val ctor = clazz.getDeclaredConstructor(classOf[Path], classOf[TaskAttemptContext])
ctor.newInstance(new Path(outputPath), context)
if (classOf[FileOutputCommitter].isAssignableFrom(clazz)) {
Contributor

It would be better to use an explicit import alias, MapReduceFileOutputCommitter, for this.
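
The aliasing suggested here might look like the following sketch. It assumes the Hadoop client libraries are on the classpath. `CommitterKinds`, `isNewApiFileCommitter`, `pickConstructor`, and the `MapRedFileOutputCommitter` alias are hypothetical names, and the fallback to a no-argument constructor is an assumption about how a non-file committer would be instantiated, not something shown in the diff fragment above.

```scala
import java.lang.reflect.Constructor

import org.apache.hadoop.fs.Path
// Both APIs define a class named FileOutputCommitter, so each gets an explicit alias.
import org.apache.hadoop.mapred.{FileOutputCommitter => MapRedFileOutputCommitter}
import org.apache.hadoop.mapreduce.{OutputCommitter, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.output.{FileOutputCommitter => MapReduceFileOutputCommitter}

// Hypothetical helper for illustration; not part of the patch.
object CommitterKinds {
  // True only for the new-API (mapreduce) FileOutputCommitter and its subclasses.
  def isNewApiFileCommitter(clazz: Class[_]): Boolean =
    classOf[MapReduceFileOutputCommitter].isAssignableFrom(clazz)

  // Chooses the constructor to call reflectively for a given committer class.
  def pickConstructor(clazz: Class[_ <: OutputCommitter]): Constructor[_ <: OutputCommitter] =
    if (isNewApiFileCommitter(clazz)) {
      // File-based committers take the output path and the task attempt context.
      clazz.getDeclaredConstructor(classOf[Path], classOf[TaskAttemptContext])
    } else {
      // Assumption: any other committer exposes a no-argument constructor.
      clazz.getDeclaredConstructor()
    }
}
```

With the aliases in place, a reader can tell at the call site which of the two same-named classes the `isAssignableFrom` check refers to.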

@SparkQA

SparkQA commented May 14, 2015

Test build #32663 has finished for PR 6130 at commit 84feba4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32663/
Test FAILed.

@liancheng
Contributor

Summary of offline discussion with @yhuai for future reference:

For the mapred API, users can customize the committer class via mapred.output.committer.class, but the custom class must be a subclass of o.a.h.mapred.OutputCommitter. For the mapreduce API, output committers are always retrieved from mapreduce.OutputFormat instances, and there seems to be no standard way to freely customize the committer class. That's why @yhuai had to introduce a new property.
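
The difference between the two APIs can be illustrated with a short sketch, assuming the Hadoop client libraries are on the classpath. `CommitterSources` and its method names are hypothetical. With the old API the committer class is a job-level setting; with the new API the committer is handed out by the `OutputFormat` itself.

```scala
import org.apache.hadoop.mapred.{FileOutputCommitter => MapRedFileOutputCommitter, JobConf}
import org.apache.hadoop.mapreduce.{OutputCommitter, OutputFormat, TaskAttemptContext}

// Hypothetical helper for illustration; not part of the patch.
object CommitterSources {
  // Old (mapred) API: the committer class is stored in the job configuration.
  // setOutputCommitter only accepts subclasses of o.a.h.mapred.OutputCommitter.
  def oldApiJobConf(): JobConf = {
    val jobConf = new JobConf(false) // false: do not load the default resources
    jobConf.setOutputCommitter(classOf[MapRedFileOutputCommitter])
    jobConf
  }

  // New (mapreduce) API: the OutputFormat decides which committer to return,
  // so there is no configuration key to override here.
  def newApiCommitter(format: OutputFormat[_, _], context: TaskAttemptContext): OutputCommitter =
    format.getOutputCommitter(context)
}
```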

@SparkQA

SparkQA commented May 14, 2015

Test build #32668 has finished for PR 6130 at commit dc9910d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32668/
Test PASSed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 14, 2015

Test build #32675 has started for PR 6130 at commit 4406b7a.

@liancheng
Contributor

LGTM pending Jenkins.

@SparkQA

SparkQA commented May 14, 2015

Test build #32675 has finished for PR 6130 at commit 4406b7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32675/
Test PASSed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 18, 2015

Test build #33001 has started for PR 6130 at commit 312b07d.

@SparkQA

SparkQA commented May 18, 2015

Test build #33001 has finished for PR 6130 at commit 312b07d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class DecimalConversion(precisionInfo: Option[(Int, Int)]) extends JDBCConversion

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33001/
Test PASSed.

asfgit pushed a commit that referenced this pull request May 18, 2015
… based on mapreduce apis

cc liancheng marmbrus

Author: Yin Huai

Closes #6130 from yhuai/directOutput and squashes the following commits:

312b07d [Yin Huai] A data source can use spark.sql.sources.outputCommitterClass to override the output committer.

(cherry picked from commit 530397b)
Signed-off-by: Michael Armbrust
@asfgit asfgit closed this in 530397b May 18, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015