[SPARK-7567] [SQL] [follow-up] Use a new flag to set output committer based on mapreduce apis #6130

yhuai · 2015-05-13T20:38:07Z

yhuai · 2015-05-13T20:39:08Z

sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala

@@ -294,17 +294,16 @@ private[sql] abstract class BaseWriterContainer(

  private def newOutputCommitter(context: TaskAttemptContext): OutputCommitter = {
    val committerClass = context.getConfiguration.getClass(
-      "mapred.output.committer.class", null, classOf[OutputCommitter])
+      "mapreduce.output.committer.class", null, classOf[OutputCommitter])


Because we are using APIs from the mapreduce package, we cannot use mapred.output.committer.class. I am creating another conf flag called mapreduce.output.committer.class (it is not defined in Hadoop).

Since this is a Spark SQL-only property, how about renaming it to spark.sql.mapreduce.outputCommitterClass? The current name looks like a genuine Hadoop property name, which it is not.
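To make the lookup pattern in the diff concrete, here is a minimal, self-contained sketch. It does not use Hadoop: a plain Map stands in for the Hadoop Configuration, and OutputCommitter and DirectCommitter are hypothetical stand-in classes. The key name is the one that appears in the final commit message of this PR. An unset key yields None, mirroring the null default passed to getClass in the diff.

```scala
object CommitterLookupSketch {
  // Hypothetical stand-in for org.apache.hadoop.mapreduce.OutputCommitter.
  abstract class OutputCommitter
  // Hypothetical user-supplied committer.
  class DirectCommitter extends OutputCommitter

  // Key name as it appears in the final commit message of this PR.
  val CommitterKey = "spark.sql.sources.outputCommitterClass"

  // A Map stands in for the Hadoop Configuration (assumption for illustration).
  // Returns None when the key is unset, like getClass(..., null, ...) wrapped in Option.
  def committerClass(conf: Map[String, String]): Option[Class[_ <: OutputCommitter]] =
    conf.get(CommitterKey).map { name =>
      // asSubclass throws ClassCastException if the class is not an OutputCommitter.
      Class.forName(name).asSubclass(classOf[OutputCommitter])
    }

  def main(args: Array[String]): Unit = {
    val conf = Map(CommitterKey -> classOf[DirectCommitter].getName)
    println(committerClass(conf).map(_.getName))
    println(committerClass(Map.empty))
  }
}
```

A Spark-prefixed key makes it harder to mistake the flag for a property that Hadoop itself defines and honors.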

AmplabJenkins · 2015-05-13T20:42:10Z

Merged build triggered.

AmplabJenkins · 2015-05-13T20:42:19Z

Merged build started.

SparkQA · 2015-05-13T20:43:22Z

Test build #32647 has started for PR 6130 at commit e8254b0.

SparkQA · 2015-05-13T22:31:26Z

Test build #32647 has finished for PR 6130 at commit e8254b0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-13T22:31:31Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-13T22:31:31Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32647/
Test PASSed.

AmplabJenkins · 2015-05-13T23:52:09Z

Merged build triggered.

AmplabJenkins · 2015-05-13T23:52:16Z

Merged build started.

SparkQA · 2015-05-13T23:54:07Z

Test build #32656 has started for PR 6130 at commit e0cb523.

AmplabJenkins · 2015-05-14T00:02:10Z

Merged build triggered.

AmplabJenkins · 2015-05-14T00:02:15Z

Merged build started.

SparkQA · 2015-05-14T00:04:05Z

Test build #32658 has started for PR 6130 at commit cdb0aba.

SparkQA · 2015-05-14T00:14:21Z

Test build #32656 has finished for PR 6130 at commit e0cb523.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-14T00:14:24Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-14T00:14:25Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32656/
Test FAILed.

AmplabJenkins · 2015-05-14T00:22:10Z

Merged build triggered.

AmplabJenkins · 2015-05-14T00:22:16Z

Merged build started.

SparkQA · 2015-05-14T00:23:01Z

Test build #32661 has started for PR 6130 at commit 8870f5e.

SparkQA · 2015-05-14T00:30:50Z

Test build #32658 has finished for PR 6130 at commit cdb0aba.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-14T00:30:54Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-14T00:30:54Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32658/
Test FAILed.

AmplabJenkins · 2015-05-14T00:32:11Z

Merged build triggered.

AmplabJenkins · 2015-05-14T00:32:21Z

Merged build started.

SparkQA · 2015-05-14T00:33:00Z

Test build #32661 has finished for PR 6130 at commit 8870f5e.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-14T00:33:03Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-14T00:33:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32661/
Test FAILed.

AmplabJenkins · 2015-05-14T00:57:16Z

Merged build started.

SparkQA · 2015-05-14T00:59:14Z

Test build #32668 has started for PR 6130 at commit dc9910d.

liancheng · 2015-05-14T01:42:44Z

sql/core/src/main/scala/org/apache/spark/sql/sources/commands.scala


    Option(committerClass).map { clazz =>
-      val ctor = clazz.getDeclaredConstructor(classOf[Path], classOf[TaskAttemptContext])
-      ctor.newInstance(new Path(outputPath), context)
+      if (classOf[FileOutputCommitter].isAssignableFrom(clazz)) {


It would be better to have an explicit import alias, MapReduceFileOutputCommitter, for this.
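A rough sketch of what the alias and the isAssignableFrom check could look like together. Everything here is a simplified stand-in, not the Hadoop API: Path, TaskAttemptContext, and the mapred/mapreduce objects only mimic the shape of the real types, and the no-argument constructor in the else branch is an assumption, since the diff above shows only the if branch.

```scala
object CommitterDispatchSketch {
  // Hypothetical stand-ins for Hadoop types; not the real API.
  final case class Path(value: String)
  final class TaskAttemptContext

  // Two classes share a simple name, as in Hadoop's mapred and mapreduce packages.
  object mapred { class FileOutputCommitter }
  object mapreduce {
    abstract class OutputCommitter
    class FileOutputCommitter(val outputPath: Path, val context: TaskAttemptContext)
      extends OutputCommitter
  }

  import mapreduce.OutputCommitter
  // The alias makes it explicit which FileOutputCommitter is meant.
  import mapreduce.{FileOutputCommitter => MapReduceFileOutputCommitter}

  class CustomFileCommitter(p: Path, c: TaskAttemptContext)
    extends MapReduceFileOutputCommitter(p, c)
  class DirectCommitter extends OutputCommitter

  def instantiate(
      clazz: Class[_ <: OutputCommitter],
      outputPath: String,
      context: TaskAttemptContext): OutputCommitter =
    if (classOf[MapReduceFileOutputCommitter].isAssignableFrom(clazz)) {
      // File-based committers take (Path, TaskAttemptContext).
      val ctor = clazz.getDeclaredConstructor(classOf[Path], classOf[TaskAttemptContext])
      ctor.newInstance(Path(outputPath), context)
    } else {
      // Assumption: any other committer exposes a no-argument constructor.
      val ctor = clazz.getDeclaredConstructor()
      ctor.newInstance()
    }

  def main(args: Array[String]): Unit = {
    val ctx = new TaskAttemptContext
    println(instantiate(classOf[CustomFileCommitter], "/tmp/out", ctx).getClass.getName)
    println(instantiate(classOf[DirectCommitter], "/tmp/out", ctx).getClass.getName)
  }
}
```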

SparkQA · 2015-05-14T02:19:15Z

Test build #32663 has finished for PR 6130 at commit 84feba4.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-14T02:19:20Z

Merged build finished. Test FAILed.

AmplabJenkins · 2015-05-14T02:19:21Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32663/
Test FAILed.

liancheng · 2015-05-14T02:21:04Z

Summary of offline discussion with @yhuai for future reference:

For the mapred API, users can customize the committer class via mapred.output.committer.class, but customized classes must be subclasses of o.a.h.mapred.OutputCommitter. For the mapreduce API, output committers are always retrieved from mapreduce.OutputFormat instances, and there seems to be no genuine way to freely customize the committer class. That's why @yhuai had to introduce a new property.
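One way to picture the resulting precedence is the sketch below. The types are hypothetical stand-ins rather than Hadoop classes, and the rule that a configured class wins over the committer supplied by the output format is an assumption about the intended behavior, not something stated in this thread.

```scala
object CommitterFallbackSketch {
  // Hypothetical stand-ins for the mapreduce types named in the summary.
  abstract class OutputCommitter
  class DefaultCommitter extends OutputCommitter
  class CustomCommitter extends OutputCommitter

  // Models mapreduce.OutputFormat, which hands out its own committer.
  trait OutputFormat {
    def getOutputCommitter(): OutputCommitter
  }
  object PlainFormat extends OutputFormat {
    def getOutputCommitter(): OutputCommitter = new DefaultCommitter
  }

  // Assumed precedence: a configured class wins; otherwise ask the format.
  def chooseCommitter(
      configured: Option[Class[_ <: OutputCommitter]],
      format: OutputFormat): OutputCommitter =
    configured
      .map[OutputCommitter](c => c.getDeclaredConstructor().newInstance())
      .getOrElse(format.getOutputCommitter())

  def main(args: Array[String]): Unit = {
    println(chooseCommitter(None, PlainFormat).getClass.getName)
    println(chooseCommitter(Some(classOf[CustomCommitter]), PlainFormat).getClass.getName)
  }
}
```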

SparkQA · 2015-05-14T02:58:47Z

Test build #32668 has finished for PR 6130 at commit dc9910d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-14T02:58:51Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-14T02:58:51Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32668/
Test PASSed.

AmplabJenkins · 2015-05-14T03:47:09Z

Merged build triggered.

AmplabJenkins · 2015-05-14T03:47:17Z

Merged build started.

SparkQA · 2015-05-14T03:49:03Z

Test build #32675 has started for PR 6130 at commit 4406b7a.

liancheng · 2015-05-14T03:54:04Z

LGTM pending Jenkins.

SparkQA · 2015-05-14T05:42:23Z

Test build #32675 has finished for PR 6130 at commit 4406b7a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-05-14T05:42:27Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-14T05:42:27Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32675/
Test PASSed.


AmplabJenkins · 2015-05-18T17:22:11Z

Merged build triggered.

AmplabJenkins · 2015-05-18T17:22:20Z

Merged build started.

SparkQA · 2015-05-18T17:23:06Z

Test build #33001 has started for PR 6130 at commit 312b07d.

SparkQA · 2015-05-18T19:15:07Z

Test build #33001 has finished for PR 6130 at commit 312b07d.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class DecimalConversion(precisionInfo: Option[(Int, Int)]) extends JDBCConversion

AmplabJenkins · 2015-05-18T19:15:11Z

Merged build finished. Test PASSed.

AmplabJenkins · 2015-05-18T19:15:11Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33001/
Test PASSed.

… based on mapreduce apis

cc liancheng marmbrus

Author: Yin Huai <[email protected]>

Closes #6130 from yhuai/directOutput and squashes the following commits:

312b07d [Yin Huai] A data source can use spark.sql.sources.outputCommitterClass to override the output committer.

(cherry picked from commit 530397b)
Signed-off-by: Michael Armbrust <[email protected]>


yhuai reviewed May 13, 2015

yhuai changed the title ~~[SPARK-7567] [SQL] [follow-up] Add an option to FSBasedRelation to indicate if it supports writing data to S3 directly~~ [SPARK-7567] [SQL] [follow-up] Add an option to FSBasedRelation to indicate if it supports custom output committer May 13, 2015

yhuai closed this May 13, 2015

yhuai reopened this May 13, 2015

yhuai changed the title ~~[SPARK-7567] [SQL] [follow-up] Add an option to FSBasedRelation to indicate if it supports custom output committer~~ [SPARK-7567] [SQL] [follow-up] Use a new flag to set output committer based on mapreduce apis May 14, 2015

liancheng reviewed May 14, 2015

312b07d: A data source can use spark.sql.sources.outputCommitterClass to override the output committer.

asfgit closed this in 530397b May 18, 2015
