[SPARK-7567] [SQL] [follow-up] Use a new flag to set output committer based on mapreduce apis #6130
@@ -294,17 +294,16 @@ private[sql] abstract class BaseWriterContainer(
   private def newOutputCommitter(context: TaskAttemptContext): OutputCommitter = {
     val committerClass = context.getConfiguration.getClass(
-      "mapred.output.committer.class", null, classOf[OutputCommitter])
+      "mapreduce.output.committer.class", null, classOf[OutputCommitter])
Because we are using APIs in the mapreduce package, we cannot use mapred.output.committer.class. I am just creating another conf flag called mapreduce.output.committer.class (it is not defined in Hadoop).
Since this is a Spark SQL-only property, how about renaming it to spark.sql.mapreduce.outputCommitterClass? The current name looks like a genuine Hadoop property name, which it is not.
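A minimal, self-contained sketch of the lookup pattern in newOutputCommitter, under stated assumptions: Hadoop's Configuration.getClass returns the supplied default (null here) when the key is unset, and Option(...) turns that null into None so a default committer can be used instead. To keep it runnable without Hadoop, a plain Map[String, String] stands in for the Hadoop Configuration, and OutputCommitterLike, DefaultCommitter, DirectCommitter and CommitterLookup are hypothetical stand-ins. The key is the spark.sql.sources.outputCommitterClass name from the squashed commit message of this PR. This is an illustration, not Spark's implementation.

```scala
// Hypothetical stand-in for org.apache.hadoop.mapreduce.OutputCommitter.
trait OutputCommitterLike { def name: String }

class DefaultCommitter extends OutputCommitterLike { def name: String = "default" }
class DirectCommitter extends OutputCommitterLike { def name: String = "direct" }

object CommitterLookup {
  // Key name taken from the squashed commit message of this PR.
  val Key = "spark.sql.sources.outputCommitterClass"

  // Roughly mirrors conf.getClass(key, null, classOf[OutputCommitter]):
  // returns null when the key is unset and rejects classes that do not
  // implement the expected interface.
  def getClassOrNull(conf: Map[String, String], key: String): Class[_ <: OutputCommitterLike] =
    conf.get(key) match {
      case Some(className) =>
        val clazz = Class.forName(className)
        require(classOf[OutputCommitterLike].isAssignableFrom(clazz),
          s"$className does not implement OutputCommitterLike")
        clazz.asSubclass(classOf[OutputCommitterLike])
      case None => null
    }

  // Option(...) turns the null default into None, so an unset key falls back
  // to the default committer instead of failing.
  def newCommitter(conf: Map[String, String]): OutputCommitterLike =
    Option(getClassOrNull(conf, Key))
      .map(clazz => clazz.getDeclaredConstructor().newInstance(): OutputCommitterLike)
      .getOrElse(new DefaultCommitter)
}
```

With an empty map, newCommitter returns a DefaultCommitter; with Map(CommitterLookup.Key -> classOf[DirectCommitter].getName) it instantiates DirectCommitter reflectively.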
Merged build triggered.
Merged build started.
Test build #32647 has started for PR 6130 at commit
Test build #32647 has finished for PR 6130 at commit
Merged build finished. Test PASSed.
Merged build triggered.
Merged build started.
Test build #32656 has started for PR 6130 at commit
Merged build triggered.
Merged build started.
Test build #32658 has started for PR 6130 at commit
Test build #32656 has finished for PR 6130 at commit
Merged build finished. Test FAILed.
Merged build triggered.
Merged build started.
Test build #32661 has started for PR 6130 at commit
Test build #32658 has finished for PR 6130 at commit
Merged build finished. Test FAILed.
Merged build triggered.
Merged build started.
Test build #32661 has finished for PR 6130 at commit
Merged build finished. Test FAILed.
Merged build started.
Test build #32668 has started for PR 6130 at commit
   Option(committerClass).map { clazz =>
-    val ctor = clazz.getDeclaredConstructor(classOf[Path], classOf[TaskAttemptContext])
-    ctor.newInstance(new Path(outputPath), context)
+    if (classOf[FileOutputCommitter].isAssignableFrom(clazz)) {
It would be better to have an explicit import alias, MapReduceFileOutputCommitter, for this.
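In Scala, the suggested alias is a renaming import, import org.apache.hadoop.mapreduce.lib.output.{FileOutputCommitter => MapReduceFileOutputCommitter}, which keeps it visibly distinct from org.apache.hadoop.mapred.FileOutputCommitter. The sketch below models the branch this hunk introduces: when the configured class is a FileOutputCommitter subtype, use its (path, context) constructor; otherwise fall back to a no-argument constructor. The else branch is an assumption about where the truncated hunk goes. FileCommitter, DirectFileCommitter, NoOpCommitter, CommitterBase and CommitterFactory are hypothetical stand-ins that keep the example runnable without Hadoop, and String replaces Path and TaskAttemptContext.

```scala
// Hypothetical stand-in for org.apache.hadoop.mapreduce.OutputCommitter.
trait CommitterBase { def describe: String }

// Stand-in for a FileOutputCommitter-like class: it needs an output path and a
// task context (both modeled as String here instead of Path and TaskAttemptContext).
class FileCommitter(val outputPath: String, val context: String) extends CommitterBase {
  def describe: String = s"file($outputPath, $context)"
}

// A subclass must declare its own two-argument constructor; constructors are
// not inherited, so getDeclaredConstructor would not otherwise find one.
class DirectFileCommitter(path: String, ctx: String) extends FileCommitter(path, ctx) {
  override def describe: String = s"direct-file($outputPath, $context)"
}

// A committer unrelated to FileCommitter, built with its no-argument constructor.
class NoOpCommitter extends CommitterBase {
  def describe: String = "no-op"
}

object CommitterFactory {
  def instantiate(clazz: Class[_ <: CommitterBase], outputPath: String, context: String): CommitterBase =
    if (classOf[FileCommitter].isAssignableFrom(clazz)) {
      // FileCommitter subtypes: use the (outputPath, context) constructor.
      val ctor = clazz.getDeclaredConstructor(classOf[String], classOf[String])
      ctor.newInstance(outputPath, context)
    } else {
      // Anything else: assume a no-argument constructor.
      clazz.getDeclaredConstructor().newInstance()
    }
}
```

One caveat the sketch makes visible: getDeclaredConstructor only returns constructors declared on the class itself, so any FileCommitter subclass has to declare its own (path, context) constructor to be handled by the first branch.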
Test build #32663 has finished for PR 6130 at commit
Merged build finished. Test FAILed.
Summary of offline discussion with @yhuai for future reference: For
Test build #32668 has finished for PR 6130 at commit
Merged build finished. Test PASSed.
Merged build triggered.
Merged build started.
Test build #32675 has started for PR 6130 at commit
LGTM pending Jenkins.
Test build #32675 has finished for PR 6130 at commit
Merged build finished. Test PASSed.
…ide the output committer.
Merged build triggered.
Merged build started.
Test build #33001 has started for PR 6130 at commit
Test build #33001 has finished for PR 6130 at commit
Merged build finished. Test PASSed.
… based on mapreduce apis cc liancheng marmbrus Author: Yin Huai <[email protected]> Closes #6130 from yhuai/directOutput and squashes the following commits: 312b07d [Yin Huai] A data source can use spark.sql.sources.outputCommitterClass to override the output committer. (cherry picked from commit 530397b) Signed-off-by: Michael Armbrust <[email protected]>
… based on mapreduce apis cc liancheng marmbrus Author: Yin Huai <[email protected]> Closes apache#6130 from yhuai/directOutput and squashes the following commits: 312b07d [Yin Huai] A data source can use spark.sql.sources.outputCommitterClass to override the output committer.
cc @liancheng @marmbrus