[SPARK-14056] Appends s3 specific configurations and spark.hadoop con… #11876
Conversation
/**
 * Appends s3 specific configurations and spark.hadoop configurations to hadoop
Nits: "S3-specific", "a Hadoop". It doesn't need to return a Configuration.
done.
/**
 * Appends S3-specific configurations and spark.hadoop.* configurations to a Hadoop
Maybe also mention the buffer keys that are added but don't fit into either of those categories.
Good point! Changed it accordingly.
/**
 * Appends S3-specific, spark.hadoop.*, and spark.spark.buffer.size configurations to a Hadoop
Extra "spark." here in "spark.spark.buffer.size".
This might be a dumb question, but it feels kind of funny that we set these things when a Configuration
is created everywhere except one place. Is it because the HiveConf
necessarily comes from somewhere else that Spark isn't initializing? Just trying to rationalize treating it specially in this one case
Thanks, removed the extra "spark.". You are right, HiveConf is being initialized in a separate code path which Spark isn't initializing properly. I am not familiar enough with the Hive side of things to comment on why it was done that way, but the TODO in TableReader.scala suggests that it is the right place to initialize the HiveConf.
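For context, here is a minimal sketch of what the documented helper describes: copying S3 credentials, every spark.hadoop.* key, and the buffer size from a SparkConf into a Hadoop Configuration. The object and method names are hypothetical illustrations, not the PR's actual code.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

// Hypothetical helper mirroring what the patch's doc comment describes.
object HadoopConfHelper {
  def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
    // S3-specific keys: forward AWS credentials from the environment if present.
    val keyId = System.getenv("AWS_ACCESS_KEY_ID")
    val accessKey = System.getenv("AWS_SECRET_ACCESS_KEY")
    if (keyId != null && accessKey != null) {
      hadoopConf.set("fs.s3.awsAccessKeyId", keyId)
      hadoopConf.set("fs.s3n.awsAccessKeyId", keyId)
      hadoopConf.set("fs.s3a.access.key", keyId)
      hadoopConf.set("fs.s3.awsSecretAccessKey", accessKey)
      hadoopConf.set("fs.s3n.awsSecretAccessKey", accessKey)
      hadoopConf.set("fs.s3a.secret.key", accessKey)
    }
    // spark.hadoop.foo=bar becomes foo=bar in the Hadoop configuration.
    conf.getAll.foreach { case (key, value) =>
      if (key.startsWith("spark.hadoop.")) {
        hadoopConf.set(key.substring("spark.hadoop.".length), value)
      }
    }
    // The buffer key discussed above: spark.buffer.size maps to io.file.buffer.size.
    hadoopConf.set("io.file.buffer.size", conf.get("spark.buffer.size", "65536"))
  }
}
```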
That is a very old TODO, but if I remember correctly, it was a result of me omitting code as I copied logic from Shark. It would be good to understand what the job that was tested on a cluster is doing, and why it needs info in the Hive conf, just because long term we are moving further and further away from using Hive code.
No objections though if this is fixing something.
@marmbrus - the job runs a simple Hive query to create a table. From what I understand from the code, the hiveConf is initialized without the spark.hadoop.* configurations, and that hiveConf is then used to initialize the HadoopRDD. So the HadoopRDD does not contain any spark.hadoop.* configurations. This fix is meant to resolve that issue.
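A hedged sketch of the shape of the fix being described here: before the Hive configuration is handed to the HadoopRDD, fold in the same S3 and spark.hadoop.* settings so the tasks reading the table see them. The function name and the `HadoopConfHelper` helper are illustrative, not the patch's actual API.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.spark.SparkContext

// Illustrative only: build the Configuration a HadoopRDD reading a Hive table
// would receive, with the Spark-side settings appended to the HiveConf.
def hadoopConfForTableRead(sc: SparkContext, hiveConf: HiveConf): Configuration = {
  val hadoopConf = new Configuration(hiveConf)        // HiveConf is a Configuration subclass
  HadoopConfHelper.appendSparkHadoopConfigs(sc.getConf, hadoopConf) // helper sketched earlier
  hadoopConf                                          // pass this to the HadoopRDD
}
```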
Jenkins test this please
Test build #54447 has finished for PR 11876 at commit
Jenkins retest this please
Test build #54515 has finished for PR 11876 at commit
Thanks @srowen, not sure why the test failed, will take a look.
…figurations to hive configuration.
@srowen - Thanks for taking a look, updated the diff to fix the test case.
Jenkins retest this please
Test build #54784 has finished for PR 11876 at commit
Merged to master
What changes were proposed in this pull request?
Appends S3-specific configurations and spark.hadoop.* configurations to the Hive configuration.
How was this patch tested?
Tested by running a job on cluster.
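To illustrate the user-facing effect this testing targets, here is a hypothetical job setup: with the patch, spark.hadoop.* keys set on the SparkConf (placeholder S3A credentials below) should also reach the Hadoop configuration used when Hive tables are read, rather than being dropped.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder credential values; key names follow the spark.hadoop.* prefix convention.
val conf = new SparkConf()
  .setAppName("s3-hive-example")
  .set("spark.hadoop.fs.s3a.access.key", "EXAMPLE_ACCESS_KEY")
  .set("spark.hadoop.fs.s3a.secret.key", "EXAMPLE_SECRET_KEY")
val sc = new SparkContext(conf)
// A Hive query over an S3-backed table run through Spark should now see these
// keys in the Hadoop configuration handed to the underlying HadoopRDD.
```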