-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-24518][CORE] Using Hadoop credential provider API to store password #21548
Conversation
Change-Id: Ie774eeb9376f8b5d7379f1976826e12e9c529be3
Test build #91747 has finished for PR 21548 at commit
|
Jenkins, retest this please. |
Test build #91757 has finished for PR 21548 at commit
|
CC @vanzin @tgravescs would you please help to review, thanks! |
we would definitely want to update the docs on how user would do this. I don't see a test that actually tests reading from the hadoopConf either, so we should add one. I need to look at the hadoop api in more depth to do a full review |
Thanks @tgravescs I will add the docs about how to use it. I was thinking to add a test case, but it looks like may not be easy to add one, I will try to add one. |
Change-Id: I38146ee45a4565295fa6fd297f591c368d6b250a
Test build #92120 has finished for PR 21548 at commit
|
Jenkins, retest this please. |
Test build #92129 has finished for PR 21548 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK. I'm not a huge fan of the hadoop conf in the SecurityManager, but that's the easiest way to achieve this...
I just have an issue with your summary. It makes it sound like there's no way to securely configure these things today, and that's not true.
You can have the password in the config file and have it only readable by authorized users, which is basically what the credential provider does.
Or you can have the password in an env variable, and reference the env variable in the Spark config.
This just adds a third way.
@@ -179,9 +185,11 @@ private[spark] object SSLOptions extends Logging { | |||
.orElse(defaults.flatMap(_.keyStore)) | |||
|
|||
val keyStorePassword = conf.getWithSubstitution(s"$ns.keyStorePassword") | |||
.orElse(Option(hadoopConf.getPassword(s"$ns.keyStorePassword")).map(new String(_))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs charset (also in others).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vanzin the return value of hadoopConf#getPassword
is char array, so there's no way to specify the charset here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new String
takes a charset. (In fact the constructor you're calling should be deprecated...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vanzin , I checked jdk8 doc again, I don't find a String constructor which takes both char array and charset as parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, my bad, that's a char array, not a byte array. All is good then.
docs/security.md
Outdated
-provider jceks://[email protected]:9001/user/backup/ssl.jceks | ||
``` | ||
|
||
In the meantime, adding configuration "hadoop.security.credential.provider.path=jceks://[email protected]:9001/user/backup/ssl.jceks" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrase:
"To configure the location of the credential provider, set the hadoop.security.credential.provider.path
config option in the Hadoop configuration used by Spark."
Your example also kinda looks like a Spark config (which would be "spark.hadoop.blah"), since Hadoop configs are generally in XML.
I see, thanks for explanation @vanzin . This might be the problem in our distribution, because we don't do such fine-grained access control of config file, also configuration file is world readable shared between different components, that's why we're seeking to use Hadoop credential provide to secure this thing. |
Test build #92164 has finished for PR 21548 at commit
|
Could you update the summary so that it doesn't sound like this is an existing security issue? |
Merging to master. |
What changes were proposed in this pull request?
In our distribution, because we don't do such fine-grained access control of config file, also configuration file is world readable shared between different components, so password may leak to different users.
Hadoop credential provider API support storing password in a secure way, in which Spark could read it in a secure way, so here propose to add support of using credential provider API to get password.
How was this patch tested?
Adding tests and verified locally.