-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23572][docs] Bring "security.md" up to date. #20742
Conversation
This change basically rewrites the security documentation so that it's up to date with new features, more correct, and more complete. Because security is such an important feature, I chose to move all the relevant configuration documentation to the security page, instead of having them peppered all over the place in the configuration page. This allows an almost one-stop shop for security configuration in Spark. The only exceptions are some YARN-specific minor features which I left in the YARN page. I also re-organized the page's topics, since they didn't make a lot of sense. You had kerberos features described inside paragraphs talking about UI access control, and other oddities. It should be easier now to find information about specific Spark security features. I also enabled TOCs for both the Security and YARN pages, since that makes it easier to see what is covered. I removed most of the comments from the SecurityManager javadoc since they just replicated information in the security doc, with different levels of out-of-dateness.
Test build #87982 has finished for PR 20742 at commit
|
R/pkg/DESCRIPTION
Outdated
@@ -57,6 +57,6 @@ Collate: | |||
'types.R' | |||
'utils.R' | |||
'window.R' | |||
RoxygenNote: 5.0.1 | |||
RoxygenNote: 6.0.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls revert this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, I thought I had removed this one. (It's really annoying when the build changes your working file set.)
Test build #87987 has finished for PR 20742 at commit
|
Test build #88021 has finished for PR 20742 at commit
|
Any takers for reviewing this? |
No volunteers, so I'll volunteer @squito , @jerryshao , @tgravescs |
@@ -2,6 +2,8 @@ | |||
layout: global | |||
title: Running Spark on YARN | |||
--- | |||
* This will become a table of contents (this text will be scraped). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scrapped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same wording as other pages, so I'd rather keep it the same. (And "scrape" is also correct.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I misread this, is it supposed to be this text will be scraped (meaning looked for in order to fill in TOC) or was it supposed to be scrapped (meaning thrown away)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This text will be replaced with the TOC.
@@ -3,47 +3,290 @@ layout: global | |||
displayTitle: Spark Security | |||
title: Security | |||
--- | |||
* This will become a table of contents (this text will be scraped). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scrapped
|
||
A user may want to secure the UI if it has data that other users should not be allowed to see. The javax servlet filter specified by the user can authenticate the user and then once the user is logged in, Spark can compare that user versus the view ACLs to make sure they are authorized to view the UI. The configs `spark.acls.enable`, `spark.ui.view.acls` and `spark.ui.view.acls.groups` control the behavior of the ACLs. Note that the user who started the application always has view access to the UI. On YARN, the Spark UI uses the standard YARN web application proxy mechanism and will authenticate via any installed Hadoop filters. | ||
For other resource managers, `spark.authenticate.secret` must be configured on each of the nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you removed the text "All the nodes (Master and Workers) and the applications need to have the same shared secret. This again is not ideal as one user could potentially affect another users application. This should be enhanced in the future to provide better protection."
Personally I would like to see a warning stay in there as I don't consider this really secure for multi-tenant environment
docs/security.md
Outdated
`spark.files` configuration). The files will be placed on the driver's working directory, so the TLS | ||
configuration should just reference the file name with no absolute path. | ||
|
||
When distributing local key stores this way, make sure to configure the underlying distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may also add warning that these are distributed via underlying filesystem (--files) so make sure its trusted/secure.
docs/security.md
Outdated
using the `--principal` and `--keytab` parameters. | ||
|
||
The provided keytab will be copied over to the machine running the Application Master via the Hadoop | ||
Distributed Cache. For this reason, it's strongly recommended that both YARN and HDFS encryption are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
secure yarn and hdfs with encryption. Some companies don't consider this secure enough. Not sure if it makes sense for us to say anything more though.
|
||
Spark supports automatically creating new tokens for these applications when running in YARN mode. | ||
Kerberos credentials need to be provided to the Spark application via the `spark-submit` command, | ||
using the `--principal` and `--keytab` parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we link the yarn page spark.yarn.keytab and spark.yarn.principal to this section as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the yarn/kerberos configs to a new separate table in the YARN page. That should make them more visible.
Test build #88184 has finished for PR 20742 at commit
|
+1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly fine, a couple typos, and perhaps a real doc bug on spark.network.crypto.saslFallback
docs/security.md
Outdated
[Apache Commons Crypto](http://commons.apache.org/proper/commons-crypto/) library, and Spark's | ||
configuration system allows access to that library's configuration for advanced users. | ||
|
||
There is also support for SASL-based encryption, although it should be considered deprectated. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: deprecated
docs/security.md
Outdated
<td> | ||
Whether to fall back to SASL authentication if authentication fails using Spark's internal | ||
mechanism. This is useful when the application is connecting to old shuffle services that | ||
do not support the internal Spark authentication protocol. On the shuffle service side, enabling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this part right -- enabling the option (not disabling) makes the shuffle service refuse to fallback to sasl? from a quick scan I expected the opposite:
docs/security.md
Outdated
configuration has Kerberos authentication turned (`hbase.security.authentication=kerberos`). | ||
|
||
Similarly, a Hive token will be obtained if Hive is in the classpath, and the configuration includes | ||
a URIs for remote metastore services (`hive.metastore.uris` is not empty). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: either "a URI" or "URIs"
lgtm |
Test build #88324 has finished for PR 20742 at commit
|
retest this please |
Test build #88358 has finished for PR 20742 at commit
|
No more comments, so merging to master. |
This change basically rewrites the security documentation so that it's up to date with new features, more correct, and more complete. Because security is such an important feature, I chose to move all the relevant configuration documentation to the security page, instead of having them peppered all over the place in the configuration page. This allows an almost one-stop shop for security configuration in Spark. The only exceptions are some YARN-specific minor features which I left in the YARN page. I also re-organized the page's topics, since they didn't make a lot of sense. You had kerberos features described inside paragraphs talking about UI access control, and other oddities. It should be easier now to find information about specific Spark security features. I also enabled TOCs for both the Security and YARN pages, since that makes it easier to see what is covered. I removed most of the comments from the SecurityManager javadoc since they just replicated information in the security doc, with different levels of out-of-dateness. Author: Marcelo Vanzin <[email protected]> Closes apache#20742 from vanzin/SPARK-23572.
This change basically rewrites the security documentation so that it's
up to date with new features, more correct, and more complete.
Because security is such an important feature, I chose to move all the
relevant configuration documentation to the security page, instead of
having them peppered all over the place in the configuration page. This
allows an almost one-stop shop for security configuration in Spark. The
only exceptions are some YARN-specific minor features which I left in
the YARN page.
I also re-organized the page's topics, since they didn't make a lot of
sense. You had kerberos features described inside paragraphs talking
about UI access control, and other oddities. It should be easier now
to find information about specific Spark security features. I also
enabled TOCs for both the Security and YARN pages, since that makes it
easier to see what is covered.
I removed most of the comments from the SecurityManager javadoc since
they just replicated information in the security doc, with different
levels of out-of-dateness.