Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-23572][docs] Bring "security.md" up to date. #20742

Closed
wants to merge 6 commits into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Mar 5, 2018

This change basically rewrites the security documentation so that it's
up to date with new features, more correct, and more complete.

Because security is such an important feature, I chose to move all the
relevant configuration documentation to the security page, instead of
having them peppered all over the place in the configuration page. This
allows an almost one-stop shop for security configuration in Spark. The
only exceptions are some YARN-specific minor features which I left in
the YARN page.

I also re-organized the page's topics, since they didn't make a lot of
sense. You had kerberos features described inside paragraphs talking
about UI access control, and other oddities. It should be easier now
to find information about specific Spark security features. I also
enabled TOCs for both the Security and YARN pages, since that makes it
easier to see what is covered.

I removed most of the comments from the SecurityManager javadoc since
they just replicated information in the security doc, with different
levels of out-of-dateness.

Marcelo Vanzin added 2 commits March 5, 2018 15:06
This change basically rewrites the security documentation so that it's
up to date with new features, more correct, and more complete.

Because security is such an important feature, I chose to move all the
relevant configuration documentation to the security page, instead of
having them peppered all over the place in the configuration page. This
allows an almost one-stop shop for security configuration in Spark. The
only exceptions are some YARN-specific minor features which I left in
the YARN page.

I also re-organized the page's topics, since they didn't make a lot of
sense. You had kerberos features described inside paragraphs talking
about UI access control, and other oddities. It should be easier now
to find information about specific Spark security features. I also
enabled TOCs for both the Security and YARN pages, since that makes it
easier to see what is covered.

I removed most of the comments from the SecurityManager javadoc since
they just replicated information in the security doc, with different
levels of out-of-dateness.
@SparkQA
Copy link

SparkQA commented Mar 6, 2018

Test build #87982 has finished for PR 20742 at commit ec76dd4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -57,6 +57,6 @@ Collate:
'types.R'
'utils.R'
'window.R'
RoxygenNote: 5.0.1
RoxygenNote: 6.0.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls revert this

Copy link
Contributor Author

@vanzin vanzin Mar 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, I thought I had removed this one. (It's really annoying when the build changes your working file set.)

@SparkQA
Copy link

SparkQA commented Mar 6, 2018

Test build #87987 has finished for PR 20742 at commit c867373.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 6, 2018

Test build #88021 has finished for PR 20742 at commit 63abd8a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Mar 8, 2018

Any takers for reviewing this?

@vanzin
Copy link
Contributor Author

vanzin commented Mar 9, 2018

No volunteers, so I'll volunteer @squito , @jerryshao , @tgravescs

@@ -2,6 +2,8 @@
layout: global
title: Running Spark on YARN
---
* This will become a table of contents (this text will be scraped).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scrapped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same wording as other pages, so I'd rather keep it the same. (And "scrape" is also correct.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I misread this, is it supposed to be this text will be scraped (meaning looked for in order to fill in TOC) or was it supposed to be scrapped (meaning thrown away)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This text will be replaced with the TOC.

@@ -3,47 +3,290 @@ layout: global
displayTitle: Spark Security
title: Security
---
* This will become a table of contents (this text will be scraped).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scrapped


A user may want to secure the UI if it has data that other users should not be allowed to see. The javax servlet filter specified by the user can authenticate the user and then once the user is logged in, Spark can compare that user versus the view ACLs to make sure they are authorized to view the UI. The configs `spark.acls.enable`, `spark.ui.view.acls` and `spark.ui.view.acls.groups` control the behavior of the ACLs. Note that the user who started the application always has view access to the UI. On YARN, the Spark UI uses the standard YARN web application proxy mechanism and will authenticate via any installed Hadoop filters.
For other resource managers, `spark.authenticate.secret` must be configured on each of the nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you removed the text "All the nodes (Master and Workers) and the applications need to have the same shared secret. This again is not ideal as one user could potentially affect another users application. This should be enhanced in the future to provide better protection."

Personally I would like to see a warning stay in there as I don't consider this really secure for multi-tenant environment

docs/security.md Outdated
`spark.files` configuration). The files will be placed on the driver's working directory, so the TLS
configuration should just reference the file name with no absolute path.

When distributing local key stores this way, make sure to configure the underlying distributed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may also add warning that these are distributed via underlying filesystem (--files) so make sure its trusted/secure.

docs/security.md Outdated
using the `--principal` and `--keytab` parameters.

The provided keytab will be copied over to the machine running the Application Master via the Hadoop
Distributed Cache. For this reason, it's strongly recommended that both YARN and HDFS encryption are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secure yarn and hdfs with encryption. Some companies don't consider this secure enough. Not sure if it makes sense for us to say anything more though.


Spark supports automatically creating new tokens for these applications when running in YARN mode.
Kerberos credentials need to be provided to the Spark application via the `spark-submit` command,
using the `--principal` and `--keytab` parameters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we link the yarn page spark.yarn.keytab and spark.yarn.principal to this section as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the yarn/kerberos configs to a new separate table in the YARN page. That should make them more visible.

@SparkQA
Copy link

SparkQA commented Mar 13, 2018

Test build #88184 has finished for PR 20742 at commit 832d871.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor

+1

Copy link
Contributor

@squito squito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly fine, a couple typos, and perhaps a real doc bug on spark.network.crypto.saslFallback

docs/security.md Outdated
[Apache Commons Crypto](http://commons.apache.org/proper/commons-crypto/) library, and Spark's
configuration system allows access to that library's configuration for advanced users.

There is also support for SASL-based encryption, although it should be considered deprectated. It
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: deprecated

docs/security.md Outdated
<td>
Whether to fall back to SASL authentication if authentication fails using Spark's internal
mechanism. This is useful when the application is connecting to old shuffle services that
do not support the internal Spark authentication protocol. On the shuffle service side, enabling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this part right -- enabling the option (not disabling) makes the shuffle service refuse to fallback to sasl? from a quick scan I expected the opposite:

https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/crypto/AuthRpcHandler.java#L97

docs/security.md Outdated
configuration has Kerberos authentication turned (`hbase.security.authentication=kerberos`).

Similarly, a Hive token will be obtained if Hive is in the classpath, and the configuration includes
a URIs for remote metastore services (`hive.metastore.uris` is not empty).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: either "a URI" or "URIs"

@squito
Copy link
Contributor

squito commented Mar 16, 2018

lgtm

@SparkQA
Copy link

SparkQA commented Mar 16, 2018

Test build #88324 has finished for PR 20742 at commit 53c1710.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Mar 18, 2018

retest this please

@SparkQA
Copy link

SparkQA commented Mar 18, 2018

Test build #88358 has finished for PR 20742 at commit 53c1710.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Mar 26, 2018

No more comments, so merging to master.

@asfgit asfgit closed this in b30a7d2 Mar 26, 2018
mshtelma pushed a commit to mshtelma/spark that referenced this pull request Apr 5, 2018
This change basically rewrites the security documentation so that it's
up to date with new features, more correct, and more complete.

Because security is such an important feature, I chose to move all the
relevant configuration documentation to the security page, instead of
having them peppered all over the place in the configuration page. This
allows an almost one-stop shop for security configuration in Spark. The
only exceptions are some YARN-specific minor features which I left in
the YARN page.

I also re-organized the page's topics, since they didn't make a lot of
sense. You had kerberos features described inside paragraphs talking
about UI access control, and other oddities. It should be easier now
to find information about specific Spark security features. I also
enabled TOCs for both the Security and YARN pages, since that makes it
easier to see what is covered.

I removed most of the comments from the SecurityManager javadoc since
they just replicated information in the security doc, with different
levels of out-of-dateness.

Author: Marcelo Vanzin <[email protected]>

Closes apache#20742 from vanzin/SPARK-23572.
@vanzin vanzin deleted the SPARK-23572 branch April 9, 2018 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants