diff --git a/.gitignore b/.gitignore index 39085904e324c..e4c44d0590d59 100644 --- a/.gitignore +++ b/.gitignore @@ -76,6 +76,7 @@ streaming-tests.log target/ unit-tests.log work/ +docs/.jekyll-metadata # For Hive TempStatsStore/ diff --git a/core/src/main/scala/org/apache/spark/SecurityManager.scala b/core/src/main/scala/org/apache/spark/SecurityManager.scala index da1c89cd78901..09ec8932353a0 100644 --- a/core/src/main/scala/org/apache/spark/SecurityManager.scala +++ b/core/src/main/scala/org/apache/spark/SecurityManager.scala @@ -42,148 +42,10 @@ import org.apache.spark.util.Utils * should access it from that. There are some cases where the SparkEnv hasn't been * initialized yet and this class must be instantiated directly. * - * Spark currently supports authentication via a shared secret. - * Authentication can be configured to be on via the 'spark.authenticate' configuration - * parameter. This parameter controls whether the Spark communication protocols do - * authentication using the shared secret. This authentication is a basic handshake to - * make sure both sides have the same shared secret and are allowed to communicate. - * If the shared secret is not identical they will not be allowed to communicate. - * - * The Spark UI can also be secured by using javax servlet filters. A user may want to - * secure the UI if it has data that other users should not be allowed to see. The javax - * servlet filter specified by the user can authenticate the user and then once the user - * is logged in, Spark can compare that user versus the view acls to make sure they are - * authorized to view the UI. The configs 'spark.acls.enable', 'spark.ui.view.acls' and - * 'spark.ui.view.acls.groups' control the behavior of the acls. Note that the person who - * started the application always has view access to the UI. - * - * Spark has a set of individual and group modify acls (`spark.modify.acls`) and - * (`spark.modify.acls.groups`) that controls which users and groups have permission to - * modify a single application. This would include things like killing the application. - * By default the person who started the application has modify access. For modify access - * through the UI, you must have a filter that does authentication in place for the modify - * acls to work properly. - * - * Spark also has a set of individual and group admin acls (`spark.admin.acls`) and - * (`spark.admin.acls.groups`) which is a set of users/administrators and admin groups - * who always have permission to view or modify the Spark application. - * - * Starting from version 1.3, Spark has partial support for encrypted connections with SSL. - * - * At this point spark has multiple communication protocols that need to be secured and - * different underlying mechanisms are used depending on the protocol: - * - * - HTTP for broadcast and file server (via HttpServer) -> Spark currently uses Jetty - * for the HttpServer. Jetty supports multiple authentication mechanisms - - * Basic, Digest, Form, Spnego, etc. It also supports multiple different login - * services - Hash, JAAS, Spnego, JDBC, etc. Spark currently uses the HashLoginService - * to authenticate using DIGEST-MD5 via a single user and the shared secret. - * Since we are using DIGEST-MD5, the shared secret is not passed on the wire - * in plaintext. - * - * We currently support SSL (https) for this communication protocol (see the details - * below). - * - * The Spark HttpServer installs the HashLoginServer and configures it to DIGEST-MD5. 
- * Any clients must specify the user and password. There is a default - * Authenticator installed in the SecurityManager to how it does the authentication - * and in this case gets the user name and password from the request. - * - * - BlockTransferService -> The Spark BlockTransferServices uses java nio to asynchronously - * exchange messages. For this we use the Java SASL - * (Simple Authentication and Security Layer) API and again use DIGEST-MD5 - * as the authentication mechanism. This means the shared secret is not passed - * over the wire in plaintext. - * Note that SASL is pluggable as to what mechanism it uses. We currently use - * DIGEST-MD5 but this could be changed to use Kerberos or other in the future. - * Spark currently supports "auth" for the quality of protection, which means - * the connection does not support integrity or privacy protection (encryption) - * after authentication. SASL also supports "auth-int" and "auth-conf" which - * SPARK could support in the future to allow the user to specify the quality - * of protection they want. If we support those, the messages will also have to - * be wrapped and unwrapped via the SaslServer/SaslClient.wrap/unwrap API's. - * - * Since the NioBlockTransferService does asynchronous messages passing, the SASL - * authentication is a bit more complex. A ConnectionManager can be both a client - * and a Server, so for a particular connection it has to determine what to do. - * A ConnectionId was added to be able to track connections and is used to - * match up incoming messages with connections waiting for authentication. - * The ConnectionManager tracks all the sendingConnections using the ConnectionId, - * waits for the response from the server, and does the handshake before sending - * the real message. - * - * The NettyBlockTransferService ensures that SASL authentication is performed - * synchronously prior to any other communication on a connection. This is done in - * SaslClientBootstrap on the client side and SaslRpcHandler on the server side. - * - * - HTTP for the Spark UI -> the UI was changed to use servlets so that javax servlet filters - * can be used. Yarn requires a specific AmIpFilter be installed for security to work - * properly. For non-Yarn deployments, users can write a filter to go through their - * organization's normal login service. If an authentication filter is in place then the - * SparkUI can be configured to check the logged in user against the list of users who - * have view acls to see if that user is authorized. - * The filters can also be used for many different purposes. For instance filters - * could be used for logging, encryption, or compression. - * - * The exact mechanisms used to generate/distribute the shared secret are deployment-specific. - * - * For YARN deployments, the secret is automatically generated. The secret is placed in the Hadoop - * UGI which gets passed around via the Hadoop RPC mechanism. Hadoop RPC can be configured to - * support different levels of protection. See the Hadoop documentation for more details. Each - * Spark application on YARN gets a different shared secret. - * - * On YARN, the Spark UI gets configured to use the Hadoop YARN AmIpFilter which requires the user - * to go through the ResourceManager Proxy. That proxy is there to reduce the possibility of web - * based attacks through YARN. Hadoop can be configured to use filters to do authentication. 
That - * authentication then happens via the ResourceManager Proxy and Spark will use that to do - * authorization against the view acls. - * - * For other Spark deployments, the shared secret must be specified via the - * spark.authenticate.secret config. - * All the nodes (Master and Workers) and the applications need to have the same shared secret. - * This again is not ideal as one user could potentially affect another users application. - * This should be enhanced in the future to provide better protection. - * If the UI needs to be secure, the user needs to install a javax servlet filter to do the - * authentication. Spark will then use that user to compare against the view acls to do - * authorization. If not filter is in place the user is generally null and no authorization - * can take place. - * - * When authentication is being used, encryption can also be enabled by setting the option - * spark.authenticate.enableSaslEncryption to true. This is only supported by communication - * channels that use the network-common library, and can be used as an alternative to SSL in those - * cases. - * - * SSL can be used for encryption for certain communication channels. The user can configure the - * default SSL settings which will be used for all the supported communication protocols unless - * they are overwritten by protocol specific settings. This way the user can easily provide the - * common settings for all the protocols without disabling the ability to configure each one - * individually. - * - * All the SSL settings like `spark.ssl.xxx` where `xxx` is a particular configuration property, - * denote the global configuration for all the supported protocols. In order to override the global - * configuration for the particular protocol, the properties must be overwritten in the - * protocol-specific namespace. Use `spark.ssl.yyy.xxx` settings to overwrite the global - * configuration for particular protocol denoted by `yyy`. Currently `yyy` can be only`fs` for - * broadcast and file server. - * - * Refer to [[org.apache.spark.SSLOptions]] documentation for the list of - * options that can be specified. - * - * SecurityManager initializes SSLOptions objects for different protocols separately. SSLOptions - * object parses Spark configuration at a given namespace and builds the common representation - * of SSL settings. SSLOptions is then used to provide protocol-specific SSLContextFactory for - * Jetty. - * - * SSL must be configured on each node and configured for each component involved in - * communication using the particular protocol. In YARN clusters, the key-store can be prepared on - * the client side then distributed and used by the executors as the part of the application - * (YARN allows the user to deploy files before the application is started). - * In standalone deployment, the user needs to provide key-stores and configuration - * options for master and workers. In this mode, the user may allow the executors to use the SSL - * settings inherited from the worker which spawned that executor. It can be accomplished by - * setting `spark.ssl.useNodeLocalConf` to `true`. + * This class implements all of the configuration related to security features described + * in the "Security" document. Please refer to that document for specific features implemented + * here. 
*/ - private[spark] class SecurityManager( sparkConf: SparkConf, val ioEncryptionKey: Option[Array[Byte]] = None) diff --git a/docs/configuration.md b/docs/configuration.md index e7f2419cc2fa4..2eb6a77434ea6 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -712,30 +712,6 @@ Apart from these, the following properties are also available, and may be useful When we fail to register to the external shuffle service, we will retry for maxAttempts times. -
spark.io.encryption.enabled
spark.io.encryption.keySizeBits
spark.io.encryption.keygen.algorithm
spark.ui.filters
spark.<class name of filter>.param.<param name>=<value>
+
+ spark.ui.filters=com.test.filter1
+ spark.com.test.filter1.param.name1=foo
+ spark.com.test.filter1.param.name2=bar
+ spark.core.connection.ack.wait.timeout
spark.network.timeout
Property Name | Default | Meaning |
---|---|---|
spark.acls.enable |
- false | -- Whether Spark acls should be enabled. If enabled, this checks to see if the user has - access permissions to view or modify the job. Note this requires the user to be known, - so if the user comes across as null no checks are done. Filters can be used with the UI - to authenticate and set the user. - | -
spark.admin.acls |
- Empty | -- Comma separated list of users/administrators that have view and modify access to all Spark jobs. - This can be used if you run on a shared cluster and have a set of administrators or devs who - help debug when things do not work. Putting a "*" in the list means any user can have the - privilege of admin. - | -
spark.admin.acls.groups |
- Empty | -
- Comma separated list of groups that have view and modify access to all Spark jobs.
- This can be used if you have a set of administrators or developers who help maintain and debug
- the underlying infrastructure. Putting a "*" in the list means any user in any group can have
- the privilege of admin. The user groups are obtained from the instance of the groups mapping
- provider specified by spark.user.groups.mapping . Check the entry
- spark.user.groups.mapping for more details.
- |
-
spark.user.groups.mapping |
- org.apache.spark.security.ShellBasedGroupsMappingProvider |
-
- The list of groups for a user is determined by a group mapping service defined by the trait
- org.apache.spark.security.GroupMappingServiceProvider which can be configured by this property.
- A default unix shell based implementation is provided org.apache.spark.security.ShellBasedGroupsMappingProvider
- which can be specified to resolve a list of groups for a user.
- Note: This implementation supports only a Unix/Linux based environment. Windows environment is
- currently not supported. However, a new platform/protocol can be supported by implementing
- the trait org.apache.spark.security.GroupMappingServiceProvider .
- |
-
spark.authenticate |
- false | -
- Whether Spark authenticates its internal connections. See
- spark.authenticate.secret if not running on YARN.
- |
-
spark.authenticate.secret |
- None | -- Set the secret key used for Spark to authenticate between components. This needs to be set if - not running on YARN and authentication is enabled. - | -
spark.network.crypto.enabled |
- false | -
- Enable encryption using the commons-crypto library for RPC and block transfer service.
- Requires spark.authenticate to be enabled.
- |
-
spark.network.crypto.keyLength |
- 128 | -- The length in bits of the encryption key to generate. Valid values are 128, 192 and 256. - | -
spark.network.crypto.keyFactoryAlgorithm |
- PBKDF2WithHmacSHA1 | -- The key factory algorithm to use when generating encryption keys. Should be one of the - algorithms supported by the javax.crypto.SecretKeyFactory class in the JRE being used. - | -
spark.network.crypto.saslFallback |
- true | -- Whether to fall back to SASL authentication if authentication fails using Spark's internal - mechanism. This is useful when the application is connecting to old shuffle services that - do not support the internal Spark authentication protocol. On the server side, this can be - used to block older clients from authenticating against a new shuffle service. - | -
spark.network.crypto.config.* |
- None | -- Configuration values for the commons-crypto library, such as which cipher implementations to - use. The config name should be the name of commons-crypto configuration without the - "commons.crypto" prefix. - | -
spark.authenticate.enableSaslEncryption |
- false | -- Enable encrypted communication when authentication is - enabled. This is supported by the block transfer service and the - RPC endpoints. - | -
spark.network.sasl.serverAlwaysEncrypt |
- false | -- Disable unencrypted connections for services that support SASL authentication. - | -
spark.core.connection.ack.wait.timeout |
- spark.network.timeout |
- - How long for the connection to wait for ack to occur before timing - out and giving up. To avoid unwilling timeout caused by long pause like GC, - you can set larger value. - | -
spark.modify.acls |
- Empty | -- Comma separated list of users that have modify access to the Spark job. By default only the - user that started the Spark job has access to modify it (kill it for example). Putting a "*" in - the list means any user can have access to modify it. - | -
spark.modify.acls.groups |
- Empty | -
- Comma separated list of groups that have modify access to the Spark job. This can be used if you
- have a set of administrators or developers from the same team to have access to control the job.
- Putting a "*" in the list means any user in any group has the access to modify the Spark job.
- The user groups are obtained from the instance of the groups mapping provider specified by
- spark.user.groups.mapping . Check the entry spark.user.groups.mapping
- for more details.
- |
-
spark.ui.filters |
- None | -
- Comma separated list of filter class names to apply to the Spark web UI. The filter should be a
- standard
- javax servlet Filter. Parameters to each filter can also be specified by setting a
- java system property of: - spark.<class name of filter>.params='param1=value1,param2=value2' - For example: - -Dspark.ui.filters=com.test.filter1 - -Dspark.com.test.filter1.params='param1=foo,param2=testing'
- |
-
spark.ui.view.acls |
- Empty | -- Comma separated list of users that have view access to the Spark web ui. By default only the - user that started the Spark job has view access. Putting a "*" in the list means any user can - have view access to this Spark job. - | -
spark.ui.view.acls.groups |
- Empty | -
- Comma separated list of groups that have view access to the Spark web ui to view the Spark Job
- details. This can be used if you have a set of administrators or developers or users who can
- monitor the Spark job submitted. Putting a "*" in the list means any user in any group can view
- the Spark job details on the Spark web ui. The user groups are obtained from the instance of the
- groups mapping provider specified by spark.user.groups.mapping . Check the entry
- spark.user.groups.mapping for more details.
- |
-
Property Name | Default | Meaning |
---|---|---|
spark.ssl.enabled |
- false | -
- Whether to enable SSL connections on all supported protocols.
-
- When spark.ssl.enabled is configured, spark.ssl.protocol
- is required.
-
- All the SSL settings like spark.ssl.xxx where xxx is a
- particular configuration property, denote the global configuration for all the supported
- protocols. In order to override the global configuration for the particular protocol,
- the properties must be overwritten in the protocol-specific namespace.
-
- Use spark.ssl.YYY.XXX settings to overwrite the global configuration for
- particular protocol denoted by YYY . Example values for YYY
- include fs , ui , standalone , and
- historyServer . See SSL
- Configuration for details on hierarchical SSL configuration for services.
- |
-
spark.ssl.[namespace].port |
- None | -
- The port where the SSL service will listen on.
-
- The port must be defined within a namespace configuration; see - SSL Configuration for the available - namespaces. - - When not set, the SSL port will be derived from the non-SSL port for the - same service. A value of "0" will make the service bind to an ephemeral port. - |
-
spark.ssl.enabledAlgorithms |
- Empty | -- A comma separated list of ciphers. The specified ciphers must be supported by JVM. - The reference list of protocols one can find on - this - page. - Note: If not set, it will use the default cipher suites of JVM. - | -
spark.ssl.keyPassword |
- None | -- A password to the private key in key-store. - | -
spark.ssl.keyStore |
- None | -- A path to a key-store file. The path can be absolute or relative to the directory where - the component is started in. - | -
spark.ssl.keyStorePassword |
- None | -- A password to the key-store. - | -
spark.ssl.keyStoreType |
- JKS | -- The type of the key-store. - | -
spark.ssl.protocol |
- None | -- A protocol name. The protocol must be supported by JVM. The reference list of protocols - one can find on this - page. - | -
spark.ssl.needClientAuth |
- false | -- Set true if SSL needs client authentication. - | -
spark.ssl.trustStore |
- None | -- A path to a trust-store file. The path can be absolute or relative to the directory - where the component is started in. - | -
spark.ssl.trustStorePassword |
- None | -- A password to the trust-store. - | -
spark.ssl.trustStoreType |
- JKS | -- The type of the trust-store. - | -
Property Name | Default | Meaning | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
spark.history.ui.acls.enable | -false | -
- Specifies whether acls should be checked to authorize users viewing the applications.
- If enabled, access control checks are made regardless of what the individual application had
- set for spark.ui.acls.enable when the application was run. The application owner
- will always have authorization to view their own application and any users specified via
- spark.ui.view.acls and groups specified via spark.ui.view.acls.groups
- when the application was run will also have authorization to view that application.
- If disabled, no access control checks are made.
- |
- ||||||||||||||
spark.history.ui.admin.acls | -empty | -- Comma separated list of users/administrators that have view access to all the Spark applications in - history server. By default only the users permitted to view the application at run-time could - access the related application history, with this, configured users/administrators could also - have the permission to access it. - Putting a "*" in the list means any user can have the privilege of admin. - | -||||||||||||||
spark.history.ui.admin.acls.groups | -empty | -- Comma separated list of groups that have view access to all the Spark applications in - history server. By default only the groups permitted to view the application at run-time could - access the related application history, with this, configured groups could also - have the permission to access it. - Putting a "*" in the list means any group can have the privilege of admin. - | -||||||||||||||
spark.history.fs.cleaner.enabled | false | diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md index c010af35f8d2e..e07759a4dba87 100644 --- a/docs/running-on-yarn.md +++ b/docs/running-on-yarn.md @@ -2,6 +2,8 @@ layout: global title: Running Spark on YARN --- +* This will become a table of contents (this text will be scraped). +{:toc} Support for running on [YARN (Hadoop NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html) @@ -217,8 +219,8 @@ To use a custom metrics.properties for the application master and executors, updspark.yarn.dist.forceDownloadSchemes |
(none) |
- Comma-separated list of schemes for which files will be downloaded to the local disk prior to - being added to YARN's distributed cache. For use in cases where the YARN service does not + Comma-separated list of schemes for which files will be downloaded to the local disk prior to + being added to YARN's distributed cache. For use in cases where the YARN service does not support schemes that are supported by Spark, like http, https and ftp. | ||||||||||||
spark.yarn.access.hadoopFileSystems |
- (none) | -
- A comma-separated list of secure Hadoop filesystems your Spark application is going to access. For
- example, spark.yarn.access.hadoopFileSystems=hdfs://nn1.com:8032,hdfs://nn2.com:8032,
- webhdfs://nn3.com:50070 . The Spark application must have access to the filesystems listed
- and Kerberos must be properly configured to be able to access them (either in the same realm
- or in a trusted realm). Spark acquires security tokens for each of the filesystems so that
- the Spark application can access those remote Hadoop filesystems. spark.yarn.access.namenodes
- is deprecated, please use this instead.
- |
-||||||||||||||
spark.yarn.appMasterEnv.[EnvironmentVariableName] |
(none) | @@ -373,31 +362,6 @@ To use a custom metrics.properties for the application master and executors, upd in YARN ApplicationReports, which can be used for filtering when querying YARN apps.|||||||||||||||
spark.yarn.keytab |
- (none) | -- The full path to the file that contains the keytab for the principal specified above. - This keytab will be copied to the node running the YARN Application Master via the Secure Distributed Cache, - for renewing the login tickets and the delegation tokens periodically. (Works also with the "local" master) - | -||||||||||||||
spark.yarn.principal |
- (none) | -- Principal to be used to login to KDC, while running on secure HDFS. (Works also with the "local" master) - | -||||||||||||||
spark.yarn.kerberos.relogin.period |
- 1m | -- How often to check whether the kerberos TGT should be renewed. This should be set to a value - that is shorter than the TGT renewal period (or the TGT lifetime if TGT renewal is not enabled). - The default value should be enough for most deployments. - | -||||||||||||||
spark.yarn.config.gatewayPath |
(none) | @@ -424,17 +388,6 @@ To use a custom metrics.properties for the application master and executors, upd See|||||||||||||||
spark.security.credentials.${service}.enabled |
- true |
- - Controls whether to obtain credentials for services when security is enabled. - By default, credentials for all supported services are retrieved when those services are - configured, but it's possible to disable that behavior if it somehow conflicts with the - application being run. For further details please see - [Running in a Secure Cluster](running-on-yarn.html#running-in-a-secure-cluster) - | -||||||||||||||
spark.yarn.rolledLog.includePattern |
(none) | @@ -468,48 +421,104 @@ To use a custom metrics.properties for the application master and executors, upd - The `--files` and `--archives` options support specifying file names with the # similar to Hadoop. For example you can specify: `--files localtest.txt#appSees.txt` and this will upload the file you have locally named `localtest.txt` into HDFS but this will be linked to by the name `appSees.txt`, and your application should use the name as `appSees.txt` to reference it when running on YARN. - The `--jars` option allows the `SparkContext.addJar` function to work if you are using it with local files and running in `cluster` mode. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files. -# Running in a Secure Cluster +# Kerberos + +Standard Kerberos support in Spark is covered in the [Security](security.html#kerberos) page. + +In YARN mode, when accessing Hadoop file systems, aside from the service hosting the user's home +directory, Spark will also automatically obtain delegation tokens for the service hosting the +staging directory of the Spark application. + +If an application needs to interact with other secure Hadoop filesystems, their URIs need to be +explicitly provided to Spark at launch time. This is done by listing them in the +`spark.yarn.access.hadoopFileSystems` property, described in the configuration section below. -As covered in [security](security.html), Kerberos is used in a secure Hadoop cluster to -authenticate principals associated with services and clients. This allows clients to -make requests of these authenticated services; the services to grant rights -to the authenticated principals. +The YARN integration also supports custom delegation token providers using the Java Services +mechanism (see `java.util.ServiceLoader`). Implementations of +`org.apache.spark.deploy.yarn.security.ServiceCredentialProvider` can be made available to Spark +by listing their names in the corresponding file in the jar's `META-INF/services` directory. These +providers can be disabled individually by setting `spark.security.credentials.{service}.enabled` to +`false`, where `{service}` is the name of the credential provider. + +## YARN-specific Kerberos Configuration + +
Property Name | Default | Meaning |
---|---|---|
spark.yarn.keytab |
+ (none) | +
+ The full path to the file that contains the keytab for the principal specified in spark.yarn.principal. This keytab
+ will be copied to the node running the YARN Application Master via the YARN Distributed Cache, and
+ will be used for renewing the login tickets and the delegation tokens periodically. Equivalent to
+ the --keytab command line argument.
+
+ (Works also with the "local" master.) + |
+
spark.yarn.principal |
+ (none) | +
+ Principal to be used to log in to the KDC while running on secure clusters. Equivalent to the
+ --principal command line argument.
+
+ (Works also with the "local" master.) + |
+
spark.yarn.access.hadoopFileSystems |
+ (none) | +
+ A comma-separated list of secure Hadoop filesystems your Spark application is going to access. For
+ example, spark.yarn.access.hadoopFileSystems=hdfs://nn1.com:8032,hdfs://nn2.com:8032,
+ webhdfs://nn3.com:50070 . The Spark application must have access to the filesystems listed
+ and Kerberos must be properly configured to be able to access them (either in the same realm
+ or in a trusted realm). Spark acquires security tokens for each of the filesystems so that
+ the Spark application can access those remote Hadoop filesystems.
+ |
+
spark.yarn.kerberos.relogin.period |
+ 1m | ++ How often to check whether the kerberos TGT should be renewed. This should be set to a value + that is shorter than the TGT renewal period (or the TGT lifetime if TGT renewal is not enabled). + The default value should be enough for most deployments. + | +
Property Name | Default | Meaning |
---|---|---|
spark.authenticate |
+ false | +Whether Spark authenticates its internal connections. | +
spark.authenticate.secret |
+ None | + + The secret key used for authentication. See above for when this configuration should be set. + | +
Property Name | Default | Meaning |
---|---|---|
spark.network.crypto.enabled |
+ false | ++ Enable AES-based RPC encryption, including the new authentication protocol added in 2.2.0. + | +
spark.network.crypto.keyLength |
+ 128 | ++ The length in bits of the encryption key to generate. Valid values are 128, 192 and 256. + | +
spark.network.crypto.keyFactoryAlgorithm |
+ PBKDF2WithHmacSHA1 | ++ The key factory algorithm to use when generating encryption keys. Should be one of the + algorithms supported by the javax.crypto.SecretKeyFactory class in the JRE being used. + | +
spark.network.crypto.config.* |
+ None | +
+ Configuration values for the commons-crypto library, such as which cipher implementations to
+ use. The config name should be the name of the commons-crypto configuration without the
+ commons.crypto prefix.
+ |
+
spark.network.crypto.saslFallback |
+ true | ++ Whether to fall back to SASL authentication if authentication fails using Spark's internal + mechanism. This is useful when the application is connecting to old shuffle services that + do not support the internal Spark authentication protocol. On the shuffle service side, + disabling this feature will block older clients from authenticating. + | +
spark.authenticate.enableSaslEncryption |
+ false | ++ Enable SASL-based encrypted communication. + | +
spark.network.sasl.serverAlwaysEncrypt |
+ false | ++ Disable unencrypted connections for ports using SASL authentication. This will deny connections + from clients that have authentication enabled, but do not request SASL-based encryption. + | +
Property Name | Default | Meaning |
---|---|---|
spark.io.encryption.enabled |
+ false | ++ Enable local disk I/O encryption. Currently supported by all modes except Mesos. It's strongly + recommended that RPC encryption be enabled when using this feature. + | +
spark.io.encryption.keySizeBits |
+ 128 | ++ IO encryption key size in bits. Supported values are 128, 192 and 256. + | +
spark.io.encryption.keygen.algorithm |
+ HmacSHA1 | ++ The algorithm to use when generating the IO encryption key. The supported algorithms are + described in the KeyGenerator section of the Java Cryptography Architecture Standard Algorithm + Name Documentation. + | +
spark.io.encryption.commons.config.* |
+ None | +
+ Configuration values for the commons-crypto library, such as which cipher implementations to
+ use. The config name should be the name of the commons-crypto configuration without the
+ commons.crypto prefix.
+ |
+
spark.user.groups.mapping
config option, described in the table
+below.
+
+The following options control the authentication of Web UIs:
+
+Property Name | Default | Meaning |
---|---|---|
spark.ui.filters |
+ None | ++ See the Spark UI configuration for how to configure + filters. + | +
spark.acls.enable |
+ false | ++ Whether UI ACLs should be enabled. If enabled, this checks to see if the user has access + permissions to view or modify the application. Note this requires the user to be authenticated, + so if no authentication filter is installed, this option does not do anything. + | +
spark.admin.acls |
+ None | ++ Comma-separated list of users that have view and modify access to the Spark application. + | +
spark.admin.acls.groups |
+ None | ++ Comma-separated list of groups that have view and modify access to the Spark application. + | +
spark.modify.acls |
+ None | ++ Comma-separated list of users that have modify access to the Spark application. + | +
spark.modify.acls.groups |
+ None | ++ Comma-separated list of groups that have modify access to the Spark application. + | +
spark.ui.view.acls |
+ None | ++ Comma-separated list of users that have view access to the Spark application. + | +
spark.ui.view.acls.groups |
+ None | ++ Comma-separated list of groups that have view access to the Spark application. + | +
spark.user.groups.mapping |
+ org.apache.spark.security.ShellBasedGroupsMappingProvider |
+
+ The list of groups for a user is determined by a group mapping service defined by the trait
+ org.apache.spark.security.GroupMappingServiceProvider , which can be configured by
+ this property.
+
+ By default, a Unix shell-based implementation is used, which collects this information + from the host OS. + + Note: This implementation supports only Unix/Linux-based environments. + Windows environment is currently not supported. However, a new platform/protocol can + be supported by implementing the trait mentioned above. + |
+
Property Name | Default | Meaning |
---|---|---|
spark.history.ui.acls.enable | +false | +
+ Specifies whether ACLs should be checked to authorize users viewing the applications in
+ the history server. If enabled, access control checks are performed regardless of what the
+ individual applications had set for spark.ui.acls.enable . The application owner
+ will always have authorization to view their own application and any users specified via
+ spark.ui.view.acls and groups specified via spark.ui.view.acls.groups
+ when the application was run will also have authorization to view that application.
+ If disabled, no access control checks are made for any application UIs available through
+ the history server.
+ |
+
spark.history.ui.admin.acls | +None | ++ Comma separated list of users that have view access to all the Spark applications in history + server. + | +
spark.history.ui.admin.acls.groups | +None | ++ Comma separated list of groups that have view access to all the Spark applications in history + server. + | +
Config Namespace | Component |
---|---|
spark.ssl |
+ + The default SSL configuration. These values will apply to all namespaces below, unless + explicitly overridden at the namespace level. + | +
spark.ssl.ui |
Spark application Web UI | @@ -58,49 +347,205 @@ component-specific configuration namespaces used to override the default setting
Property Name | Default | Meaning |
---|---|---|
${ns}.enabled |
+ false | + Enables SSL. When enabled, ${ns}.protocol is required. |
+
${ns}.port |
+ None | +
+ The port on which the SSL service will listen.
+
+ The port must be defined within a specific namespace configuration. The default + namespace is ignored when reading this configuration. + + When not set, the SSL port will be derived from the non-SSL port for the + same service. A value of "0" will make the service bind to an ephemeral port. + |
+
${ns}.enabledAlgorithms |
+ None | +
+ A comma-separated list of ciphers. The specified ciphers must be supported by the JVM.
+
+ The reference list of protocols can be found in the "JSSE Cipher Suite Names" section + of the Java security guide. The list for Java 8 can be found at + this + page. + + Note: If not set, the default cipher suite for the JRE will be used. + |
+
${ns}.keyPassword |
+ None | ++ The password to the private key in the key store. + | +
${ns}.keyStore |
+ None | ++ Path to the key store file. The path can be absolute or relative to the directory in which the + process is started. + | +
${ns}.keyStorePassword |
+ None | +Password to the key store. | +
${ns}.keyStoreType |
+ JKS | +The type of the key store. | +
${ns}.protocol |
+ None | +
+ TLS protocol to use. The protocol must be supported by the JVM.
+
+ The reference list of protocols can be found in the "Additional JSSE Standard Names" + section of the Java security guide. For Java 8, the list can be found at + this + page. + |
+
${ns}.needClientAuth |
+ false | +Whether to require client authentication. | +
${ns}.trustStore |
+ None | ++ Path to the trust store file. The path can be absolute or relative to the directory in which + the process is started. + | +
${ns}.trustStorePassword |
+ None | +Password for the trust store. | +
${ns}.trustStoreType |
+ JKS | +The type of the trust store. | +
Property Name | Default | Meaning |
---|---|---|
spark.ui.xXssProtection |
+ 1; mode=block |
+
+ Value for the HTTP X-XSS-Protection response header. You can choose an appropriate value
+ from below:
+
|
+
spark.ui.xContentTypeOptions.enabled |
+ true |
+ + When enabled, X-Content-Type-Options HTTP response header will be set to "nosniff". + | +
spark.ui.strictTransportSecurity |
+ None | +
+ Value for the HTTP Strict Transport Security (HSTS) response header. You can choose an appropriate
+ value from below and set expire-time accordingly. This option is only used when
+ SSL/TLS is enabled.
+
|
+
Property Name | Default | Meaning | |
---|---|---|---|
spark.ui.xXssProtection |
- 1; mode=block |
-
- Value for HTTP X-XSS-Protection response header. You can choose appropriate value
- from below:
-
|
-|
spark.ui.xContentTypeOptions.enabled |
+ spark.security.credentials.${service}.enabled |
true |
- When value is set to "true", X-Content-Type-Options HTTP response header will be set - to "nosniff". Set "false" to disable. - | -
spark.ui.strictTransportSecurity |
- None | -
- Value for HTTP Strict Transport Security (HSTS) Response Header. You can choose appropriate
- value from below and set expire-time accordingly, when Spark is SSL/TLS enabled.
-
|
org.apache.spark.SecurityManager
for implementation details about security.
+## Long-Running Applications
+
+Long-running applications may run into issues if their run time exceeds the maximum delegation
+token lifetime configured in the services they need to access.
+
+Spark supports automatically creating new tokens for these applications when running in YARN mode.
+Kerberos credentials need to be provided to the Spark application via the `spark-submit` command,
+using the `--principal` and `--keytab` parameters.
+
+The provided keytab will be copied over to the machine running the Application Master via the Hadoop
+Distributed Cache. For this reason, it's strongly recommended that, at a minimum, both YARN and
+HDFS be secured with encryption.
+
+The Kerberos login will be periodically renewed using the provided credentials, and new delegation
+tokens for supported services will be created.
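+
+As an illustration only, the sketch below shows how these credentials might be passed when an
+application is submitted programmatically through `SparkLauncher` (the programmatic counterpart of
+`spark-submit`); the principal, keytab path, jar and class names are placeholders, not values
+shipped with Spark:
+
+```scala
+import org.apache.spark.launcher.SparkLauncher
+
+// Roughly equivalent to: spark-submit --master yarn --deploy-mode cluster \
+//   --principal alice@EXAMPLE.COM --keytab /path/to/alice.keytab ...
+object SubmitWithKeytab {
+  def main(args: Array[String]): Unit = {
+    val handle = new SparkLauncher()
+      .setMaster("yarn")
+      .setDeployMode("cluster")
+      .setAppResource("/path/to/app.jar")                     // placeholder application jar
+      .setMainClass("com.example.LongRunningJob")             // placeholder main class
+      .setConf("spark.yarn.principal", "alice@EXAMPLE.COM")   // same as --principal
+      .setConf("spark.yarn.keytab", "/path/to/alice.keytab")  // same as --keytab
+      .startApplication()
+    println(handle.getState)   // the handle can be used to monitor the submitted application
+  }
+}
+```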
+
+
+# Event Logging
+
+If your applications are using event logging, the directory where the event logs go
+(`spark.eventLog.dir`) should be manually created with proper permissions. To secure the log files,
+the directory permissions should be set to `drwxrwxrwt`. The owner and group of the directory
+should correspond to the super user who is running the Spark History Server.
+This will allow all users to write to the directory but will prevent unprivileged users from
+reading, removing or renaming a file unless they own it. The event log files will be created by
+Spark with permissions such that only the user and group have read and write access.
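+
+As a sketch only (the path below is a placeholder for whatever `spark.eventLog.dir` points at, and
+this would normally be done by the cluster administrator with a single filesystem command), the
+directory could be prepared through the Hadoop FileSystem API; mode 1777 (`rwxrwxrwt`) matches the
+permissions described above:
+
+```scala
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.fs.permission.{FsAction, FsPermission}
+
+object CreateEventLogDir {
+  def main(args: Array[String]): Unit = {
+    // Placeholder path; use the value configured for spark.eventLog.dir.
+    val eventLogDir = new Path("hdfs:///spark-logs")
+    val fs = eventLogDir.getFileSystem(new Configuration())
+    fs.mkdirs(eventLogDir)
+    // rwx for user, group and others, plus the sticky bit: any user can create files,
+    // but cannot remove or rename files owned by someone else.
+    fs.setPermission(eventLogDir, new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL, true))
+  }
+}
+```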