Secure HDFS Support #414
Conversation
…ILE_DIR for Volume mount
…erized cluster in integration tests
rerun unit tests please
rerun integration tests please
1 similar comment
rerun integration tests please
rerun unit tests please
@ifilonenko and I talked offline. I am doing a preliminary review on this end-to-end prototype. After this review, we want to break this into smaller PRs and add unit tests to them.
rerun integration tests please
Starting a preliminary review on this end-to-end prototype. After this review, we want to break this into smaller PRs and add unit tests to them.
The change looks good overall. I was able to follow the code relatively easily.
docs/running-on-kubernetes.md
Outdated
<td><code>spark.kubernetes.kerberos</code></td>
<td>false</td>
<td>
Specify whether your job is a job that will require a Delegation Token to access HDFS. By default, we
I feel like "Delegation Token" is too much detail for this user-exposed documentation. Rewrite this to mention Kerberos, but omit "Delegation Token"?
docs/running-on-kubernetes.md
Outdated
@@ -768,6 +768,53 @@ from the other deployment modes. See the [configuration page](configuration.html)
<code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
</td>
</tr>
<tr>
<td><code>spark.kubernetes.kerberos</code></td>
Indicate this is a boolean by adding `.enabled` to the name? Many boolean flags have the suffix.
docs/running-on-kubernetes.md
Outdated
<td>
Assuming you have set <code>spark.kubernetes.kerberos</code> to be true. This will let you specify
the location of your Kerberos keytab to be used in order to access Secure HDFS. This is optional as you
may login by running <code>kinit -kt</code> before running the spark-submit, and the submission client
s/ -kt//. `-kt` is for keytab. `kinit` can be invoked without a keytab file, like `$ kinit <USERNAME>`, which also allows you to avoid using this option.
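For context, a minimal sketch of the two login paths this doc text describes, using Hadoop's UGI API (the helper name and the option plumbing here are hypothetical, not the PR's actual code):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object KerberosLoginSketch {
  // Hypothetical helper: log in from an explicit keytab + principal when both
  // are configured; otherwise fall back to the TGT already present in the
  // user's ticket cache (populated by a prior `kinit <USERNAME>`).
  def loginUGI(principal: Option[String], keytab: Option[String]): UserGroupInformation = {
    UserGroupInformation.setConfiguration(new Configuration())
    (principal, keytab) match {
      case (Some(p), Some(k)) =>
        UserGroupInformation.loginUserFromKeytabAndReturnUGI(p, k)
      case _ =>
        UserGroupInformation.getLoginUser
    }
  }
}
```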
docs/running-on-kubernetes.md
Outdated
<td>
Assuming you have set <code>spark.kubernetes.kerberos</code> to be true. This will let you specify
your Kerberos principal that you wish to use to access Secure HDFS. This is optional as you
may login by running <code>kinit -kt</code> before running the spark-submit, and the submission client
Ditto.
docs/running-on-kubernetes.md
Outdated
<td><code>spark.kubernetes.kerberos.tokensecret.name</code></td>
<td>(none)</td>
<td>
Assuming you have set <code>spark.kubernetes.kerberos</code> to be true. This will let you specify
Mention this is optional if the user does not want to use an existing delegation token?
What do you mean?
// Spark core providers to handle delegation token renewal
renewer = jobUserUGI.getShortUserName
logInfo(s"Renewer is: $renewer")
renewedCredentials = new Credentials(originalCredentials)
s/renewedCredentials/credentials/
renewedCredentials = new Credentials(originalCredentials)
dfs.addDelegationTokens(renewer, renewedCredentials)
renewedTokens = renewedCredentials.getAllTokens.asScala
logInfo(s"Renewed tokens: ${renewedCredentials.toString}")
s/Renewed //
dfs.addDelegationTokens(renewer, renewedCredentials)
renewedTokens = renewedCredentials.getAllTokens.asScala
logInfo(s"Renewed tokens: ${renewedCredentials.toString}")
logInfo(s"All renewed tokens: ${renewedTokens.mkString(",")}")
s/ renewed tokens/tokens/
renewedTokens = renewedCredentials.getAllTokens.asScala
logInfo(s"Renewed tokens: ${renewedCredentials.toString}")
logInfo(s"All renewed tokens: ${renewedTokens.mkString(",")}")
logInfo(s"All renewed secret keys: ${renewedCredentials.getAllSecretKeys}")
s/renewed secret/secret/
.withMountPath(HADOOP_FILE_DIR)
.endVolumeMount()
.addNewEnv()
.withName(HADOOP_CONF_DIR)
I am not sure setting the HADOOP_CONF_DIR env alone will make the driver and executor JVMs find these Hadoop config files. The config dir path should be included in the classpath of the JVMs when they start up. Given our driver and executor Dockerfiles launch the JVM directly (although through tini), I don't think any software layer will pick up the HADOOP_CONF_DIR env and put it on the JVM classpath.
I think we want to modify the command line of the Dockerfiles to include the HADOOP_CONF_DIR env. Here's the executor command line as an example; we can add HADOOP_CONF_DIR to SPARK_CLASSPATH here, and do the same thing for the driver:
CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
    if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
    if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
    if ! [ -z ${SPARK_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
    if ! [ -z ${SPARK_MOUNTED_FILES_DIR} ]; then cp -R "$SPARK_MOUNTED_FILES_DIR/." .; fi && \
    exec /sbin/tini -- ${JAVA_HOME}/bin/java -Dspark.executor.port=$SPARK_EXECUTOR_PORT -Xms$SPARK_EXECUTOR_MEMORY -Xmx$SPARK_EXECUTOR_MEMORY -cp $SPARK_CLASSPATH org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url $SPARK_DRIVER_URL --executor-id $SPARK_EXECUTOR_ID --cores $SPARK_EXECUTOR_CORES --app-id $SPARK_APPLICATION_ID --hostname $SPARK_EXECUTOR_POD_IP
Partial mocking of UGI functions has been done, with the exception of the FileSystem portion in the KeytabResolverStep. Garbage collection of the secret post-job is already handled by the Client.scala OwnerReference. Current failures in integration tests are due to issues found after rebasing PRs; they will be addressed before this is ready for merging.
Fix executor env to include simple authn
Fix a bug in executor env handling
Fix a bug in how the driver sets simple authn
rerun integration tests please
1 similar comment
rerun integration tests please
I finished looking at the change. Looks good to me overall. Thanks for putting this together.
Left a few comments. PTAL.
Perhaps we can merge this into the target branch hdfs-kerberos-support soon after this round, then start breaking this into multiple PRs heading to branch-2.2-kubernetes, like one PR for the main code and another for the integration test code.
docs/running-on-kubernetes.md
Outdated
<td>spark.kubernetes.kerberos.dt.label</td>
<td>
Assuming you have set <code>spark.kubernetes.kerberos.enabled</code> to be true. This will let you specify
the label within the pre-specified secret where the data of your existing delegation token data is stored.
s/label/data item key name/
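To illustrate the terminology: a Kubernetes secret stores a map of data item keys to base64-encoded values, so the token is addressed by a (secret name, item key) pair rather than a label. A hedged fabric8 sketch, where `existingSecretName` and `secretItemKey` are assumed names standing in for the two configured values:

```scala
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClient}

// Assumed names, for illustration only.
val kubernetesClient: KubernetesClient = new DefaultKubernetesClient()
val existingSecretName = "hadoop-token-secret"
val secretItemKey = "hadoop-tokens"

// A secret's data is a key -> base64 value map; the delegation token bytes
// live under one data item key inside the named secret.
val secret = kubernetesClient.secrets().withName(existingSecretName).get()
val tokenDataBase64: String = secret.getData.get(secretItemKey)
```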
<td>
Assuming you have set <code>spark.kubernetes.kerberos.enabled</code> to be true. This will let you specify
the name of the secret where your existing delegation token data is stored. You must also specify the
item key <code>spark.kubernetes.kerberos.tokensecret.itemkey</code> where your data is stored on the secret.
Can you mention that this is optional, in case you want to use a pre-existing secret, and that a new secret will be created automatically otherwise?
@@ -100,6 +100,12 @@
<dependency>
<groupId>com.fasterxml.jackson.jaxrs</groupId>
<artifactId>jackson-jaxrs-json-provider</artifactId>
<exclusions>
Hmm. Do we still need this?
* This is separated out from the HadoopConf steps API because this component can be reused to
* mount the DT secret for executors as well.
*/
private[spark] trait KerberosTokenBootstrapConf {
Make the trait name match the file name, KerberosTokenConfBootstrap?
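For orientation, a rough sketch of the component shape this trait suggests (the method signature and the `PodWithMainContainer` wrapper are assumptions inferred from the surrounding diff, not the PR's verified code):

```scala
import io.fabric8.kubernetes.api.model.{Container, Pod}

// Assumed wrapper carrying a pod together with the container to decorate.
case class PodWithMainContainer(pod: Pod, mainContainer: Container)

// Sketch: one reusable component that mounts the delegation-token secret
// volume and points the token-location env var at it, so the driver and
// the executors can share the same bootstrap logic.
trait KerberosTokenConfBootstrap {
  def bootstrapMainContainerAndVolumes(
      original: PodWithMainContainer): PodWithMainContainer
}
```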
val maybeHadoopConfigMap = sparkConf.getOption(HADOOP_CONFIG_MAP_SPARK_CONF_NAME)
val maybeHadoopConfDir = sparkConf.getOption(HADOOP_CONF_DIR_LOC)
val maybeDTSecretName = sparkConf.getOption(HADOOP_KERBEROS_CONF_SECRET)
val maybeDTLabelName = sparkConf.getOption(HADOOP_KERBEROS_CONF_ITEM_KEY)
s/LabelName/DataItem/
}})
credentials.getAllTokens.asScala.isEmpty
tokens.isEmpty
if (tokens.isEmpty) logError("Did not obtain any Delegation Tokens")
I think we should throw an exception. Otherwise, it's hard to debug this downstream.
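A minimal sketch of the suggested change, with `tokens` standing in for the collection fetched in the diff above:

```scala
import org.apache.hadoop.security.token.Token
import org.apache.spark.SparkException

// Stand-in for the tokens gathered by addDelegationTokens above.
val tokens: Seq[Token[_]] = Seq.empty

// Fail fast at submission time rather than letting the missing-token
// condition surface as an opaque downstream authentication error.
if (tokens.isEmpty) {
  throw new SparkException("Did not obtain any delegation tokens")
}
```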
val secretDT =
  new SecretBuilder()
    .withNewMetadata()
      .withName(HADOOP_KERBEROS_SECRET_NAME)
Shouldn't we generate a unique secret name so that multiple jobs can run simultaneously using different secrets?
On the same namespace why would you have the need for different secrets? I guess the problem would happen when the ownerRef destroys the secret while another job is processing.
Yes, if we have multiple jobs, it's better to have one secret per job.
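A minimal sketch of one per-job naming scheme (the UUID suffix is hypothetical; a timestamp or other unique suffix would work equally well):

```scala
import java.util.UUID

// Constant from the diff (value assumed here for illustration).
val HADOOP_KERBEROS_SECRET_NAME = "hadoop-kerberos-dt-secret"

// Append a random suffix so each submission gets its own secret, and the
// ownerRef cleanup of one job cannot delete a secret another job still uses.
val uniqueSecretName =
  s"$HADOOP_KERBEROS_SECRET_NAME-${UUID.randomUUID().toString.take(8)}"
```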
tokens.isEmpty
if (tokens.isEmpty) logError("Did not obtain any Delegation Tokens")
val data = serialize(credentials)
val renewalTime = getTokenRenewalInterval(tokens, hadoopConf).getOrElse(Long.MaxValue)
s/renewalTime/renewalInterval/
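For reference, `serialize(credentials)` plausibly boils down to Hadoop's own token-storage writer; a sketch under that assumption:

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import org.apache.hadoop.security.Credentials

// Write the Credentials (delegation tokens + secret keys) into a byte
// array suitable for storing as a Kubernetes secret data item.
def serialize(credentials: Credentials): Array[Byte] = {
  val byteStream = new ByteArrayOutputStream
  val dataStream = new DataOutputStream(byteStream)
  credentials.writeTokenStorageToStream(dataStream)
  dataStream.flush()
  byteStream.toByteArray
}
```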
hadoopConfDir)
val maybeKerberosStep =
  if (isKerberosEnabled) {
    maybeExistingSecret.map(secretItemKey => Some(new HadoopKerberosSecretResolverStep(
s/secretItemKey/existingSecretName/
logInfo(s"Renewal interval is $interval for token ${token.getKind.toString}") | ||
interval | ||
}.toOption} | ||
if (renewIntervals.isEmpty) None else Some(renewIntervals.min) |
Just as a note to myself, I see now that we'll return the earliest expiration time in case there are multiple tokens.
Correct
Maybe we should add a comment about this, because it wasn't easy to discover.
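Something like the following would capture the discovery (a sketch; `renewIntervals` mirrors the name in the diff and is stubbed out here):

```scala
// Per-token renewal intervals gathered above; may be empty if no token
// reported an interval.
val renewIntervals: Seq[Long] = Seq.empty[Long]

// With multiple tokens, take the minimum interval so renewal happens
// before the earliest-expiring token lapses.
val renewalInterval: Option[Long] =
  if (renewIntervals.isEmpty) None else Some(renewIntervals.min)
```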
currentHadoopSpec = nextStep.configureContainers(currentHadoopSpec)
}
val configMap =
  new ConfigMapBuilder()
Also use owner ref here so this gets deleted after the job is done?
I believe since all of these are added to the driverSpec's `otherKubernetesResources`, the ownerref is applied after the driverPod is launched in Client.scala.
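For illustration, attaching the driver pod as owner with fabric8 looks roughly like this (a sketch of the pattern Client.scala applies to everything in `otherKubernetesResources`; `driverPod` is assumed to be the already-created driver Pod):

```scala
import io.fabric8.kubernetes.api.model.{OwnerReference, OwnerReferenceBuilder, Pod}

// Build an owner reference pointing at the driver pod; setting it on the
// ConfigMap (or secret) makes Kubernetes garbage-collect that resource
// when the driver pod is deleted.
def ownerRefFor(driverPod: Pod): OwnerReference = new OwnerReferenceBuilder()
  .withName(driverPod.getMetadata.getName)
  .withApiVersion(driverPod.getApiVersion)
  .withUid(driverPod.getMetadata.getUid)
  .withKind(driverPod.getKind)
  .withController(true)
  .build()
```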
The latest commit addressed most of my comments. Looks great to me. Thanks @ifilonenko for the work so far.
@erikerlandson after all tests pass, can you give the final okay before merge?
LGTM, and passing CI. This is good to merge when we're ready! |
Let's merge after cutting the new release and tagging. |
Important note: this PR will require refactoring upon merging because of the most recent commits with renaming and unit test additions to the …
What changes were proposed in this pull request?
This is the on-going work of setting up Secure HDFS interaction with Spark-on-K8S. The architecture is discussed in this community-wide google doc.
This initiative can be broken down into 4 stages.
STAGE 1
- Detecting the HADOOP_CONF_DIR environmental variable and using Config Maps to store all Hadoop config files locally, while also setting HADOOP_CONF_DIR locally in the driver / executors

STAGE 2
- Grabbing the TGT from LTC or using keytabs+principal and creating a DT that will be mounted as a secret

STAGE 3
How was this patch tested?
Docs and Error Handling?