[SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode (apache#706)

Backports the fix for SPARK-41958 to the 3.3 branch from apache#39474.
The description below is from the original PR.

--------------------------

### What changes were proposed in this pull request?

This PR proposes to disallow an arbitrary custom classpath with a proxy user in cluster mode by default.

### Why are the changes needed?

To prevent arbitrary classpath entries from being loaded on the Spark cluster when jobs are submitted on behalf of a proxy user.
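
The gate this PR adds can be summarized with a minimal Scala sketch (illustrative names only, not the actual `SparkSubmit` code): the submitted classpath is dropped exactly when a proxy user is set, the job runs in cluster mode, and the escape-hatch config is off.

```scala
// Illustrative sketch of the new gate; not the actual SparkSubmit code.
// The custom classpath is cleared only when all three conditions hold.
def shouldClearCustomClasspath(
    proxyUser: String,             // value of --proxy-user, or null if unset
    isClusterMode: Boolean,        // YARN, Mesos, Standalone, or K8s cluster deploy mode
    allowCustomClasspath: Boolean  // spark.submit.proxyUser.allowCustomClasspathInClusterMode
): Boolean = {
  proxyUser != null && isClusterMode && !allowCustomClasspath
}
```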

### Does this PR introduce _any_ user-facing change?

Yes. Users must re-enable this feature via `spark.submit.proxyUser.allowCustomClasspathInClusterMode`. (In this 3.3 backport the config defaults to `true`, so the legacy behavior is preserved unless the flag is explicitly disabled.)
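
A hypothetical launcher-side example (jar path, class name, and user are placeholders) of opting back in at submit time via `org.apache.spark.launcher.SparkLauncher`; this only matters on clusters where the flag has been turned off:

```scala
import org.apache.spark.launcher.SparkLauncher

object SubmitWithCustomClasspath {
  def main(args: Array[String]): Unit = {
    // Submit on behalf of a proxy user in cluster mode while explicitly
    // re-enabling custom classpath entries (the behavior this PR gates).
    val handle = new SparkLauncher()
      .setMaster("yarn")
      .setDeployMode("cluster")
      .setAppResource("/apps/example-app.jar")   // placeholder application jar
      .setMainClass("com.example.Main")          // placeholder main class
      .addSparkArg("--proxy-user", "alice")      // placeholder proxy user
      .addJar("/apps/extra-dep.jar")             // the custom classpath entry
      .setConf("spark.submit.proxyUser.allowCustomClasspathInClusterMode", "true")
      .startApplication()
    // The returned SparkAppHandle can be polled for application state.
  }
}
```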

### How was this patch tested?

Manually tested.

Closes apache#39474 from Ngone51/dev.

Lead-authored-by: Peter Toth <peter.toth@gmail.com>



(cherry picked from commit 909da96)


Closes apache#41428 from degant/spark-41958-3.3.

Lead-authored-by: Degant Puri <[email protected]>

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Degant Puri <[email protected]>
Co-authored-by: Peter Toth <[email protected]>
3 people authored Oct 30, 2023
1 parent 9985f0b commit 07fc833
Showing 3 changed files with 24 additions and 0 deletions.
15 changes: 15 additions & 0 deletions core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -303,6 +303,10 @@ private[spark] class SparkSubmit extends Logging {
val isKubernetesClient = clusterManager == KUBERNETES && deployMode == CLIENT
val isKubernetesClusterModeDriver = isKubernetesClient &&
sparkConf.getBoolean("spark.kubernetes.submitInDriver", false)
val isCustomClasspathInClusterModeDisallowed =
!sparkConf.get(ALLOW_CUSTOM_CLASSPATH_BY_PROXY_USER_IN_CLUSTER_MODE) &&
args.proxyUser != null &&
(isYarnCluster || isMesosCluster || isStandAloneCluster || isKubernetesCluster)

if (!isMesosCluster && !isStandAloneCluster) {
// Resolve maven dependencies if there are any and add classpath to jars. Add them to py-files
@@ -863,6 +867,13 @@ private[spark] class SparkSubmit extends Logging {

sparkConf.set("spark.app.submitTime", System.currentTimeMillis().toString)

if (childClasspath.nonEmpty && isCustomClasspathInClusterModeDisallowed) {
logWarning(s"Ignore classpath ${childClasspath.mkString(", ")} with proxy user specified " +
s"in Cluster mode when ${ALLOW_CUSTOM_CLASSPATH_BY_PROXY_USER_IN_CLUSTER_MODE.key} is " +
s"disabled")
childClasspath.clear()
}

(childArgs.toSeq, childClasspath.toSeq, sparkConf, childMainClass)
}

@@ -916,6 +927,10 @@ private[spark] class SparkSubmit extends Logging {
logInfo(s"Classpath elements:\n${childClasspath.mkString("\n")}")
logInfo("\n")
}
assert(!(args.deployMode == "cluster" && args.proxyUser != null && childClasspath.nonEmpty) ||
sparkConf.get(ALLOW_CUSTOM_CLASSPATH_BY_PROXY_USER_IN_CLUSTER_MODE),
s"Classpath of spark-submit should not change in cluster mode if proxy user is specified " +
s"when ${ALLOW_CUSTOM_CLASSPATH_BY_PROXY_USER_IN_CLUSTER_MODE.key} is disabled")
val loader = getSubmitClassLoader(sparkConf)
for (jar <- childClasspath) {
addJarToClasspath(jar, loader)
7 changes: 7 additions & 0 deletions core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -2360,4 +2360,11 @@ package object config {
.version("3.3.0")
.intConf
.createWithDefault(5)

private[spark] val ALLOW_CUSTOM_CLASSPATH_BY_PROXY_USER_IN_CLUSTER_MODE =
ConfigBuilder("spark.submit.proxyUser.allowCustomClasspathInClusterMode")
.internal()
.version("3.3.3")
.booleanConf
.createWithDefault(true)
}
2 changes: 2 additions & 0 deletions docs/core-migration-guide.md
@@ -26,6 +26,8 @@ license: |

- Since Spark 3.3, Spark migrates its log4j dependency from 1.x to 2.x because log4j 1.x has reached end of life and is no longer supported by the community. Vulnerabilities reported after August 2015 against log4j 1.x were not checked and will not be fixed. Users should rewrite original log4j properties files using log4j2 syntax (XML, JSON, YAML, or properties format). Spark rewrites the `conf/log4j.properties.template` which is included in Spark distribution, to `conf/log4j2.properties.template` with log4j2 properties format.

- Since Spark 3.3.3, `spark.submit.proxyUser.allowCustomClasspathInClusterMode` allows users to disable the custom classpath that proxy users can set in cluster mode. It still defaults to `true` to maintain backward compatibility.

## Upgrading from Core 3.1 to 3.2

- Since Spark 3.2, `spark.scheduler.allocation.file` supports read remote file using hadoop filesystem which means if the path has no scheme Spark will respect hadoop configuration to read it. To restore the behavior before Spark 3.2, you can specify the local scheme for `spark.scheduler.allocation.file` e.g. `file:///path/to/file`.
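
One way for administrators to opt in to the stricter behavior cluster-wide (an assumed deployment choice, not part of this diff) is to disable the flag in `conf/spark-defaults.conf`; with the 3.3.3 default of `true`, nothing changes unless this is set:

```
spark.submit.proxyUser.allowCustomClasspathInClusterMode   false
```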