[SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation

### What changes were proposed in this pull request?

This PR replaces `AmIpFilter` with a forked implementation and removes the dependency on `hadoop-yarn-server-web-proxy`.

### Why are the changes needed?

SPARK-47118 upgraded Spark's built-in Jetty from 10 to 11 and migrated from `javax.servlet` to `jakarta.servlet`, which breaks Spark on YARN:

```
Caused by: java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
    at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
    at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
    at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
    at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)
    ... 38 more
```
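The failure is a type mismatch, not a classpath problem: Jetty 11 requires filters to implement `jakarta.servlet.Filter`, while Hadoop's `AmIpFilter` implements `javax.servlet.Filter`. Two interfaces with the same shape but different namespaces are unrelated types in Java, as this toy illustration (the interface names here are stand-ins, not the real servlet APIs) shows:

```java
// Toy illustration: identically-shaped interfaces from different
// namespaces are unrelated types, which is exactly why Jetty 11
// rejects Hadoop's javax-based filter.
public class NamespaceMismatch {
    interface JavaxFilter { void doFilter(); }   // stands in for javax.servlet.Filter
    interface JakartaFilter { void doFilter(); } // stands in for jakarta.servlet.Filter

    static class AmIpFilter implements JavaxFilter {
        @Override public void doFilter() {}
    }

    public static void main(String[] args) {
        Object filter = new AmIpFilter();
        // Jetty 11's FilterHolder effectively performs this check and
        // throws IllegalStateException when it fails:
        System.out.println(filter instanceof JakartaFilter); // prints false
    }
}
```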

During the investigation, I found a comment in apache/spark#31642:

> Agree that in the long term we should either: 1) consider to re-implement the logic in Spark which allows us to get away from server-side dependency in Hadoop ...

Forking the filter is a simple and clean way to address this exact issue: we no longer need to wait for Hadoop's `jakarta.servlet` migration, and it also strips a Hadoop dependency.
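The filter's core behavior can be sketched as follows. This is a hypothetical simplification (the class and method names are illustrative, not Spark's actual code): the real forked filter implements `jakarta.servlet.Filter` and resolves the ResourceManager proxy hosts to IP addresses at runtime.

```java
import java.util.Set;

// Hypothetical sketch of the AmIpFilter decision logic: requests must
// arrive via one of the YARN RM proxy addresses; anything else is
// redirected to the RM proxy URL.
public class AmIpCheck {
    private final Set<String> proxyAddresses; // IPs of the YARN RM proxy
    private final String proxyUriBase;        // e.g. the RM's /proxy/redirect/<appId> base

    public AmIpCheck(Set<String> proxyAddresses, String proxyUriBase) {
        this.proxyAddresses = proxyAddresses;
        this.proxyUriBase = proxyUriBase;
    }

    /**
     * Returns null if the request came through the YARN proxy,
     * otherwise the redirect target on the RM proxy.
     */
    public String redirectFor(String remoteAddr, String requestUri) {
        if (proxyAddresses.contains(remoteAddr)) {
            return null; // trusted: forwarded by the RM proxy itself
        }
        return proxyUriBase + requestUri;
    }
}
```

Reimplementing this small amount of logic in Spark is what lets the build drop `hadoop-yarn-server-web-proxy` entirely.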

### Does this PR introduce _any_ user-facing change?

No. This restores application startup in YARN mode, keeping the same behavior as Spark 3.5 and earlier versions.

### How was this patch tested?

UTs are added (see `org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter`).

I also tested it in a YARN cluster; Spark started successfully:
```
root@hadoop-master1:/opt/spark-SPARK-48238# JAVA_HOME=/opt/openjdk-17 bin/spark-sql --conf spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17 --conf spark.executorEnv.JAVA_HOME=/opt/openjdk-17
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2024-05-18 04:11:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-18 04:11:44 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive} is set, falling back to uploading libraries under SPARK_HOME.
Spark Web UI available at http://hadoop-master1.orb.local:4040
Spark master: yarn, Application Id: application_1716005503866_0001
spark-sql (default)> select version();
4.0.0 4ddc2303c7cbabee12a3de9f674aaacad3f5eb01
Time taken: 1.707 seconds, Fetched 1 row(s)
spark-sql (default)>
```

When accessing `http://hadoop-master1.orb.local:4040`, it redirects to `http://hadoop-master1.orb.local:8088/proxy/redirect/application_1716005503866_0001/`, and the UI looks correct.

![Spark UI screenshot](https://github.com/apache/spark/assets/26535726/8500fc83-48c5-4603-8d05-37855f0308ae)

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46611 from pan3793/SPARK-48238.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
a0x8o committed May 20, 2024
1 parent 5a735a2 commit 59ac4a1
Showing 2 changed files with 2 additions and 2 deletions.
```diff
@@ -18,8 +18,8 @@ package org.apache.spark.scheduler.cluster.k8s

 import java.util.concurrent.TimeUnit

-import scala.collection.JavaConverters._
 import scala.collection.mutable
+import scala.jdk.CollectionConverters._

 import io.fabric8.kubernetes.api.model.{PersistentVolumeClaim,
 PersistentVolumeClaimBuilder, PodSpec, PodSpecBuilder, PodTemplateSpec}
```
```diff
@@ -124,7 +124,7 @@ class StatefulSetAllocatorSuite extends SparkFunSuite with BeforeAndAfter {
     snapshotsStore = new DeterministicExecutorPodsSnapshotsStore()
     podsAllocatorUnderTest = new StatefulSetPodsAllocator(
       conf, secMgr, executorBuilder, kubernetesClient, snapshotsStore, null)
-    when(schedulerBackend.getExecutorIds()).thenReturn(Seq.empty)
+    when(schedulerBackend.getExecutorIds).thenReturn(Seq.empty)
     podsAllocatorUnderTest.start(TEST_SPARK_APP_ID, schedulerBackend)
   }
```
