[SPARK-48238][BUILD][YARN] Replace YARN AmIpFilter with a forked implementation

### What changes were proposed in this pull request?

This PR replaces AmIpFilter with a forked implementation and removes the dependency `hadoop-yarn-server-web-proxy`.

### Why are the changes needed?

SPARK-47118 upgraded Spark's built-in Jetty from 10 to 11 and migrated from `javax.servlet` to `jakarta.servlet`, which breaks Spark on YARN:

```
Caused by: java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
	at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
	at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
	at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)
	at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
	at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)
	... 38 more
```

During the investigation, I found a comment in apache/spark#31642 (comment):

> Agree that in the long term we should either: 1) consider to re-implement the logic in Spark which allows us to get away from server-side dependency in Hadoop ...

This should be a simple and clean way to address the exact issue: we don't need to wait for Hadoop's `jakarta.servlet` migration, and it also strips a Hadoop dependency.

### Does this PR introduce _any_ user-facing change?

No. This recovers the bootstrap of Spark applications in YARN mode, keeping the same behavior as Spark 3.5 and earlier versions.

### How was this patch tested?

UTs are added (refer to `org.apache.hadoop.yarn.server.webproxy.amfilter.TestAmFilter`).

I also tested it in a YARN cluster; Spark started successfully.
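For context, the essential behavior such an AmIpFilter-style filter must preserve is: requests arriving directly at the AM web UI are only let through if they come from a known RM proxy address; anyone else is redirected to the YARN proxy URL. The sketch below models that decision as a plain function, without the `jakarta.servlet` API; all class, method, and URL names here are illustrative, not Spark's actual forked code.

```java
import java.util.Set;

// Hypothetical sketch of the host check at the heart of an AmIpFilter-style
// servlet filter. Not Spark's real implementation; names are made up.
public class AmIpCheckSketch {
    // Resolved addresses of the RM web proxies; requests from these pass through.
    private final Set<String> proxyAddresses;

    public AmIpCheckSketch(Set<String> proxyAddresses) {
        this.proxyAddresses = proxyAddresses;
    }

    /**
     * Returns null if the request may proceed to the AM UI, or the redirect
     * target (the YARN proxy URL for this application) otherwise.
     */
    public String decide(String remoteAddr, String proxyUriBase, String appId) {
        if (proxyAddresses.contains(remoteAddr)) {
            return null; // came through the YARN proxy: let it pass
        }
        // Direct access is bounced to the proxy, which handles auth and cookies.
        return proxyUriBase + "/redirect/" + appId + "/";
    }

    public static void main(String[] args) {
        AmIpCheckSketch f = new AmIpCheckSketch(Set.of("10.0.0.1"));
        System.out.println(f.decide("10.0.0.1", "http://rm:8088/proxy", "app_1"));
        System.out.println(f.decide("10.0.0.2", "http://rm:8088/proxy", "app_1"));
    }
}
```

This mirrors the redirect observed in the cluster test below, where hitting port 4040 directly bounces the browser to the RM's `/proxy/redirect/<appId>/` URL.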
```
root@hadoop-master1:/opt/spark-SPARK-48238# JAVA_HOME=/opt/openjdk-17 bin/spark-sql --conf spark.yarn.appMasterEnv.JAVA_HOME=/opt/openjdk-17 --conf spark.executorEnv.JAVA_HOME=/opt/openjdk-17
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2024-05-18 04:11:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-05-18 04:11:44 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive} is set, falling back to uploading libraries under SPARK_HOME.
Spark Web UI available at http://hadoop-master1.orb.local:4040
Spark master: yarn, Application Id: application_1716005503866_0001
spark-sql (default)> select version();
4.0.0 4ddc2303c7cbabee12a3de9f674aaacad3f5eb01
Time taken: 1.707 seconds, Fetched 1 row(s)
spark-sql (default)>
```

When accessing `http://hadoop-master1.orb.local:4040`, it redirects to `http://hadoop-master1.orb.local:8088/proxy/redirect/application_1716005503866_0001/`, and the UI looks correct.

<img width="1474" alt="image" src="https://github.com/apache/spark/assets/26535726/8500fc83-48c5-4603-8d05-37855f0308ae">

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46611 from pan3793/SPARK-48238.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>