Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30051][BUILD] Clean up hadoop-3.2 dependency #26742

Closed
wants to merge 1 commit into from
Closed

[SPARK-30051][BUILD] Clean up hadoop-3.2 dependency #26742

wants to merge 1 commit into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Dec 3, 2019

What changes were proposed in this pull request?

This PR aims to cut org.eclipse.jetty:jetty-webappand org.eclipse.jetty:jetty-xml transitive dependency from hadoop-common.

Why are the changes needed?

This will simplify our dependency management by the removal of unused dependencies.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the GitHub Action with all combinations and the Jenkins UT with (Hadoop-3.2).

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency Dec 3, 2019
@dongjoon-hyun
Copy link
Member Author

Retest this please.

@SparkQA
Copy link

SparkQA commented Dec 3, 2019

Test build #114744 has finished for PR 26742 at commit b326f31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 3, 2019

Test build #114745 has finished for PR 26742 at commit b326f31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-30051][BUILD][test-hadoop3.2] Clean up hadoop-3.2 dependency [SPARK-30051][BUILD] Clean up hadoop-3.2 dependency Dec 3, 2019
@dongjoon-hyun
Copy link
Member Author

Hi, @srowen and @HyukjinKwon .
Could you review this PR?

<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-webapp</artifactId>
</exclusion>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen
Copy link
Member

srowen commented Dec 3, 2019

What's the logic here -- it's not actually needed at compile time because?

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Dec 3, 2019

spark-core already embeds the required jetty libraries into spark-core_2.13_*.jar, and these files are not used even in our UTs. Ur, are we using these libraries?

@dongjoon-hyun
Copy link
Member Author

If I misunderstand something, I'll close this PR~ Please let me know.

@srowen
Copy link
Member

srowen commented Dec 3, 2019

I may also be missing something, but jetty is still used throughout the code. This seems to remove it entirely from the distribution by marking it provided. Given that it's directly used it seems to need to be compile scope and be included. I do agree with a direct dependency on it if it's directly used.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Dec 3, 2019

Before this PR, I tested with make-distribution.sh and master/worker/historyserver/sparkthriftserver/shell (I didn't test the yarn suffle service though).

This seems to remove it entirely from the distribution by marking it provided.

  1. Yes. This PR removes two files because we embedded needed jetty into spark-core library already. The following is the result of make-distribution.sh.
$ jar tvf spark-core_2.12-3.0.0-SNAPSHOT.jar | grep jetty | tail -n5
 11996 Mon Dec 02 22:25:20 PST 2019 org/sparkproject/jetty/servlets/PushCacheFilter.class
  9569 Mon Dec 02 22:25:20 PST 2019 org/sparkproject/jetty/servlets/PutFilter.class
  1302 Mon Dec 02 22:25:20 PST 2019 org/sparkproject/jetty/servlets/CGI$3.class
   108 Mon Apr 29 15:50:58 PDT 2019 META-INF/maven/org.eclipse.jetty/jetty-servlets/pom.properties
  2142 Mon Dec 02 22:25:20 PST 2019 org/sparkproject/jetty/servlets/CGI$EnvList.class
  1. Jetty project consists of multi-modules like Apache Spark. jetty-webapp is something like graphx. jetty-webapp has WebAppContext which loads web.xml and builds web apps.

A WebAppContext is a derivation of ServletContextHandler that supports the standardized layout of a web application and configuration of session, security, listeners, filter, servlets, and JSP via a web.xml descriptor normally found in the /WEB-INF directory of a web application.

https://github.com/eclipse/jetty.project/tree/jetty-9.4.x/jetty-webapp

AFAIK, Apache Spark don't use web.xml or the standardized-layouted web applications style. I guess yarn shuffle-service also doesn't use this.

@srowen
Copy link
Member

srowen commented Dec 3, 2019

OK yeah I agree the webapp artifact isn't used directly. It isn't in the other deps files either. When you say embedded, you mean shaded? Yes it is.

I guess I'm confused why we don't see the artifact in the deps list anyway, even if shaded, or why we only see a few still in the Hadoop 2.x builds like jetty-util, but, maybe I just don't recall how it works. OK well it looks consistent at least.

@dongjoon-hyun
Copy link
Member Author

Thank you for review. Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-30051 branch December 3, 2019 22:34
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me too

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Dec 6, 2019
### What changes were proposed in this pull request?

This PR aims to cut `org.eclipse.jetty:jetty-webapp`and `org.eclipse.jetty:jetty-xml` transitive dependency from `hadoop-common`.

### Why are the changes needed?

This will simplify our dependency management by the removal of unused dependencies.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action with all combinations and the Jenkins UT with (Hadoop-3.2).

Closes apache#26742 from dongjoon-hyun/SPARK-30051.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
lwwmanning pushed a commit to palantir/spark that referenced this pull request Jun 19, 2020
This PR aims to cut `org.eclipse.jetty:jetty-webapp`and `org.eclipse.jetty:jetty-xml` transitive dependency from `hadoop-common`.

This will simplify our dependency management by the removal of unused dependencies.

No.

Pass the GitHub Action with all combinations and the Jenkins UT with (Hadoop-3.2).

Closes apache#26742 from dongjoon-hyun/SPARK-30051.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants