[SPARK-2848] Shade Guava in uber-jars. #1813
Changes from all commits: 616998e, 2fec990, 637189b, d3ea8e1, fef4370, 05e0a3d, 819b445, 9bdffb0
Diff, first file:

```diff
@@ -46,8 +46,14 @@
     </dependencies>
   </profile>
 </profiles>

 <dependencies>
+  <!-- Promote Guava to compile scope in this module so it's included while shading. -->
+  <dependency>
+    <groupId>com.google.guava</groupId>
+    <artifactId>guava</artifactId>
+    <scope>compile</scope>
+  </dependency>
   <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-core_${scala.binary.version}</artifactId>
```
```diff
@@ -209,6 +215,12 @@
       </includes>
     </artifactSet>
     <filters>
+      <filter>
+        <artifact>com.google.guava:guava</artifact>
+        <excludes>
+          <exclude>com/google/common/base/Optional*</exclude>
+        </excludes>
+      </filter>
       <filter>
         <artifact>*:*</artifact>
         <excludes>
```

> **Reviewer** (on the `com.google.guava:guava` filter): Ditto here, add a comment on why we keep `Optional`. (I guess it's above too, but wouldn't hurt to place it exactly where we do the exclude.)
```diff
@@ -226,6 +238,18 @@
         <goal>shade</goal>
       </goals>
       <configuration>
+        <relocations>
+          <relocation>
+            <pattern>com.google</pattern>
+            <shadedPattern>org.spark-project.guava</shadedPattern>
+            <includes>
+              <include>com.google.common.**</include>
+            </includes>
+            <excludes>
+              <exclude>com.google.common.base.Optional**</exclude>
+            </excludes>
+          </relocation>
+        </relocations>
         <transformers>
           <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
           <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
```
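To make the relocation rules concrete, here is a hedged sketch in plain Python of how the `<relocation>` patterns above classify a class name. This only mirrors the name-matching logic; the real maven-shade-plugin rewrites bytecode (class files and the references inside them), not strings.

```python
# Simplified model of the <relocation> rules in the pom hunk above.
PATTERN = "com.google"                               # <pattern>
SHADED_PATTERN = "org.spark-project.guava"           # <shadedPattern>
INCLUDE_PREFIX = "com.google.common."                # <include>com.google.common.**</include>
EXCLUDE_PREFIX = "com.google.common.base.Optional"   # <exclude>com.google.common.base.Optional**</exclude>

def relocate(class_name: str) -> str:
    """Return the shaded name for class_name, or the name unchanged."""
    if not class_name.startswith(INCLUDE_PREFIX):
        return class_name   # not covered by <includes>: left alone
    if class_name.startswith(EXCLUDE_PREFIX):
        return class_name   # Optional (and inner classes) is deliberately not renamed
    return SHADED_PATTERN + class_name[len(PATTERN):]

# Ordinary Guava classes move into the shaded package:
print(relocate("com.google.common.base.Joiner"))
# -> org.spark-project.guava.common.base.Joiner

# Optional keeps its public, unshaded name (it is part of Spark's Java API):
print(relocate("com.google.common.base.Optional"))
# -> com.google.common.base.Optional

# Non-Guava classes are untouched:
print(relocate("org.apache.spark.SparkContext"))
# -> org.apache.spark.SparkContext
```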
Diff, second file:

```diff
@@ -260,6 +260,7 @@
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
       <version>14.0.1</version>
+      <scope>provided</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.commons</groupId>
```

> **Reviewer:** I'll probably learn something here, but why is it necessary to mark it provided and then compile-scope again in the assembly? What does it matter if it's compile scope everywhere? I realize Hive on Spark may need it to be provided, but it can change the nature of this dependency downstream if needed.

> **Author:** If you leave it as compile, it will pollute the compilation classpath of people using Spark in their projects (Guava will be pulled as a compile-scope dependency too). Adding it as compile-scope in the assembly project is then needed because maven-shade-plugin will only include those in the final jar (pretty sure there are settings to include other scopes, but I didn't want to play with those).

> **Reviewer:** Yeah, I understand what the scopes mean. But Spark is already "provided" to any project compiling against Spark; you're not supposed to pull in Spark as compile-scope. This makes all its transitive dependencies provided in this scenario already.

> **Author:** Well, if someone "accidentally" pulls Spark as a compile-scope dependency, they'd run into what I described. This avoids that problem.

> **Reviewer:** Yea, in that scenario, someone's trying to run Spark outside of a cluster providing Spark or something. But then this fails, since Guava is said to be provided but is not actually provided by anything! Unless, I suppose, the app also depends on Guava.

> **Author:** Well, but that's what provided means, right? It's provided. Somehow. That "how" is not the concern of the library being pulled. Normally it's provided by the Spark uber-jar, which contains the dependency. But if someone is not using those, it becomes their job to provide it (and make sure it's a compatible version).

> **Reviewer:** Yes, it's what "provided" means, but why do you assume the app provides Guava itself, but none of the others? I see no reason to think of Guava specially and not Commons Math or log4j or whatever. What does it solve to treat this exceptionally? It means extra declarations in Spark, and does not affect your ability to shade the dependency in the assembly.

> **Author:** Well, this is kind of a problem with the scopes in maven and the way things are generally distributed. For libraries, I believe all dependencies should be "provided"; so commons-math and all others should, in my view, also be provided. Of course I won't make that change here. When you package libraries into an app (which is the case of the uber-jars, meant to be distributed in a cluster), then you need to package those dependencies; in maven that generally means making them "compile" scope. For a pure library, you wouldn't have this packaging step; that would be done by the users of the library. Leaving the library with a compile-scope dependency means you'll run into all the issues I pointed out in the bug. So this change solves that problem for Guava: if you use it, you have to explicitly declare and package it. For others, I'll leave it for the time when people decide to tackle other dependencies (I explicitly created a sub-task for Guava to avoid having to deal with other dependencies).

> **Author:** Also note that if you leave it as a compile-time dependency, and people are using Guava in their app and not packaging Guava, their app will fail to run, because the uber jar in the cluster won't have those classes. So saying "provided" here moves that particular error to the build.
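To illustrate the author's point about `provided` scope: a downstream application that calls Guava APIs directly would now have to declare and package Guava itself. A hypothetical application pom fragment (the 14.0.1 version matches the one Spark builds against in the diff above; any compatible version would do):

```xml
<!-- Hypothetical downstream-app pom fragment: because Spark marks Guava
     as provided, an app using Guava must declare it explicitly and ship
     it itself (e.g. in its own assembly). -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>14.0.1</version>
</dependency>
```

Without this declaration, such an app fails at build time rather than at runtime, which is the trade-off the author argues for above.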
```diff
@@ -1017,6 +1018,21 @@

   <profiles>

+    <!--
+      This profile is enabled automatically by the sbt build. It changes the scope for the guava
+      dependency, since we don't shade it in the artifacts generated by the sbt build.
+    -->
+    <profile>
+      <id>sbt</id>
+      <dependencies>
+        <dependency>
+          <groupId>com.google.guava</groupId>
+          <artifactId>guava</artifactId>
+          <scope>compile</scope>
+        </dependency>
+      </dependencies>
+    </profile>
+
    <!-- Ganglia integration is not included by default due to LGPL-licensed code -->
    <profile>
      <id>spark-ganglia-lgpl</id>
```

> **Second reviewer:** Hey, just wondering: since spark-core lists this as a compile dependency, and everything else depends on spark-core, what makes this necessary?

> **Second reviewer:** Sorry, I see: the issue is that elsewhere this is not in compile scope. However, why not just have it be compile scope by default in the root pom? Is the issue that we need to make sure it does not exist at compile scope in our published root pom?

> **Author:** See the previous discussion I had with Sean in this PR. There's really no good scope in maven for what we want to do. "compile" means that guava will show up in the user's compilation classpath, and since we're shading guava in the assembly, it may lead to runtime errors since guava won't be there. "provided" means that guava is not exposed to the user at all, so if the user wants to use guava, he'll need to add an explicit dependency and package it himself. I like "provided" better because it means things fail for the user at build time, not runtime, if the user is using guava but hasn't declared the dependency. And that's why I had all those hacks in the sbt build that you asked me to remove.

> **Second reviewer:** I just meant that, since we are using the shade plugin, will it just remove the entire guava from our published poms anyway... then I realized that it won't remove guava from the root pom, most likely, since we only shade inside of spark-core.
> **Reviewer:** Is there logic in here that specifically tells maven to exclude this from our published pom? Does this automatically happen by virtue of including the `artifactSet` below?

> **Reviewer (following up):** I just tested this locally and it appears that it does get excluded by virtue of the artifact set.