[SPARK-14399] Remove unnecessary excludes from POMs and simplify Hive POM #12171
Conversation
It seems that the explicit dep. was pulling our version of Janino's commons-compiler a bit lower than it should have been.
See https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Management > "dependency management takes precedence over dependency mediation"
Oh, /cc @vanzin as well.
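To illustrate the precedence rule: a version pinned in `dependencyManagement` overrides whatever version Maven's nearest-wins mediation would otherwise select for a transitive dependency. A minimal sketch (the coordinates and version are illustrative, not necessarily Spark's actual values):

```xml
<!-- Sketch: the pinned version below wins over any version that dependency
     mediation would pick from the transitive tree. Coordinates and version
     are illustrative, not taken from Spark's actual POM. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>commons-compiler</artifactId>
      <version>2.7.8</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```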
It looks like compilation failed because a dependency could not be resolved from a non-central repository. This is a great illustration of why Spark's build bans the use of dependencies which aren't present in Maven Central: non-central repos can go offline and break a previously-good build. I did keep the exclusion which is supposed to prevent this dep. from being pulled in, so I wonder whether this is a problem specific to the SBT build. Let me try re-running with Maven to see if our POM reader plugin could be to blame.
Jenkins, retest this please
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-enforcer-plugin</artifactId>
        <version>1.4.1</version>
Nit: you don't need to repeat the version here
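For reference, one common way to avoid repeating the version is to declare it once in the parent POM's `pluginManagement`; child modules can then reference the plugin without a `<version>` element. A sketch, assuming the parent manages this plugin:

```xml
<!-- In the parent POM: centralizes the enforcer plugin version so that
     child modules need not repeat it. -->
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-enforcer-plugin</artifactId>
        <version>1.4.1</version>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```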
Great cleanup!
From the Jenkins Maven output:
It looks like our Maven build isn't as strict in its enforcement of only using Maven Central. I guess Maven will still try to resolve excluded transitive dependencies unless you add an explicit direct dependency? The extra download time by itself isn't a huge deal, and that's not necessarily what we want to optimize for (otherwise we'd just ban all transitive dependencies and explicitly promote everything to top-level). However, I don't want to rely on Clojars, so I'll look into restoring that direct dep. on Calcite.
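A minimal sketch of what restoring that direct dependency could look like; the excluded artifact is an assumption (the Clojars-hosted piece of Calcite's tree), and `calcite.version` is assumed to be defined in the root POM:

```xml
<dependency>
  <groupId>org.apache.calcite</groupId>
  <artifactId>calcite-core</artifactId>
  <version>${calcite.version}</version>
  <exclusions>
    <!-- Assumed: exclude the artifact that is only published to Clojars,
         so builds never need to contact a non-central repository. -->
    <exclusion>
      <groupId>org.pentaho</groupId>
      <artifactId>pentaho-aggdesigner-algorithm</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```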
It looks like our custom sbt-pom-reader doesn't handle the new wildcard exclusion syntax.
In fact, being able to use the nice wildcard syntax may be a ways off, because the official sbt-pom-reader doesn't support Maven 3.3.x due to API incompatibilities: https://github.com/sbt/sbt-pom-reader/blob/master/project/dependencies.scala#L5
Actually, we might be able to use …
Test build #54971 has finished for PR 12171 at commit …
Test build #54967 has finished for PR 12171 at commit …
Without looking at the details, I appreciate the ambition, and I like that wildcard artifact exclusion. What is the dependency graph that SBT generates from this? Joda-Time is an interesting point: we've seen some problems related to Spark + AWS having Joda-Time mismatches. The Amazon AWS/S3/Kinesis libs do require Joda-Time, and there was some conflict creeping in between the Hadoop installation and the Kinesis imports in the Spark job.
Test build #54972 has finished for PR 12171 at commit …
It looks like Hive might actually need Joda-Time: …
I re-added joda-time; let's see if that fixes Hive.
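If Hive does need it, a minimal sketch of the restored direct dependency in `hive/pom.xml` might look like this (the `joda.version` property is an assumption for illustration):

```xml
<!-- Sketch only: restore Joda-Time as a direct dependency so it is no
     longer excluded from the transitive classpath. -->
<dependency>
  <groupId>joda-time</groupId>
  <artifactId>joda-time</artifactId>
  <version>${joda.version}</version>
</dependency>
```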
@@ -840,14 +827,13 @@
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>${avro.version}</version>
      <!-- use the build matching the hadoop api of avro-mapred (i.e. no classifier for hadoop 1
Do we even need this anymore? We don't support hadoop1 in Spark 2.
Good point; we can just hardcode hadoop2 here for now.
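A hedged sketch of what hardcoding the classifier could look like, based on the diff above (the exact placement in the POM is assumed):

```xml
<!-- Sketch only: hardcode the hadoop2 classifier, since Spark 2.x no
     longer supports Hadoop 1. -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>${avro.version}</version>
  <classifier>hadoop2</classifier>
</dependency>
```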
LGTM if the sbt and Maven tests pass.
Test build #55002 has finished for PR 12171 at commit …
Hmm, the FlumePollingStreamSuite test looks broken; it has hung twice now. I'll try to reproduce offline.
@JoshRosen the Guava and Jetty excludes might have been added back when those libraries were shaded:
Now that we no longer publish assembly JARs for Spark, I don't think that shading will require exclusions: the only reason to exclude would be to completely prevent Jetty and Guava from appearing in our transitive dependency classpath, and that currently isn't possible unless we republish all of our deps. which depend on Guava and Jetty.
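For context on "shaded" above: shading relocates the classes to a private package at build time. A rough sketch of that kind of maven-shade-plugin relocation, with the shaded package name assumed for illustration:

```xml
<!-- Sketch: relocate Guava classes so user applications can bring their
     own Guava version. The shaded package name is an assumption. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.spark-project.guava</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```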
This patch aims to simplify our build by removing a number of unnecessary excludes. The individual commit messages describe the changes in more detail, but there are a few key themes:

**Remove many excludes in the root POM**: according to the Maven documentation, "dependency management takes precedence over dependency mediation". In many cases, we were excluding transitive dependencies while also using those dependencies in `spark-core` and pinning those dependencies' versions in the `dependencyManagement` section of our POM. As a rough guideline, I think we should only exclude dependencies under the following scenarios: …
**Remove explicitly-promoted transitive dependencies from the Hive POM**: in an early revision of SPARK-8064 (build against Hive 1.2.1, #7191), it looks like we tried to use the `core`-classified version of `hive-exec`, which is published as a thin JAR that does not bundle third-party dependencies' classes. However, both the regular and classified versions of the dependency share the same effective POM, which in this case happens to be the dependency-reduced POM produced by the Maven shade plugin. As a result, users of the `core`-classified JAR must add direct dependencies on what are logically transitive dependencies of `hive-exec`, so #7191 added several such dependencies in `hive/pom.xml`. However, by the end of #7191 we abandoned the idea of using an existing Hive JAR and chose to republish our own JAR, so this promotion of transitive dependencies is no longer necessary. This patch was therefore able to significantly cut down the size of the `hive` POM by removing those direct dependencies. Doing so exposed a problem where one of our Janino JARs was being pulled to a lower version as a side effect of the unnecessary dependency promotion.

**Use Maven's new wildcard exclude support**: Maven now supports wildcard exclusion patterns, so I used that to simplify the exclusion of `org.ow2.asm` artifacts, as sketched below.
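A hedged sketch of the wildcard form (supported since Maven 3.2.1), which collapses the per-artifact excludes into one entry:

```xml
<exclusions>
  <!-- One wildcard entry covers every org.ow2.asm artifact. -->
  <exclusion>
    <groupId>org.ow2.asm</groupId>
    <artifactId>*</artifactId>
  </exclusion>
</exclusions>
```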
**Improve `dev/test-dependencies`**: a bad regex caused the `org.spark-project:hive` JARs to not be listed under `dev/deps`; this is now fixed.

**Remove the explicit `io.netty` dependency**: we don't seem to need it anymore. I did leave the `dependencyManagement` entry for now in order to prevent an unexpected downgrade.

/cc @srowen @steveloughran @vanzin @pwendell @rxin for review.