-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6122][Core] Upgrade tachyon-client version to 0.6.3 #5354
Conversation
…rsion than spark's expected version.
Can one of the admins verify this patch? |
Jenkins, test this please. |
Test build #29691 has started for PR 5354 at commit |
Test build #29691 has finished for PR 5354 at commit
|
Test FAILed. |
Jenkins, test this please |
Test PASSed. |
Jenkins, retest this please. |
Test build #29731 has started for PR 5354 at commit |
Test build #29731 has finished for PR 5354 at commit
|
Test PASSed. |
@JoshRosen @haoyuan |
Jenkins, retest this please. |
1 similar comment
Jenkins, retest this please. |
Test build #29753 has started for PR 5354 at commit |
Test build #29753 has finished for PR 5354 at commit
|
Test PASSed. |
@JoshRosen @aarondav @pwendell Thanks! |
The change LGTM but I'll leave it to @pwendell to OK the pom changes. I'm not sure if there's a better way to express basically "don't include any of this guy's transitive dependencies". |
Yea, this is really messy. Why are so many exclusions needed? many of these are artifacts that are already in the assembly, so, there's no particular problem. However it does look like the Tachyon client has a lot of big dependencies. Is the client really pulling in At the least we need to keep out hadoop-core, the servlet APIs, EL, probably hsqldb if possible, Glassfish. Anything that's added needs its license checked. |
Oh, I'm referring to I think the right-er thing to do given the facts here is just make the whole project actually use the version of httpclient / httpcore that this string implies, and make it 4.3.6. |
hm.. your approach sounds good to me, we should really make sure we have the same version of a library across dependent projects. |
@srowen For Finding the perfect way to reconcile Spark's dependencies as you mentioned before, deserves its own ticket and will require substantial effort. For example, it is not even easy to tell that different versions of @ScrapCodes |
I don't see anything in Right now, If Tachyon requires HttpClient 4.3.2+, then that will need to be resolved here. Your change does not actually cause Spark to use 4.3.2, as I say above. That's easy to address, and needs to happen in this PR. We don't need a separate PR for the Kinesis change since that's logically part of the change you will have to make here to make this do what you want. |
@srowen Tachyon does not require httpclient 4.3.2, it's the selenium tests that do. To be clear, the dependency changes I am making are to address the jar conflicts that occur when running the sbt tests. |
Why would the Tachyon version change itself alter the jackson or httpclient dependencies? was it already not working? |
The conflicting dependencies were already included prior to my changes. I think since the correct jars were also available, there is a chance to use those and mask the problem. |
@srowen |
I'm running SBT tests now just to see if I can reproduce a failure. I guess I'd be surprised if the default SBT build doesn't work, since it would have been this way for a while. I do see that the SBT build adds a bunch of stuff to So, if the dependency changes are not related to Tachyon, I think I'd skip them. There's no problem with Maven right now. The Jackson exclusion should not make a difference; in Maven, the version is consistently 1.8.8 for pre-Hadoop-2 builds, and 1.9.13 for Hadoop 2.2+ builds. The HttpClient situation is a little bit wrong but if it's not actually affecting Tachyon, that can be fixed separately. |
@srowen |
Yeah, that still confuses me. If Tachyon doesn't touch I can appreciate that -- whatever is going on -- we still want the SBT build to work even if it's not the main build, and still want Hadoop 1.x builds to work even if it's not the default. Let me try applying this change and seeing the diff in SBT myself, to try to verify what is essential to change. |
@srowen |
I'm still working through this -- I do see the same problem you see with the httpclient library, and it's because the SBT resolution rules are different. It's weird, but yeah that probably has to be patched up as you have done. I have a proposed change that touches up the handling in Kinesis, etc. I didn't see a Jackson-related problem, not yet, but that may be masked by earlier failures. Basically, I'd be surprised if this exclusion is the only possible solution, so I want to try it out in a different environment. |
@srowen |
I don't see a Jackson-related failure, even when directly running However I do see the HttpClient problem:
The change here works, although I think it can be tightened up a little bit. My last experiment will be to see if we can get away with |
This is the best I've got, which still works with SBT: It's mostly the same, but I don't find a jackson exclusion is needed, and I think the httpclient situation could be further tightened. @aniketbhatnagar @ScrapCodes does the change to the |
It seems like the Jackson dep has to be excluded to get SBT + Hadoop 1.0.4 to work. I think that has to stay then, yeah. I think the httpclient stuff can be cleaned up a small bit but that too is essential. I'm getting worried at how much the divergence between SBT and Maven is causing us to hack the build, making it harder to get the build right for both. For example, these changes aren't necessary at all for Maven. It's exacerbated by trying to support Hadoop 1.x. Still maybe we kick this can down the road a bit longer, to get in this change. |
…client version setting from profiles.
@srowen |
Test build #30915 has started for PR 5354 at commit |
+1 from my side. having a consistent httpclient version would be so much better! |
Test build #30915 has finished for PR 5354 at commit
|
Test PASSed. |
LGTM. Thank you for your perseverance. This gets the change in with minimal additional change to the build, keeps everything compiling and actually improves the management of one dependency along the way. I think the large list of removed dependencies above is a false positive. It can't remove these. Let me merge and let's double check that the other Jenkins builds are still happy. |
@srowen @aniketbhatnagar |
This is a reopening of apache#4867. A short summary of the issues resolved from the previous PR: 1. HTTPClient version mismatch: Selenium (used for UI tests) requires version 4.3.x, and Tachyon included 4.2.5 through a transitive dependency of its shaded thrift jar. To address this, Tachyon 0.6.3 will promote the transitive dependencies of the shaded jar so they can be excluded in spark. 2. Jackson-Mapper-ASL version mismatch: In lower versions of hadoop-client (ie. 1.0.4), version 1.0.1 is included. The parquet library used in spark sql requires version 1.8+. Its unclear to me why upgrading tachyon-client would cause this dependency to break. The solution was to exclude jackson-mapper-asl from hadoop-client. It seems that the dependency management in spark-parent will not work on transitive dependencies, one way to make sure jackson-mapper-asl is included with the correct version is to add it as a top level dependency. The best solution would be to exclude the dependency in the modules which require a higher version, but that did not fix the unit tests. Any suggestions on the best way to solve this would be appreciated! Author: Calvin Jia <[email protected]> Closes apache#5354 from calvinjia/upgrade_tachyon_0.6.3 and squashes the following commits: 0eefe4d [Calvin Jia] Handle httpclient version in maven dependency management. Remove httpclient version setting from profiles. 7c00dfa [Calvin Jia] Set httpclient version to 4.3.2 for selenium. Specify version of httpclient for sql/hive (previously 4.2.5 transitive dependency of libthrift). 9263097 [Calvin Jia] Merge master to test latest changes dbfc1bd [Calvin Jia] Use Tachyon 0.6.4 for cleaner dependencies. e2ff80a [Calvin Jia] Exclude the jetty and curator promoted dependencies from tachyon-client. a3a29da [Calvin Jia] Update tachyon-client exclusions. 0ae6c97 [Calvin Jia] Change tachyon version to 0.6.3 a204df9 [Calvin Jia] Update make distribution tachyon version. a93c94f [Calvin Jia] Exclude jackson-mapper-asl from hadoop client since it has a lower version than spark's expected version. a8a923c [Calvin Jia] Exclude httpcomponents from Tachyon 910fabd [Calvin Jia] Update to master eed9230 [Calvin Jia] Update tachyon version to 0.6.1. 11907b3 [Calvin Jia] Use TachyonURI for tachyon paths instead of strings. 71bf441 [Calvin Jia] Upgrade Tachyon client version to 0.6.0.
Hi, I have tried applying the patch on spark-1.3.1. I am getting following error git apply --check /tmp/5354.patch launcher folder is not available when we download spark-1.3.1 source code. Any help. |
@vnkesarwani, try |
This is a reopening of apache#4867. A short summary of the issues resolved from the previous PR: 1. HTTPClient version mismatch: Selenium (used for UI tests) requires version 4.3.x, and Tachyon included 4.2.5 through a transitive dependency of its shaded thrift jar. To address this, Tachyon 0.6.3 will promote the transitive dependencies of the shaded jar so they can be excluded in spark. 2. Jackson-Mapper-ASL version mismatch: In lower versions of hadoop-client (ie. 1.0.4), version 1.0.1 is included. The parquet library used in spark sql requires version 1.8+. Its unclear to me why upgrading tachyon-client would cause this dependency to break. The solution was to exclude jackson-mapper-asl from hadoop-client. It seems that the dependency management in spark-parent will not work on transitive dependencies, one way to make sure jackson-mapper-asl is included with the correct version is to add it as a top level dependency. The best solution would be to exclude the dependency in the modules which require a higher version, but that did not fix the unit tests. Any suggestions on the best way to solve this would be appreciated! Author: Calvin Jia <[email protected]> Closes apache#5354 from calvinjia/upgrade_tachyon_0.6.3 and squashes the following commits: 0eefe4d [Calvin Jia] Handle httpclient version in maven dependency management. Remove httpclient version setting from profiles. 7c00dfa [Calvin Jia] Set httpclient version to 4.3.2 for selenium. Specify version of httpclient for sql/hive (previously 4.2.5 transitive dependency of libthrift). 9263097 [Calvin Jia] Merge master to test latest changes dbfc1bd [Calvin Jia] Use Tachyon 0.6.4 for cleaner dependencies. e2ff80a [Calvin Jia] Exclude the jetty and curator promoted dependencies from tachyon-client. a3a29da [Calvin Jia] Update tachyon-client exclusions. 0ae6c97 [Calvin Jia] Change tachyon version to 0.6.3 a204df9 [Calvin Jia] Update make distribution tachyon version. a93c94f [Calvin Jia] Exclude jackson-mapper-asl from hadoop client since it has a lower version than spark's expected version. a8a923c [Calvin Jia] Exclude httpcomponents from Tachyon 910fabd [Calvin Jia] Update to master eed9230 [Calvin Jia] Update tachyon version to 0.6.1. 11907b3 [Calvin Jia] Use TachyonURI for tachyon paths instead of strings. 71bf441 [Calvin Jia] Upgrade Tachyon client version to 0.6.0.
This is a reopening of #4867.
A short summary of the issues resolved from the previous PR:
It seems that the dependency management in spark-parent will not work on transitive dependencies, one way to make sure jackson-mapper-asl is included with the correct version is to add it as a top level dependency. The best solution would be to exclude the dependency in the modules which require a higher version, but that did not fix the unit tests. Any suggestions on the best way to solve this would be appreciated!