-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-11416][BUILD] Update to Chill 0.8.0 & Kryo 3.0.3 #12076
Conversation
@@ -122,7 +122,7 @@ jsr305-1.3.9.jar | |||
jta-1.1.jar | |||
jtransforms-2.4.0.jar | |||
jul-to-slf4j-1.7.16.jar | |||
kryo-2.21.jar | |||
kryo-shaded-3.0.3.jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to this shading, we no longer need to add exclusions for Kryo's ASM dependencies.
No API change? |
Seemingly none that affect us so far. There are some binary incompatibilities in |
Test build #54558 has finished for PR 12076 at commit
|
Test failures appear unrelated. |
Jenkins, retest this please. |
Test build #54571 has finished for PR 12076 at commit
|
Huh, weird. It looks like this somehow is a legitimate failure. I'll investigate. |
quick glance, it looked to me like the failures were hive related. I don't know if you use any hive code, but could the fact that hive also uses kryo create a diamond that gives a runtime failure? Just a shot in the dark. |
Btw, we'll have a kryo 3.1.0 release soon (depends on pending fixes regarding handling of generics). This will contain an incompatibility regarding Closure handling (EsotericSoftware/kryo#415), not sure if that's relevant for you. |
Yep, I think that this is due to conflicts with Hive's version of Kryo. @vanzin, @steveloughran: Given that our execution Hive (1.2.1) uses Kryo 2.x, do we need to upgrade to Hive 2 in order to use a Kryo 3.x-compatible version of Hive (HIVE-12175)? |
I haven't looked at the failures here, but the upstream version of Hive shades Kryo, so hopefully it wouldn't be affected. But I don't know whether Steve made any changes to Spark's fork. |
Here's the branch that our custom Looking at the changes there, it appears that our custom Hive 1.2.1 downgrades Kryo from 2.22 to 2.21. However, it appears that our custom build undoes the shading of Kryo and adds shading of Protobuf. |
I understand why we needed to add shading of Protobuf, but why did we need to remove shading of Kryo? Since I suspect that we do not want to couple the upgrading to Kryo 3 with an upgrade to Hive 2.0, would it make sense to publish a |
It should be possible to leave Kryo shaded, but it will require some code changes. We do that in CDH, here's our diff: https://github.com/cloudera/spark/commit/0d8339b6f45460bba67a53ca88d423a301ab98c3#diff-57959d009ae01f8799a2c4d461e96a76 I think that was the only kryo-related change, although I made that change a while ago and might have forgotten other places. |
how is kryo used in hive? i suspect we might no longer be using that in spark 2.0, once we get rid of all the ddls. |
Upgrading to hive 2 would finally get everything in sync: ideally eliminate the need to have a custom hive JAR at all. Essentially all spark needs is a version of Hive with parameters that can be exchanged across all needed methods, with the conflict packages shaded, and the non-conflict packages generally omitted If you want to find out where things don't link, edit the spark pom to pull in org.apache.hive/hive and see what breaks. (of course, the other thing that would be nice would be for Hive to make it possible to subclass their thrift service cleanly; I've got a PoC of that somewhere ... it's not hard to add) |
...Thinking about this; it might be possible to go to hive with a shaded kryo, with the invocation of those methods which exchange kryo types referring to the shaded values. That doesn't do anything for marshalling stuff between kryo versions |
Grepping
[de]serializeObjectByKryo methods' signatures.
I don't spot any other Hive-specific uses of Kryo which look problematic, so it seems plausible that we can shade without major problems. Am I overlooking other uses of Kryo to exchange data between Spark and Hive internals? |
I don't recall doing any reflection related stuff to work with kryo; problems were showing up in compilation. There's one more thing to worry about, hive minimal vs uber jars. It may still be necessary to have a custom 1.2.x JAR, just to keep all the other cruft out. |
My understanding so far:
Therefore, I think there's no avoiding having to publish another custom Hive 1.2.1 build. I propose that we do so by editing the 1.2.1-spark POM to restore Kryo shading. I'll work on this tomorrow and loop back to this discussion once I've completed the required Hive dep. bumps. |
Makes sense. Note that getting the SBT dependencies to match maven's was a complete nightmare; it'd probably have been easier to write a new version reconciler for ivy than solve it for spark alone. You shouldn't have to do much to that listing as it stands today, but updating to later hive versions will probably hurt again. FWIW, Hadoop 2.x isn't going to increment protobuf, I'm confident of that to the extent that there are enough people who would veto any attempt. Worry more about guava and jackson |
Splitting the Hive Kryo shading into a new JIRA: https://issues.apache.org/jira/browse/SPARK-14435 |
Test build #55257 has finished for PR 12076 at commit
|
Test build #55265 has finished for PR 12076 at commit
|
This patch updates our custom Hive 1.2.1 fork in order to shade Kryo in Hive. This is a blocker for upgrading Spark to use Kryo 3 (see apache#12076). The source for this new fork of Hive can be found at https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2 Here's the complete diff from the official Hive 1.2.1 release: apache/hive@release-1.2.1...JoshRosen:release-1.2.1-spark2 Here's the diff from the sources that pwendell used to publish the current `1.2.1.spark` release of Hive: pwendell/hive@release-1.2.1-spark...JoshRosen:release-1.2.1-spark2. This diff looks large because his branch used a shell script to rewrite the groupId, whereas I had to commit the groupId changes in order to prevent the find-and-replace from affecting the package names in our relocated Kryo classes: pwendell/hive@release-1.2.1-spark...JoshRosen:release-1.2.1-spark2#diff-6ada9aaec70e069df8f2c34c5519dd1e Using these changes, I was able to publish a local version of Hive and verify that this change fixes the test failures which are blocking apache#12076. Note that this PR will not compile until we complete the review of the Hive POM changes and stage and publish a release. /cc vanzin, steveloughran, and pwendell for review. Author: Josh Rosen <[email protected]> Closes apache#12215 from JoshRosen/shade-kryo-in-hive.
The Hive conflict should now be fixed now that #12215 has been merged, so I've updated this and am planning to merge as soon as tests pass. |
Test build #55393 has finished for PR 12076 at commit
|
Alright, I'm going to go ahead and merge this. We'll upgrade to 2.12-friendly versions of Chill + Kryo as part of the 2.12 testing patch which will come out later. |
This patch upgrades Chill to 0.8.0 and Kryo to 3.0.3. While we'll likely need to bump these dependencies again before Spark 2.0 (due to SPARK-14221 / twitter/chill#252), I wanted to get the bulk of the Kryo 2 -> Kryo 3 migration done now in order to figure out whether there are any unexpected surprises. Author: Josh Rosen <[email protected]> Closes apache#12076 from JoshRosen/kryo3.
This patch upgrades Chill to 0.8.0 and Kryo to 3.0.3. While we'll likely need to bump these dependencies again before Spark 2.0 (due to SPARK-14221 / twitter/chill#252), I wanted to get the bulk of the Kryo 2 -> Kryo 3 migration done now in order to figure out whether there are any unexpected surprises.