-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Java 17 and Spark 3.3.0. #8035
Java 17 and Spark 3.3.0. #8035
Conversation
Github actions tests reported job failures from actions build 3128674181
|
c0be97c
to
c7a0085
Compare
Github actions tests reported job failures from actions build 3130948787
|
Github actions tests reported job failures from actions build 3132106780
|
462b49b
to
1c6b6d2
Compare
Github actions tests reported job failures from actions build 3136014671
|
Github actions tests reported job failures from actions build 3236342061
|
Github actions tests reported job failures from actions build 3237120012
|
Github actions tests reported job failures from actions build 3413577203
|
Github actions tests reported job failures from actions build 3877173679
|
Github actions tests reported job failures from actions build 3987916695
|
@samuelklee I've worked through all of the issues with the other repos (Barclay, picard, etc.), so we're ready to move forward with Java 17 reviews, but the At one point when I looked into these, the values produced when the tests fail look similar(/identical ?) to the values that the tests USED to produce (i.e., I suspect if we reverted expected results files, the tests might succeed on the docker and fail on the non-docker). I'm not sure what is causing the instability, but this is going to become a blocking issue for this in the next couple of weeks. We also have the option of disabling them only on the docker (or non-docker), but I hate to do that. Can you take a look at this in the next week or so ? There is also one WDL test ( The other test failures (doc and wdl gen) can be ignored for now - they require code that is on another branch that is dependent on a new Barclay version. |
Thanks @cmnbroad, I can take a look starting Friday. In the meantime, could you highlight any possible differences between the Docker vs. non-Docker Java / native environments, if any come to mind? As for the WDL test, I would be very surprised if it was a memory issue---I think those tests build a panel with 2 samples over fewer than 100 intervals. Doing PCA on a 2 x 100 matrix shouldn't require 7GB...! |
Also just dropping a link to #8107, some other relevant links there as well. |
Looks like the Docker environment uses Java 17.0.1+12, while the non-Docker environment just specifies Java 17? Not sure what the latter resolves to, so maybe this isn’t the issue, but I’ll do more digging later. |
I did a simple experiment and changed the version of Java used in the non-Docker ("17", although again I'm not sure what this actually resolves to) to that used in the Docker (17.0.1+12). This causes both non-Docker and Docker tests to now fail, rather than just the Docker tests; see #8174 (comment). Moreover, the test failures produce exactly the same discrepant numerical results. I think we can probably conclude that the expected test results were generated with "17" and that changing to 17.0.1+12 generates different results. This is not too unreasonable; see the Slack thread linked in #8111 (comment), for example, which shows that we might be getting into pretty hairy territory and that even changes to things like how HotSpot Intrinsics are implemented in each JVM can cause the numerical differences we see here. So perhaps we can either 1) change the Docker version to the version corresponding to "17" or 2) change the non-Docker version to 17.0.1+12 and update the expected results? Not sure about the failing WDL test yet, but hopefully this is enough to get us started! |
@samuelklee Thats great - at least it gives us a way to move forward. I'm not surprised that there are differences between 8/11/17, but its a bit surprising that there is so much difference between Java 17 point releases. When I'm back next week I'll update my branch to use the same Java versions and expected results files. Thanks for looking into this! |
@cmnbroad could the failing WDL test simply be due to some Spark configuration issue, rather than memory? Locally, for both 1) the WDL test within the Docker and 2) CreateReadCountPanelOfNormalsIntegrationTest using 17.0.3 without the Docker, I seem to hit the exception discussed here: https://stackoverflow.com/questions/72724816/running-unit-tests-with-spark-3-3-0-on-java-17-fails-with-illegalaccesserror-cl Not sure why CreateReadCountPanelOfNormalsIntegrationTest seems to pass in the CI environments, but perhaps it'll be more obvious to you? Just for context, note that this tool relies on the Spark MLlib implementation of PCA. |
@samuelklee Oh interesting - the main jar and the test jar have directives - see for example the one for the main jar here - that should prevent that exception from happening (but only if you're running from a jar, otherwise you need to add a bunch of --add-exports manually to prevent that exception - I need to write up a wiki article on how to do that, since you need to do it to run tests from within the IDE). So it shouldn't happen on the docker, but if it is, maybe I missed a code path. I'll take a look. |
So the issue was that the manifest syntax for --add-opens that I was using was incorrect - its amazing that all the other tests pass that way - but I can now see how that happens. I have a fix and the test passes now. I just need to clean everything up. Thanks for looking @samuelklee. |
Github actions tests reported job failures from actions build 4294252456
|
Github actions tests reported job failures from actions build 4294588163
|
Github actions tests reported job failures from actions build 4295524165
|
Github actions tests reported job failures from actions build 4296321287
|
@Louisb I did wind up making the docgen and wdlgen and tab completion integration tests non-docker only, since they're failing when run on the docker. The tasks themselves seem to be working fine - these tests are really just here to do a quick smoke test, and to make it easy to debug issues in case of a failure. I suspect the failures are related to the way we run the tests on the docker using the 3 separate jars. So for now I made them non-docker only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final changes seem sane. I'm in favor of merging this and then fixing up the base image after it goes in since it's taking longer than anticipated.
@lbergelson or @cmnbroad: would you mind kicking off a build on this, so I can see how DISCVR-seq builds against it? Are you planning a GATK release any time soon? |
Thats a good idea. I don't know what our release plan is. I think we might
do one more minor release on java 8 for someone internal and then give a
bit of time before a java 17 release to let things shake out a bit. There
are a bunch of things we punted until it was merged that we want to get
done related to the update. I will put out a snapshot tonight or tomorrow
for you to test against. I wouldn't anticipate many major problems
updating although there might be some wrangling module exports which is
awfully confusing. The bulk of our issues had to do with fixing the
documentation generation and dealing with spark both of which hopefully
will just work for you.
…On Thu, Mar 2, 2023, 3:56 PM bbimber ***@***.***> wrote:
@lbergelson <https://github.com/lbergelson> or @cmnbroad
<https://github.com/cmnbroad>: would you mind kicking off a build on
this, so I can see how DISCVR-seq builds against it? Are you planning a
GATK release any time soon?
—
Reply to this email directly, view it on GitHub
<#8035 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABD3RLCRVXIM5Y3O2RC6IKTW2ECP3ANCNFSM6AAAAAAQV3ZLXM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
ok, thanks. DISCVRseq does use the doclet stuff to build javadoc, and I recall this being a little wonky to get working. it'd be helpful to figure out how easy/complex this is going to be. i'm all for migrating though. |
Hi, I'm a GATK4 user and I apologize in advance if these questions can be answered in documentation: I could not find answers to these myself. Are releases not made alongside master branch changes? Is the master branch not a stable branch? I am every excited to try GATK with Java 17, however I'm unsure if it's time to test since there is not a release for this pull request. |
@lbergelson: were you able to kick off a build? thanks in advance. |
@bbimber Ack, sorry, there's a snapshot up at: |
@jdmccauley Yes, we do releases from the master branch periodically -- there are 1 or 2 additional outstanding PRs we'd like to get merged before the next release, but it should not be much longer. In the mean time, if you're unable to build GATK from source, you could try running the latest prebuilt nightly docker image from https://hub.docker.com/r/broadinstitute/gatk-nightly/ |
@droazen Understood, thanks! I just pulled the jar out of the latest nightly docker image and it seems to run with Java 17 on my machine. Thanks for the tip, and I'm looking forward to the official release after those PRs! |
No description provided.