-
-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test_java_binary_versions
from javac_binary_test.py
is flaky
#12293
Comments
From my read of the stack trace (haven't debugged further yet), I have two hypothesis:
I'll dig in further, but for now I'm going to try to reduce the number of unique JVMs bootstrapped by our tests in order to discount or at least reduce the disk pressure in the case of the latter. |
…uce CI flakiness (#12325) See #12293 (comment) for motivation. This commit has no functional change, and almost certainly doesn't reduce the efficacy of the existing tests in any substantial way. [ci skip-rust] [ci skip-build-wheels]
A related failure:
@patricklaw should we maybe add pytest retries to these tests? |
Another one:
@patricklaw any insight into why these are so flaky? Do you think retries make sense? |
See #12293. These seem to be failing around 5-10% of builds in CI. Because this backend is still experimental, there's less risk in skipping the tests. We can stabilize and re-enable before productionizing the backend. [ci skip-rust] [ci skip-build-wheels]
…antsbuild#12293. This causes JavacBinary to pass the `--system-jvm` option to Coursier, which selects whichever JDK Coursier happens to find already installed on the system. Long term, we shouldn't use this behavior in tests and probably shouldn't even allow it at all since the inherent non-hermeticity isn't properly captured in cache keys, but for now it allows testing of Coursier-dependent backends in CI until pantsbuild#12293 is resolved properly.
…antsbuild#12293. This causes JavacBinary to pass the `--system-jvm` option to Coursier, which selects whichever JDK Coursier happens to find already installed on the system. Long term, we shouldn't use this behavior in tests and probably shouldn't even allow it at all since the inherent non-hermeticity isn't properly captured in cache keys, but for now it allows testing of Coursier-dependent backends in CI until pantsbuild#12293 is resolved properly.
…antsbuild#12293. This causes JavacBinary to pass the `--system-jvm` option to Coursier, which selects whichever JDK Coursier happens to find already installed on the system. Long term, we shouldn't use this behavior in tests and probably shouldn't even allow it at all since the inherent non-hermeticity isn't properly captured in cache keys, but for now it allows testing of Coursier-dependent backends in CI until pantsbuild#12293 is resolved properly.
This implementation is just good enough to demonstrate how to use the existing Pants Java infrastructure to compile and consume Java source for the purpose of executing junit tests. This initial iteration has several limitations: * jUnit5 (org.junit.platform:junit-platform-console:1.7.2) is hard-coded as the JUnit runner in the rule source. As needed, this can be hoisted into a subsystem for configurability. By design, junit4 is not supported as a **runner**, because its classpath scanning isn't powerful enough. However, junit4 tests can still be run with the junit5 runner. * junit_tests targets have the same requirement of java_library targets that there must be exactly 1 coursier_lockfile dependency in the transitive closure of the junit_tests target. In practice this means that any third party dependencies required by the test source must also be shared by the library targets upon which the test transitively depends. Lockfile subsetting will mostly make this a non-issue, but it's still unfortunate that all test targets are indirectly locked to all other test targets that transitively depend on the same Java library code. * Due to pantsbuild#12293, the test runner currently hard-codes the Coursier `--system-jvm` argument. Future revisions will expose this as an option via `junit_test` parameters and/or a junit subsystem. [ci skip-rust] [ci skip-build-wheels]
This implementation is just good enough to demonstrate how to use the existing Pants Java infrastructure to compile and consume Java source for the purpose of executing junit tests. This initial iteration has several limitations: * jUnit5 (org.junit.platform:junit-platform-console:1.7.2) is hard-coded as the JUnit runner in the rule source. As needed, this can be hoisted into a subsystem for configurability. By design, junit4 is not supported as a **runner**, because its classpath scanning isn't powerful enough. However, junit4 tests can still be run with the junit5 runner. * junit_tests targets have the same requirement of java_library targets that there must be exactly 1 coursier_lockfile dependency in the transitive closure of the junit_tests target. In practice this means that any third party dependencies required by the test source must also be shared by the library targets upon which the test transitively depends. Lockfile subsetting will mostly make this a non-issue, but it's still unfortunate that all test targets are indirectly locked to all other test targets that transitively depend on the same Java library code. * Due to pantsbuild#12293, the test runner currently hard-codes the Coursier `--system-jvm` argument. Future revisions will expose this as an option via `junit_test` parameters and/or a junit subsystem. [ci skip-rust] [ci skip-build-wheels]
This implementation is just good enough to demonstrate how to use the existing Pants Java infrastructure to compile and consume Java source for the purpose of executing junit tests. This initial iteration has several limitations: * jUnit5 (org.junit.platform:junit-platform-console:1.7.2) is hard-coded as the JUnit runner in the rule source. As needed, this can be hoisted into a subsystem for configurability. By design, junit4 is not supported as a **runner**, because its classpath scanning isn't powerful enough. However, junit4 tests can still be run with the junit5 runner. * junit_tests targets have the same requirement of java_library targets that there must be exactly 1 coursier_lockfile dependency in the transitive closure of the junit_tests target. In practice this means that any third party dependencies required by the test source must also be shared by the library targets upon which the test transitively depends. Lockfile subsetting will mostly make this a non-issue, but it's still unfortunate that all test targets are indirectly locked to all other test targets that transitively depend on the same Java library code. * Due to pantsbuild#12293, the test runner currently hard-codes the Coursier `--system-jvm` argument. Future revisions will expose this as an option via `junit_test` parameters and/or a junit subsystem. [ci skip-rust] [ci skip-build-wheels]
This implementation is just good enough to demonstrate how to use the existing Pants Java infrastructure to compile and consume Java source for the purpose of executing junit tests. This initial iteration has several limitations: * jUnit5 (org.junit.platform:junit-platform-console:1.7.2) is hard-coded as the JUnit runner in the rule source. As needed, this can be hoisted into a subsystem for configurability. By design, junit4 is not supported as a **runner**, because its classpath scanning isn't powerful enough. However, junit4 tests can still be run with the junit5 runner. * junit_tests targets have the same requirement of java_library targets that there must be exactly 1 coursier_lockfile dependency in the transitive closure of the junit_tests target. In practice this means that any third party dependencies required by the test source must also be shared by the library targets upon which the test transitively depends. Lockfile subsetting will mostly make this a non-issue, but it's still unfortunate that all test targets are indirectly locked to all other test targets that transitively depend on the same Java library code. * Due to pantsbuild#12293, the test runner currently hard-codes the Coursier `--system-jvm` argument. Future revisions will expose this as an option via `junit_test` parameters and/or a junit subsystem. [ci skip-rust] [ci skip-build-wheels]
…12293. (#12425) * Add special casing for '--javac-jdk=system' as a temporary hack for #12293. This causes JavacBinary to pass the `--system-jvm` option to Coursier, which selects whichever JDK Coursier happens to find already installed on the system. Long term, we shouldn't use this behavior in tests and probably shouldn't even allow it at all since the inherent non-hermeticity isn't properly captured in cache keys, but for now it allows testing of Coursier-dependent backends in CI until #12293 is resolved properly. * Also skip test_compile_jdk_versions in javac_test.py, which forces other JVM versions. [ci skip-rust] [ci skip-build-wheels]
Hm, this flaked on
|
The failed download here is the JDK, not a fetched jar. Maybe we should just host the JDK somewhere or allow a JDK in the Pants CI Docker image? |
Alternate idea: embed the JDK archive in the CI Docker image and then run a web server during the integration test pointing to that JDK archive. |
Hmm the actual error is:
So the extraction failed because the directory where it was unpacking was not empty. |
Potentially related: coursier/coursier#1815 Theory: Multiple concurrent Coursier runs are racing with each other. |
Also potentially related: coursier/coursier#1818 |
This implementation is just good enough to demonstrate how to use the existing Pants Java infrastructure to compile and consume Java source for the purpose of executing junit tests. This initial iteration has several limitations: * jUnit5 (org.junit.platform:junit-platform-console:1.7.2) is hard-coded as the JUnit runner in the rule source. As needed, this can be hoisted into a subsystem for configurability. By design, junit4 is not supported as a **runner**, because its classpath scanning isn't powerful enough. However, junit4 tests can still be run with the junit5 runner. * junit_tests targets have the same requirement of java_library targets that there must be exactly 1 coursier_lockfile dependency in the transitive closure of the junit_tests target. In practice this means that any third party dependencies required by the test source must also be shared by the library targets upon which the test transitively depends. Lockfile subsetting will mostly make this a non-issue, but it's still unfortunate that all test targets are indirectly locked to all other test targets that transitively depend on the same Java library code. * Due to pantsbuild#12293, the test runner currently hard-codes the Coursier `--system-jvm` argument. Future revisions will expose this as an option via `junit_test` parameters and/or a junit subsystem. [ci skip-rust] [ci skip-build-wheels]
This implementation is just good enough to demonstrate how to use the existing Pants Java infrastructure to compile and consume Java source for the purpose of executing junit tests. This initial iteration has several limitations: * jUnit5 (org.junit.platform:junit-platform-console:1.7.2) is hard-coded as the JUnit runner in the rule source. As needed, this can be hoisted into a subsystem for configurability. By design, junit4 is not supported as a **runner**, because its classpath scanning isn't powerful enough. However, junit4 tests can still be run with the junit5 runner. * junit_tests targets have the same requirement of java_library targets that there must be exactly 1 coursier_lockfile dependency in the transitive closure of the junit_tests target. In practice this means that any third party dependencies required by the test source must also be shared by the library targets upon which the test transitively depends. Lockfile subsetting will mostly make this a non-issue, but it's still unfortunate that all test targets are indirectly locked to all other test targets that transitively depend on the same Java library code. * Due to #12293, the test runner currently hard-codes the Coursier `--system-jvm` argument. Future revisions will expose this as an option via `junit_test` parameters and/or a junit subsystem. [ci skip-rust] [ci skip-build-wheels]
So it seems like the workaround I added didn't actually fix the issue, because Coursier still needed to fetch a JVK even when Tom: Those Coursier bugs look spot on. My initial suspicion was that this was a race condition of some kind in Coursier's JVM fetch logic. |
I think that there might be a fairly straightforward |
I'll look into this. |
Side note: I found the https://github.com/shyiko/jabba project today which is a JVM version manager (as pyenv is for Python). Coursier actually uses the JVM index from |
@stuhood: Does memoization only operate if pantsd is running? |
No, always. |
## Motivation As described in #12293, multiple Coursier invocations were downloading the JDK and triggering [a race condition in Coursier's locking](coursier/coursier#1815) that caused flakiness in tests. ## Solution This PR mitigates the issue by isolating JDK download to a single `Process`. The new `JdkSetup` type provides rules with the command to obtain the location of the JDK so they may query Coursier for JAVA_HOME. This has the benefit of still downloading in remote execution, but providing some guarantee that there will be a single download.
FYI @tdyas, seen on
|
So even with #12972, concurrent instances of Coursier are still trying to download a JDK. Maybe the GitHub Actions config is missing the JDK setup config is some job where the JDK is needed? Related, I believe we should give up on using Coursier to download the JDK. For now, we should require that it be installed in the system (whether that be local or remote). |
That is the effect of the |
Note the error was |
@tdyas : Argh. I realized the issue with #12293 (comment) ... each test process (we're up to 6ish test files that invoke the JDK) is still concurrent, and they're sharing the system |
But the conflict is happening in the Coursier cache directory, right? And not the named cache dir? |
...yea, maybe. But if we're not pointing the Coursier cache directory into the Additionally, we could use a stable named cache directory for integration tests, but concurrency control it with https://www.pantsbuild.org/docs/reference-pytest#section-execution-slot-var (i.e., effectively have N subdirectories)... that wouldn't avoid this issue in production when multiple repos are sharing a directory, but it might give us breathing room for tests. |
…and use them for Coursier (#13046) Coursier is used to fetch JVMs and artifacts for Java support. On `main`, it uses its default cache directory, but as shown in #12293, this has concurrency issues when multiple clients are fetching JVMs at the same time. The underlying issue is likely to be fixed by coursier/coursier#2197, but in the meantime we can move to using `append_only_caches` for the `coursier` `--cache-dir` and `--jvm-dir`. Alone, this isn't sufficient to avoid concurrency issues (since fetches would simply collide in a new location): but it allows us to re-configure the cache location in `RuleRunner` tests using the `[pytest] execution_slot_var` to prevent collisions. Along the way, support for mixing `append_only_caches` and `use_nailgun` needed fixing. Re-enables running of JVM tests by default, and fixes #12293.
This failed in the
main
branch: https://github.com/pantsbuild/pants/runs/3010851715?check_suite_focus=true#step:11:919cc @patricklaw. I haven't had time to investigate
The text was updated successfully, but these errors were encountered: