Run majority of unit tests in CI with V2 Pytest runner #7724

Merged: 18 commits into pantsbuild:master from unit-tests-use-v2, May 23, 2019

Conversation

@Eric-Arellano (Contributor) commented May 14, 2019

Problem

Beyond wanting to move everything towards V2 in general, this change will allow us to start remoting our unit tests for greater parallelism and less dependence on Travis.

Solution

We must make several changes to land this.

Result

CI is overall less useful. The unit tests now take 50 minutes to run, rather than 20. The logging is also less helpful: travis_wait does not emit any output until the entire process completes, and we currently log stdout and stderr even for successful tests, so there is more noise.

However, these costs are deemed worth it to allow us to soon use remoting.

@Eric-Arellano Eric-Arellano force-pushed the unit-tests-use-v2 branch 4 times, most recently from 3e72aab to 359cc48 on May 17, 2019 15:29
@Eric-Arellano Eric-Arellano changed the title WIP: Run unit tests in CI with V2 Pytest runner Run unit tests in CI with V2 Pytest runner May 20, 2019
@Eric-Arellano Eric-Arellano changed the title Run unit tests in CI with V2 Pytest runner Run majority of unit tests in CI with V2 Pytest runner May 20, 2019
Turns out that travis_wait does not actually watch the command's output to check whether the process is still running. Instead, it says "you have up to x minutes for this process to complete, otherwise it will time out." It took 41 minutes for the Py2 unit tests to pass last time, so we set the limit to 50 minutes to avoid flakiness.
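For concreteness, the change amounts to wrapping the existing CI script invocation in travis_wait (a sketch of the `.travis.yml` change shown in the diff further down; the `50` is the minute budget granted to the wrapped command, not a polling interval):

```bash
# Before: ci.sh runs directly, subject to Travis's default no-output timeout.
./build-support/bin/ci.sh -2lp

# After: travis_wait gives the command up to 50 minutes to finish, but
# buffers its stdout/stderr until the wrapped command completes.
travis_wait 50 ./build-support/bin/ci.sh -2lp
```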
@stuhood (Member) left a comment

Let's do it.

@illicitonion (Contributor) commented

Do you know the cause of the 250% perf regression? How much do we think is down to each of the following?

  • Resolving more (once per target instead of once per pants invocation)
  • Copying files into tempdirs more
  • Overhead of starting more Python processes
  • Overhead of starting more pytest runs

@Eric-Arellano (Contributor, Author) commented

> Do you know the cause of the 250% perf regression? How much do we think is down to each of:

I haven't quantitatively confirmed anything, but my intuition from working with the V2 test runner the past few weeks is that it's the result of resolving requirements for each individual test, as opposed to not having to resolve any requirements at all because we just use what's already in the venv.

I'm not sure how we could solve this...copying files from the venv into the subprocess sounds horribly error prone when we introduce remoting. Precomputing a Pex with every single permutation of dependency combinations would likely take too much memory, and it's not clear how we could get that cache up to Travis.

@illicitonion (Contributor) commented

> Do you know the cause of the 250% perf regression? How much do we think is down to each of:

> I haven't quantitatively confirmed anything, but my intuition from working with the V2 test runner the past few weeks is that it's the result of resolving requirements for each individual test, as opposed to not having to resolve any requirements at all because we just use what's already in the venv.

Interesting... We should do one resolve per 3rdparty dependency combination, right? I'd be interested to know how many resolves we're actually doing... From some quick grepping, it looks like we have 38 3rdparty deps from our python code, and I would be surprised if we were doing more than maybe 30 resolves, and 30 minutes is a long time for 30 resolves to be happening, especially in parallel...

I guess the easiest way to find this out right now would be to grab a --native-engine-visualize-to trace for a whole run, and look at the graph...
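A minimal sketch of what that might look like, assuming `--native-engine-visualize-to` accepts a directory into which the engine writes its graph output for each run (the target address is the one discussed later in this thread; exact flag behavior may differ):

```bash
# Hypothetical invocation: dump engine execution graphs to /tmp/engine-viz
# while running a single test target, then inspect the generated files.
./pants --native-engine-visualize-to=/tmp/engine-viz test tests/python/pants_test/backend/python/tasks:pytest_run
```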

@stuhood (Member) commented May 21, 2019

> Do you know the cause of the 250% perf regression? How much do we think is down to each of:

> I haven't quantitatively confirmed anything, but my intuition from working with the V2 test runner the past few weeks is that it's the result of resolving requirements for each individual test, as opposed to not having to resolve any requirements at all because we just use what's already in the venv.

> Interesting... We should do one resolve per 3rdparty dependency combination, right? I'd be interested to know how many resolves we're actually doing... From some quick grepping, it looks like we have 38 3rdparty deps from our python code, and I would be surprised if we were doing more than maybe 30 resolves, and 30 minutes is a long time for 30 resolves to be happening, especially in parallel...

+1. Also, it's worth noting that caching (either pantsd, local as in #6898, or remote) should completely eliminate the resolve overhead in 99% of cases.

> I guess the easiest way to find this out right now would be to grab a --native-engine-visualize-to trace for a whole run, and look at the graph...

That or getting the zipkin patch relanded. I suspect that visualize-to is too noisy for this case...

Eric-Arellano added a commit that referenced this pull request May 23, 2019
…/python/subsystems` (#7793)

### Problem
`backend/python/subsystems/python_native_code.py` depended on `pants.backend.native.subsystems`, but did not declare the dependency in its `BUILD` file.

The naive solution of adding the proper dependency to its `BUILD` file results in a dependency cycle, as `backend/native/subsystems/conan.py` already depends on `pants.backend.python.subsystems.python_tool_base`.

So, `./pants test tests/python/pants_test/backend/python/tasks:pytest_run` would fail when run directly, but pass in CI, because all of the targets' sources would get combined there.

When switching to the V2 test runner, we no longer allow this sort of leaky dependency usage: every dependency must be properly declared in the appropriate `BUILD` file. So, `./pants test tests/python/pants_test/backend/python/tasks:pytest_run` started failing in #7724.

### Solution
Move `conan.py` into a new subdirectory `backend/native/subsystems/packaging`, as suggested by @cosmicexplorer.
```diff
@@ -947,23 +947,23 @@ matrix:
       - *py27_linux_test_config_env
       - CACHE_NAME=linuxunittests.py27
     script:
-      - ./build-support/bin/ci.sh -2lp
+      - travis_wait 50 ./build-support/bin/ci.sh -2lp
```
@Eric-Arellano (Contributor, Author) commented on this diff

A frustrating aspect of using travis_wait: the resulting std{out,err} is stripped of any color. We normally print those in red upon test failures to make them easier to scan.

We won't be able to remove travis_wait until we improve the performance via #7795 or via remoting.

@Eric-Arellano Eric-Arellano merged commit c9ea445 into pantsbuild:master May 23, 2019
@Eric-Arellano Eric-Arellano deleted the unit-tests-use-v2 branch May 23, 2019 21:00
blorente pushed a commit to blorente/pants that referenced this pull request May 28, 2019
Eric-Arellano added a commit that referenced this pull request Jun 7, 2019
### Problem
Now that we use the V2 test runner as of #7724, unit tests both take much longer (20 minutes -> 40-50 minutes) and have become very flaky (not exclusively because of V2).

Especially because the tests flake so much, it is frustrating to have to wait a whole 50 minutes to rerun the shard.

### Solution
We can't use automated sharding because V2 does not support that yet, but we can introduce our own manual shards. One shard runs the V2 tests, and the other runs all blacklisted tests, the contrib tests, and the `pants-plugin` tests.

### Result
Flakes will be slightly less painful: when something flakes, you will not have to rerun the entire 50 minutes of CI, just the subset for that specific shard.