Optimize V2 engine test runner to have constant space complexity #7741

Eric-Arellano · 2019-05-16T17:35:22Z

The V2 Python test runner has a space complexity of O(t + t*e + b), where:

t is the number of targets
e is the number of env values used by any ExecuteProcessRequest, e.g. lines 96 and 159 of pants/src/python/pants/backend/python/rules/python_test_runner.py (at 155077e), both of which read:

    env={'PATH': text_type(os.pathsep.join(python_setup.interpreter_search_paths))},

b is the number of BUILD files in the transitive closure (i.e. all the HydratedTargets required by the original target)

The terms in O(t + t*e + b) arise as follows:

t corresponds to the stdout and stderr sent by each target.
- Could be addressed by extending workunits for declarative messaging in noninteractive v2 @rules #7071.
t*e: e is multiplied by t because the env variables get reserialized and persisted to memory for each target, even if the value has not changed.
b results from parse_address_family, which materializes and persists the BUILD file.
- The issue is not necessarily materializing the file, but that it persists. Why is this being persisted? For example, is AddressMapper holding on to the file?
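
The t*e term can be pictured with a toy model. The sketch below is not Pants code: `HandleStore`, `run_targets`, and the `intern` flag are hypothetical names invented for illustration. It assumes each persisted env entry counts as one retained value, and shows that re-serializing an unchanged env for every target retains t*e entries, whereas keying the serialized env by its value retains only e.

```python
class HandleStore:
    """Toy stand-in for a table of values kept alive after a run (hypothetical)."""

    def __init__(self):
        self.entries = []      # every retained env entry, in insertion order
        self._interned = {}    # serialized env -> the single shared copy

    def persist_env(self, env, intern):
        # Serialize the env into an immutable, hashable form.
        serialized = tuple(sorted(env.items()))
        if intern:
            # Reuse the copy stored for an identical env instead of adding another.
            if serialized in self._interned:
                return self._interned[serialized]
            self._interned[serialized] = serialized
        self.entries.extend(serialized)
        return serialized


def run_targets(num_targets, env, intern):
    """Return how many env entries remain retained after 'running' every target."""
    store = HandleStore()
    for _ in range(num_targets):
        store.persist_env(env, intern=intern)
    return len(store.entries)


env = {"PATH": "/usr/bin:/bin", "LANG": "C"}  # e = 2
print(run_targets(5, env, intern=False))      # grows with t * e
print(run_targets(5, env, intern=True))       # stays at e
```

With t = 5 and e = 2, the first call retains 10 entries and the second retains 2. Whether the engine could share env values this way is an open question here, not something the issue confirms.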

See https://gist.github.com/Eric-Arellano/defca7d864a9f3939448964beea618d4#file-test_strutil_tarutil-txt for an example of what gets persisted after a test run finishes, and the files above and below it in the gist to see how it scales.

This complexity specifically describes what is left over after all the console rules have finished running, as defined by what remains in ExternContent._handles(). See this diff for how the leftover values are found: https://gist.github.com/Eric-Arellano/defca7d864a9f3939448964beea618d4#file-d-diff.
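
To make the measurement concrete, here is a minimal sketch of counting what a handle table still holds once a run is over. It is a hypothetical stand-in, not the linked diff and not the real ExternContent: `ToyContext`, `to_handle`, `drop_handle`, and `summarize_leftovers` are names invented for illustration.

```python
from collections import Counter


class ToyContext:
    """Hypothetical stand-in for an object that keeps handed-out values alive."""

    def __init__(self):
        self._handles = set()

    def to_handle(self, obj):
        # Keep a reference so the object is not garbage collected.
        self._handles.add(obj)
        return obj

    def drop_handle(self, obj):
        # Release the reference once the value is no longer needed.
        self._handles.discard(obj)


def summarize_leftovers(context):
    """Count the values still held, grouped by type name."""
    return Counter(type(obj).__name__ for obj in context._handles)


context = ToyContext()
for target in ("a:tests", "b:tests", "c:tests"):
    context.to_handle(f"stdout for {target}")          # never released
    scratch = context.to_handle(("scratch", target))   # released right away
    context.drop_handle(scratch)

print(summarize_leftovers(context))
```

Here the three stdout strings remain while the released tuples do not, which mirrors the per-target t term described above.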

Ideally this would be O(1), so that the V2 test runner is completely agnostic to the number of targets used.

Comparison to `./pants list`

./pants list currently has a space complexity of O(1). It avoids each term of O(t + t*e + b) as follows:

b is not an issue because there is never more than one BUILD file per target.
t*e is not an issue because we do not use any env values.
t is not an issue because the engine does not send back stdout.

See https://gist.github.com/Eric-Arellano/defca7d864a9f3939448964beea618d4#file-list_strutil_tarutil-txt and the files above and below it for what is persisted after running ./pants list.

The implication is that ./pants list, and other V2 tasks, could suffer from the same space complexity issues, but in practice the list rule avoids them for the reasons above.


### Problem

Python requires packages to have an `__init__.py` file to be recognized as a proper module for the sake of imports. #7696 does this for Pytest, but inlines the logic, even though it will likely be helpful for other Python rules as well.

Further, because this logic was originally written before it was possible to go from `Digest->Snapshot` (thanks to #7725), we had to use `FilesContent` to grab the paths of the digest. This meant that every single source file was materialized and persisted to memory, resulting in extremely high memory usage (found as part of the investigation in #7741). There is no need for the actual content, just the paths, so this is a huge inefficiency.

Will close #7715.

### Solution

Generalize into `@rule(InitInjectedDigest, [Snapshot])`, where `InitInjectedDigest` is a thin wrapper around a `Digest`.

We take a `Snapshot` because we need the file paths to work properly. This contrasts with the earlier use of `FilesContent` to get the same paths. A `Snapshot` is much more lightweight.

We return a `Digest` because that is the minimum information necessary to work properly, and the caller of the rule can then convert that `Digest` back into a `Snapshot`.

### Result

It will now be easier for other Python rules to work with Python packages.

The unnecessary memory usage is now fixed. The V2 Pytest runner now has a space complexity of `O(t + t*e + b)`, rather than `O(t + t*e + s)`, where `t` is # targets, `e` is # env vars, `b` is # `BUILD` files, and `s` is # source files.
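
The point that only paths are needed can be shown with a short sketch. This is not the actual Pants rule: `missing_init_files` is a hypothetical helper, it assumes relative POSIX-style paths such as those a `Snapshot` would list, and it walks all the way to the top of each path because the description does not say where the real rule stops.

```python
from pathlib import PurePosixPath


def missing_init_files(file_paths):
    """Return the `__init__.py` paths needed so that every parent directory of a
    `.py` file is a package, using only the paths (never the file contents)."""
    existing = set(file_paths)
    needed = set()
    for path in file_paths:
        pure = PurePosixPath(path)
        if pure.suffix != ".py":
            continue
        # Walk up from the file's directory, stopping before the root ".".
        for parent in pure.parents:
            if str(parent) == ".":
                break
            init = str(parent / "__init__.py")
            if init not in existing:
                needed.add(init)
    return sorted(needed)


paths = [
    "src/python/project/util/strutil.py",
    "src/python/project/__init__.py",
]
print(missing_init_files(paths))
```

No file is opened or read at any point, so the memory cost depends on the number of paths rather than on the size of the sources.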

Eric-Arellano · 2019-06-19T21:28:21Z

We saw an out-of-memory error when running ./pants --no-v1 --v2 test tests/python/pants_test/engine:fs. It could not be reproduced locally.

Eric-Arellano · 2019-08-19T04:10:39Z

Closing because this is no longer an issue, and at the moment it appears to be a premature optimization. If we see an OOM again, we can re-open.

Eric-Arellano added the engine label May 16, 2019

Eric-Arellano mentioned this issue May 16, 2019

Extract a generalized V2 rule to inject __init__.py files #7722

Merged

Eric-Arellano closed this as completed Aug 19, 2019
