Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize V2 engine test runner to have constant space complexity #7741

Closed
Eric-Arellano opened this issue May 16, 2019 · 2 comments
Closed

Comments

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented May 16, 2019

The V2 Python test runner has a space complexity of O(t + t*e + b), where:

The values in O(t + t*e + b) result as follows:

  • t corresponds to the stdout and stderr sent by each target.
  • t * e is multiplied by t because the env variables get reserialized and persisted to memory for each target, even if the value has not changed.
  • b results from parse_address_family, which materializes and persists the BUILD file.
    • The issue is not necessarily materializing the file, but that it persists. Why is this being persisted? For example, is AddressMapper holding on to the file?

See https://gist.github.com/Eric-Arellano/defca7d864a9f3939448964beea618d4#file-test_strutil_tarutil-txt for an example of what gets persisted after a test run finishes, and the below and above files to see how it scales.

This complexity specifically is for what is leftover after all the console rules have finished running, as defined by what is left in the ExternContent._handles(). See this diff for how the leftover values are found: https://gist.github.com/Eric-Arellano/defca7d864a9f3939448964beea618d4#file-d-diff.

Ideally this would be O(1), so that the V2 test runner is completely agnostic to the number of targets used.

Comparison to ./pants list

./pants list currently has a space complexity O(1). It works around the space complexity of O(t + t*e + b) as follows:

  • b is not an issue because there is never more than one BUILD file per target.
  • t*e is not an issue because we do not use any env values.
  • t is not an issue because the engine does not send back stdout.

See https://gist.github.com/Eric-Arellano/defca7d864a9f3939448964beea618d4#file-list_strutil_tarutil-txt and the files above and below it for what is persisted after running ./pants list.

The implication is that ./pants list—and other V2 tasks—could suffer from the same space complexity issues, but in practice this rule works around it.

Eric-Arellano added a commit that referenced this issue May 16, 2019
### Problem
Python requires packages to have an `__init__.py` file to be recognized as a proper module for the sake of imports.

#7696 does this for Pytest, but inlines the logic, even though it will likely be helpful for other Python rules as well.

Further, because this logic was originally written before being able to from `Digest->Snapshot` thanks to #7725, we had to use `FilesContent` to grab the paths of the digest. This would mean that every single source file would be materialized and persisted to memory, resulting in extremely high memory usage (found as part of investigation in #7741). There is no need for the actual content, just the paths, so this is a huge inefficiency.

Will close #7715.

### Solution
Generalize into `@rule(InitInjectedDigest, [Snapshot])`, where `InitInjectedDigest` is a thin wrapper around a `Digest`.

We take a `Snapshot` because we need the file paths to work properly. This contrasts with earlier using `FileContents` to get the same paths. A `Snapshot` is much more light weight.

We return a `Digest` because that is the minimum information necessary to work properly, and the caller of the rule can then convert that `Digest` back into a `Snapshot`.

### Result
It will now be easier for other Python rules to work with Python packages.

The unnecessary memory usage is now fixed. The V2 Pytest runner now has a space complexity of `O(t + t*e + b)`, rather than `O(t + t*e + s)`, where `t` is # targets, `e` is # env vars, `b` is # `BUILD` files, and `s` is # source files.
@Eric-Arellano
Copy link
Contributor Author

We saw an out of memory error when running ./pants --no-v1 --v2 test tests/python/pants_test/engine:fs. It could not be reproduced locally.

@Eric-Arellano
Copy link
Contributor Author

Closing because this hasn't been an issue anymore and at the moment appears to be a premature optimization. If we see an OOM again, we can re-open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant