-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize V2 engine test runner to have constant space complexity #7741
Comments
Eric-Arellano
added a commit
that referenced
this issue
May 16, 2019
### Problem Python requires packages to have an `__init__.py` file to be recognized as a proper module for the sake of imports. #7696 does this for Pytest, but inlines the logic, even though it will likely be helpful for other Python rules as well. Further, because this logic was originally written before being able to from `Digest->Snapshot` thanks to #7725, we had to use `FilesContent` to grab the paths of the digest. This would mean that every single source file would be materialized and persisted to memory, resulting in extremely high memory usage (found as part of investigation in #7741). There is no need for the actual content, just the paths, so this is a huge inefficiency. Will close #7715. ### Solution Generalize into `@rule(InitInjectedDigest, [Snapshot])`, where `InitInjectedDigest` is a thin wrapper around a `Digest`. We take a `Snapshot` because we need the file paths to work properly. This contrasts with earlier using `FileContents` to get the same paths. A `Snapshot` is much more light weight. We return a `Digest` because that is the minimum information necessary to work properly, and the caller of the rule can then convert that `Digest` back into a `Snapshot`. ### Result It will now be easier for other Python rules to work with Python packages. The unnecessary memory usage is now fixed. The V2 Pytest runner now has a space complexity of `O(t + t*e + b)`, rather than `O(t + t*e + s)`, where `t` is # targets, `e` is # env vars, `b` is # `BUILD` files, and `s` is # source files.
We saw an out of memory error when running |
Closing because this hasn't been an issue anymore and at the moment appears to be a premature optimization. If we see an OOM again, we can re-open. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The V2 Python test runner has a space complexity of
O(t + t*e + b)
, where:t
is the number of targetse
is the number ofenv
values used by anyExecuteProcessRequest
, e.g.pants/src/python/pants/backend/python/rules/python_test_runner.py
Line 96 in 155077e
pants/src/python/pants/backend/python/rules/python_test_runner.py
Line 159 in 155077e
b
is the number ofBUILD
files in the transitive closure (i.e. all theHydratedTarget
s required by the original target)The values in
O(t + t*e + b)
result as follows:t
corresponds to thestdout
andstderr
sent by each target.t * e
is multiplied byt
because the env variables get reserialized and persisted to memory for each target, even if the value has not changed.b
results fromparse_address_family
, which materializes and persists theBUILD
file.AddressMapper
holding on to the file?See https://gist.github.com/Eric-Arellano/defca7d864a9f3939448964beea618d4#file-test_strutil_tarutil-txt for an example of what gets persisted after a test run finishes, and the below and above files to see how it scales.
This complexity specifically is for what is leftover after all the console rules have finished running, as defined by what is left in the
ExternContent._handles()
. See this diff for how the leftover values are found: https://gist.github.com/Eric-Arellano/defca7d864a9f3939448964beea618d4#file-d-diff.Ideally this would be
O(1)
, so that the V2 test runner is completely agnostic to the number of targets used.Comparison to
./pants list
./pants list
currently has a space complexityO(1)
. It works around the space complexity ofO(t + t*e + b)
as follows:b
is not an issue because there is never more than oneBUILD
file per target.t*e
is not an issue because we do not use anyenv
values.t
is not an issue because the engine does not send back stdout.See https://gist.github.com/Eric-Arellano/defca7d864a9f3939448964beea618d4#file-list_strutil_tarutil-txt and the files above and below it for what is persisted after running
./pants list
.The implication is that
./pants list
—and other V2 tasks—could suffer from the same space complexity issues, but in practice this rule works around it.The text was updated successfully, but these errors were encountered: