-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Engine executions for multiple SingleAddresses are redundant #4533
Comments
This was referenced Mar 7, 2018
Hm... it's also possible that by fixing #4358 (ie, removing |
stuhood
pushed a commit
that referenced
this issue
Mar 14, 2018
### Problem A few more performance issues were discovered in `./pants --changed=.. list` due to the removal of the v1 engine: * In order to calculate `direct` or `transitive` dependents, `ChangeCalculator` needs to compute the dependents graph. But it was doing this using `HydratedTargets` (ie, with sources globs and other fields expanded). * After the `ChangeCalculator` was used to compute the new set of `TargetRoots`, we were converting the matched `Address` objects back into `Spec`s, triggering #4533. * When locating candidate owning targets for some sources, `SourceMapper` was using `HydratedTargets`, which are (implicitly/unnecessarily) transitive. * When matching source files against candidate targets, `SourceMapper` was using un-batched regex-based spec matching calls. ### Solution * Add and switch to computing `HydratedStructs` in `ChangeCalculator`, which do not require expanding sources globs. * Rename `HydratedTargets` to `TransitiveHydratedTargets`, and add a non-transitive rule to compute `HydratedTargets`. Use this in `SourceMapper`. * Preserve `Address` objects in `ChangedTargetRoots`, which allows for a single deduped `TransitiveHydratedTarget` walk to warm and populate the graph. * Add and used batched forms of the `filespec` matching methods to avoid re-parsing/compiling regexes. These matches should be ported to rust at some point. ### Result Transitive runs that touch more of the graph take time closer to the `./pants list ::` time, as expected. Before: ``` $ time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l 562 real 0m15.945s user 0m15.180s sys 0m1.731s ``` After: ``` $ time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l 563 real 0m5.402s user 0m4.580s sys 0m1.319s ``` Larger runs (more files/targets) see an even more significant speedup: on the order of 5-10x.
wisechengyi
pushed a commit
that referenced
this issue
Mar 14, 2018
### Problem A few more performance issues were discovered in `./pants --changed=.. list` due to the removal of the v1 engine: * In order to calculate `direct` or `transitive` dependents, `ChangeCalculator` needs to compute the dependents graph. But it was doing this using `HydratedTargets` (ie, with sources globs and other fields expanded). * After the `ChangeCalculator` was used to compute the new set of `TargetRoots`, we were converting the matched `Address` objects back into `Spec`s, triggering #4533. * When locating candidate owning targets for some sources, `SourceMapper` was using `HydratedTargets`, which are (implicitly/unnecessarily) transitive. * When matching source files against candidate targets, `SourceMapper` was using un-batched regex-based spec matching calls. ### Solution * Add and switch to computing `HydratedStructs` in `ChangeCalculator`, which do not require expanding sources globs. * Rename `HydratedTargets` to `TransitiveHydratedTargets`, and add a non-transitive rule to compute `HydratedTargets`. Use this in `SourceMapper`. * Preserve `Address` objects in `ChangedTargetRoots`, which allows for a single deduped `TransitiveHydratedTarget` walk to warm and populate the graph. * Add and used batched forms of the `filespec` matching methods to avoid re-parsing/compiling regexes. These matches should be ported to rust at some point. ### Result Transitive runs that touch more of the graph take time closer to the `./pants list ::` time, as expected. Before: ``` $ time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l 562 real 0m15.945s user 0m15.180s sys 0m1.731s ``` After: ``` $ time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l 563 real 0m5.402s user 0m4.580s sys 0m1.319s ``` Larger runs (more files/targets) see an even more significant speedup: on the order of 5-10x.
Looking at this one today. |
This was referenced Mar 16, 2018
stuhood
pushed a commit
that referenced
this issue
Mar 19, 2018
### Problem While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533). Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515). ### Solution * Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515). * Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533). ### Result Cycles should be detected much earlier with: ``` Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s): Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget) Throw(No source of required dependency: Dep graph contained a cycle.) Traceback (no traceback): <pants native internals> Exception: No source of required dependency: Dep graph contained a cycle. ``` _(more polish needed here: re-opened #3695 to clean up `trace`)_ And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots: ``` # before: $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m15.297s user 0m14.603s sys 0m1.625s # after: $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m5.989s user 0m5.261s sys 0m1.310s ``` Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.
stuhood
pushed a commit
that referenced
this issue
Mar 20, 2018
While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533). Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515). * Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515). * Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533). Cycles should be detected much earlier with: ``` Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s): Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget) Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget) Throw(No source of required dependency: Dep graph contained a cycle.) Traceback (no traceback): <pants native internals> Exception: No source of required dependency: Dep graph contained a cycle. ``` _(more polish needed here: re-opened #3695 to clean up `trace`)_ And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots: ``` $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m15.297s user 0m14.603s sys 0m1.625s $ time ./pants --target-spec-file=targets.txt list | wc -l 1500 real 0m5.989s user 0m5.261s sys 0m1.310s ``` Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.
stuhood
pushed a commit
that referenced
this issue
Apr 4, 2018
### Problem At one point it was necessary for performance purposes (due to #4533) to differentiate between `ChangedTargetRoots` (containing `Address` objects) and `LiteralTargetRoots` containing parsed specs... but with #4533 fixed, the split is no longer necessary. ### Solution Lift the `Collection` helper to `pants.objects.Collection` to allow for reuse, relocate `Specs` next to types that it holds, and merge the two `TargetRoots` subclasses.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
In many cases we're aware of literal addresses that may or may not exist (ie, they came in from the command line or via
--target-spec-file
). Currently those are represented as independentSingleAddress
objects, each of which ends up being a root in the engine.While the work will not be completely redundant, requesting transitive
HydratedTargets
for each of many independent roots will result in many independent dependencies sets, which then need to be merged for output.To preserve semantics while improving performance, we should request transitive
HydratedTargets
for a container holding all of the roots, which will dedupe across the entire set.The text was updated successfully, but these errors were encountered: