Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine executions for multiple SingleAddresses are redundant #4533

Closed
stuhood opened this issue Apr 29, 2017 · 2 comments · Fixed by #5605
Closed

Engine executions for multiple SingleAddresses are redundant #4533

stuhood opened this issue Apr 29, 2017 · 2 comments · Fixed by #5605
Assignees

Comments

@stuhood
Copy link
Member

stuhood commented Apr 29, 2017

In many cases we're aware of literal addresses that may or may not exist (ie, they came in from the command line or via --target-spec-file). Currently those are represented as independent SingleAddress objects, each of which ends up being a root in the engine.

While the work will not be completely redundant, requesting transitive HydratedTargets for each of many independent roots will result in many independent dependencies sets, which then need to be merged for output.

To preserve semantics while improving performance, we should request transitive HydratedTargets for a container holding all of the roots, which will dedupe across the entire set.

@stuhood
Copy link
Member Author

stuhood commented Mar 14, 2018

Hm... it's also possible that by fixing #4358 (ie, removing SelectTransitive) we would allow for more memoization here, which would mean that the API did not need to change.

stuhood pushed a commit that referenced this issue Mar 14, 2018
### Problem

A few more performance issues were discovered in `./pants --changed=.. list` due to the removal of the v1 engine:

* In order to calculate `direct` or `transitive` dependents, `ChangeCalculator` needs to compute the dependents graph. But it was doing this using `HydratedTargets` (ie, with sources globs and other fields expanded).
* After the `ChangeCalculator` was used to compute the new set of `TargetRoots`, we were converting the matched `Address` objects back into `Spec`s, triggering #4533.
* When locating candidate owning targets for some sources, `SourceMapper` was using `HydratedTargets`, which are (implicitly/unnecessarily) transitive.
* When matching source files against candidate targets, `SourceMapper` was using un-batched regex-based spec matching calls.

### Solution

* Add and switch to computing `HydratedStructs` in `ChangeCalculator`, which do not require expanding sources globs.
* Rename `HydratedTargets` to `TransitiveHydratedTargets`, and add a non-transitive rule to compute `HydratedTargets`. Use this in `SourceMapper`.
* Preserve `Address` objects in `ChangedTargetRoots`, which allows for a single deduped `TransitiveHydratedTarget` walk to warm and populate the graph.
* Add and used batched forms of the `filespec` matching methods to avoid re-parsing/compiling regexes. These matches should be ported to rust at some point. 

### Result

Transitive runs that touch more of the graph take time closer to the `./pants list ::` time, as expected.
 
Before:
```
$ time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l

     562

real	0m15.945s
user	0m15.180s
sys	0m1.731s
```
After:
```
$ time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l

     563

real	0m5.402s
user	0m4.580s
sys	0m1.319s
```

Larger runs (more files/targets) see an even more significant speedup: on the order of 5-10x.
wisechengyi pushed a commit that referenced this issue Mar 14, 2018
### Problem

A few more performance issues were discovered in `./pants --changed=.. list` due to the removal of the v1 engine:

* In order to calculate `direct` or `transitive` dependents, `ChangeCalculator` needs to compute the dependents graph. But it was doing this using `HydratedTargets` (ie, with sources globs and other fields expanded).
* After the `ChangeCalculator` was used to compute the new set of `TargetRoots`, we were converting the matched `Address` objects back into `Spec`s, triggering #4533.
* When locating candidate owning targets for some sources, `SourceMapper` was using `HydratedTargets`, which are (implicitly/unnecessarily) transitive.
* When matching source files against candidate targets, `SourceMapper` was using un-batched regex-based spec matching calls.

### Solution

* Add and switch to computing `HydratedStructs` in `ChangeCalculator`, which do not require expanding sources globs.
* Rename `HydratedTargets` to `TransitiveHydratedTargets`, and add a non-transitive rule to compute `HydratedTargets`. Use this in `SourceMapper`.
* Preserve `Address` objects in `ChangedTargetRoots`, which allows for a single deduped `TransitiveHydratedTarget` walk to warm and populate the graph.
* Add and used batched forms of the `filespec` matching methods to avoid re-parsing/compiling regexes. These matches should be ported to rust at some point. 

### Result

Transitive runs that touch more of the graph take time closer to the `./pants list ::` time, as expected.
 
Before:
```
$ time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l

     562

real	0m15.945s
user	0m15.180s
sys	0m1.731s
```
After:
```
$ time ./pants --changed-diffspec=22ca0604b1c6ce8de019214b821e922aac66b026^..22ca060 --changed-include-dependees=transitive list | wc -l

     563

real	0m5.402s
user	0m4.580s
sys	0m1.319s
```

Larger runs (more files/targets) see an even more significant speedup: on the order of 5-10x.
@stuhood stuhood self-assigned this Mar 15, 2018
@stuhood
Copy link
Member Author

stuhood commented Mar 15, 2018

Looking at this one today.

stuhood pushed a commit that referenced this issue Mar 19, 2018
### Problem

While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533). 

Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515). 

### Solution

* Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515).
* Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533).

### Result

Cycles should be detected much earlier with:
```
Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s):
Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets)
  Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets)
    Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget)
      Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget)
        Throw(No source of required dependency: Dep graph contained a cycle.)
          Traceback (no traceback):
            <pants native internals>
          Exception: No source of required dependency: Dep graph contained a cycle.
```
_(more polish needed here: re-opened #3695 to clean up `trace`)_

And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots:
```
# before:
$ time ./pants --target-spec-file=targets.txt list | wc -l
    1500

real	0m15.297s
user	0m14.603s
sys	0m1.625s

# after:
$ time ./pants --target-spec-file=targets.txt list | wc -l
    1500

real	0m5.989s
user	0m5.261s
sys	0m1.310s
```

Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.
stuhood pushed a commit that referenced this issue Mar 20, 2018
While chasing performance in one area for #4349 we added `SelectTransitive`. This decreased performance in another significant area: executions involving multiple transitive roots can't structure-share at all when expanding dependencies, leading to many redundant walks (#4533).

Additionally, usability regressed in a few ways: the product `Graph` could not implement cycle detection for dependencies expanded via `SelectTransitive` (#4358), and in the case of a missing or broken dependency, the referring context was lost (#4515).

* Remove usage of `SelectTransitive` for `TransitiveHydratedTargets` to reintroduce structure sharing and improve usability (fixes #4358 and fixes #4515).
* Replace `@rule`s that operate on single-`Spec`s with batch forms that operate on a `Specs` collection (fixes #4533).

Cycles should be detected much earlier with:
```
Exception message: Build graph construction failed: ExecutionError Received unexpected Throw state(s):
Computing Select(Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets)
  Computing Task(transitive_hydrated_targets, Collection(dependencies=(SingleAddress(directory=u'testprojects/src/java/org/pantsbuild/testproject/cycle1', name=u'cycle1'),)), =TransitiveHydratedTargets)
    Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle1:cycle1, =TransitiveHydratedTarget)
      Computing Task(transitive_hydrated_target, testprojects/src/java/org/pantsbuild/testproject/cycle2:cycle2, =TransitiveHydratedTarget)
        Throw(No source of required dependency: Dep graph contained a cycle.)
          Traceback (no traceback):
            <pants native internals>
          Exception: No source of required dependency: Dep graph contained a cycle.
```
_(more polish needed here: re-opened #3695 to clean up `trace`)_

And time taken is proportional to the total number of matched targets, rather than to the sum of the closure sizes of the roots:
```
$ time ./pants --target-spec-file=targets.txt list | wc -l
    1500

real	0m15.297s
user	0m14.603s
sys	0m1.625s

$ time ./pants --target-spec-file=targets.txt list | wc -l
    1500

real	0m5.989s
user	0m5.261s
sys	0m1.310s
```

Runtimes for things like `./pants list ::` are unaffected, although memory usage increases by about 5%, likely due to the branchier resulting `Graph`.
stuhood pushed a commit that referenced this issue Apr 4, 2018
### Problem

At one point it was necessary for performance purposes (due to #4533) to differentiate between `ChangedTargetRoots` (containing `Address` objects) and `LiteralTargetRoots` containing parsed specs... but with #4533 fixed, the split is no longer necessary.

### Solution

Lift the `Collection` helper to `pants.objects.Collection` to allow for reuse, relocate `Specs` next to types that it holds, and merge the two `TargetRoots` subclasses.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant