Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce allocations in NuGet.DependencyResolver.Tracker #5267

Merged
merged 2 commits into from
Jul 11, 2023

Conversation

drewnoakes
Copy link
Contributor

@drewnoakes drewnoakes commented Jun 21, 2023

Bug

Fixes: https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1826391
Fixes: NuGet/Home#12719

Regression? Last working version:

Description

NuGet.DependencyResolver.Tracker keeps track of graph items as they are observed. The internal implementation has been optimised to pack data in a more compact fashion, as this type was identified as contributing to GC pauses in VS by GCPauseWatson.

Unit tests were added to document the previous behaviour, then the class was rewritten to use less memory.

Some of the optimisations:

  • Don't create an Entry for every look up operation. Only create it when some state actually needs to be written (adding a graph item, or marking ambiguous).
  • Don't allocate the HashSet for every entry in the tracker. Only allocate one once there is more than one item for a given library.
  • Don't call Contains on the HashSet before calling Add. If the item already exists, Add is a no-op.

PR Checklist

  • PR has a meaningful title

  • PR has a linked issue.

  • Described changes

  • Tests

    • Automated tests added
    • OR
    • Test exception
    • OR
    • N/A
  • Documentation

    • Documentation PR or issue filled
    • OR
    • N/A

This object keeps track of graph items as they are observed. The internal implementation has been optimised to pack data in a more compact fashion, as this type was identified as contributing to GC pauses in VS by GCPauseWatson.

Unit tests were added to document the previous behaviour, then the class was rewritten to use less memory.

Some of the optimisations:
- Don't create an entry for every look up operation. Only create it when some state actually needs to be written (adding a graph item, or marking ambiguous).
- Don't allocate the HashSet for every entry in the tracker. Only allocate one once there is more than one item for a given library.
- Don't call Contains on the HashSet before calling Add. If the item already exists, Add is a no-op.
@drewnoakes drewnoakes requested a review from a team as a code owner June 21, 2023 13:39
@nkolev92 nkolev92 added Merge next release PRs that should not be merged until the dev branch targets the next release and removed Merge next release PRs that should not be merged until the dev branch targets the next release labels Jun 22, 2023
@ghost ghost added the Status:No recent activity PRs that have not had any recent activity and will be closed if the label is not removed label Jun 29, 2023
@ghost
Copy link

ghost commented Jun 29, 2023

This PR has been automatically marked as stale because it has no activity for 7 days. It will be closed if no further activity occurs within another 7 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.

@drewnoakes
Copy link
Contributor Author

Pinging @NuGet/nuget-client to make sure this PR doesn't get buried.

@ghost ghost removed the Status:No recent activity PRs that have not had any recent activity and will be closed if the label is not removed label Jun 29, 2023
jeffkl
jeffkl previously approved these changes Jun 30, 2023
Copy link
Member

@nkolev92 nkolev92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Few small comments.

/// memory. For large graphs, this can amount to a significant saving in memory,
/// which helps overall performance.
/// </summary>
private object? _storage;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given casting has a perf cost, plus object obscures types, is it worthwhile having two fields instead, private TItem? _single and private HashSet<TItem>? _collection instead? It gives more compile time static type analysis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here was to reduce memory footprint, and a second field would increase size again. Type checking is not free, but it's still very fast. The scenario where this was flagged as contributing to GC, the CPU was spending a huge amount of time on GC. My sense is that reducing GC pressure would offset any overhead from type checking. To write a benchmark you'd need realistic data loads, which would take time to gather. I don't think it'd be worth the effort to investigate given how much low hanging fruit exists throughout NuGet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Drew's assessment here.

@nkolev92 nkolev92 requested review from zivkan and jeffkl July 7, 2023 17:35
@nkolev92 nkolev92 self-assigned this Jul 7, 2023
@nkolev92 nkolev92 merged commit f2500c1 into dev Jul 11, 2023
@nkolev92 nkolev92 deleted the dev-drnoakes-1826391-tracker-allocations branch July 11, 2023 17:40
@aortiz-msft aortiz-msft added the Community PRs created by someone not in the NuGet team label Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community PRs created by someone not in the NuGet team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WalkTreeRejectNodesOfRejectedNodes constantly triggering resizes of its tracker collection
7 participants