Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process projects in parallel in FAR. #45074

Merged
merged 4 commits into from
Jun 11, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
using System.Threading.Tasks;
using Microsoft.CodeAnalysis.FindSymbols.Finders;
using Microsoft.CodeAnalysis.Internal.Log;
using Microsoft.CodeAnalysis.PooledObjects;
using Microsoft.CodeAnalysis.Shared.Utilities;
using Roslyn.Utilities;

Expand All @@ -25,7 +26,6 @@ internal partial class FindReferencesSearchEngine
private readonly IStreamingProgressTracker _progressTracker;
private readonly IStreamingFindReferencesProgress _progress;
private readonly CancellationToken _cancellationToken;
private readonly ProjectDependencyGraph _dependencyGraph;
private readonly FindReferencesSearchOptions _options;

public FindReferencesSearchEngine(
Expand All @@ -41,7 +41,6 @@ public FindReferencesSearchEngine(
_finders = finders;
_progress = progress;
_cancellationToken = cancellationToken;
_dependencyGraph = solution.GetProjectDependencyGraph();
_options = options;

_progressTracker = progress.ProgressTracker;
Expand Down Expand Up @@ -78,33 +77,18 @@ private async Task ProcessAsync(ProjectToDocumentMap projectToDocumentMap)
return;
}

// Get the connected components of the dependency graph and process each individually.
// That way once a component is done we can throw away all the memory associated with
// it.
// For each connected component, we'll process the individual projects from bottom to
// top. i.e. we'll first process the projects with no dependencies. Then the projects
// that depend on those projects, and so on. This way we always have created the
// dependent compilations when they're needed by later projects. If we went the other
// way (i.e. processed the projects with lots of project dependencies first), then we'd
// have to create all their dependent compilations in order to get their compilation.
// This would be very expensive and would take a lot of time before we got our first
// result.
var connectedProjects = _dependencyGraph.GetDependencySets(_cancellationToken);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no longer an issue as we can still get results uickly by processing hte compilations in parallel, thus returning results quickly for the compilations we can get quickly.


// Add a progress item for each (document, symbol, finder) set that we will execute.
// We'll mark the item as completed in "ProcessDocumentAsync".
var totalFindCount = projectToDocumentMap.Sum(
kvp1 => kvp1.Value.Sum(kvp2 => kvp2.Value.Count));
await _progressTracker.AddItemsAsync(totalFindCount).ConfigureAwait(false);

// Now, go through each connected project set and process it independently.
foreach (var connectedProjectSet in connectedProjects)
{
_cancellationToken.ThrowIfCancellationRequested();
using var _ = ArrayBuilder<Task>.GetInstance(out var tasks);

await ProcessProjectsAsync(
connectedProjectSet, projectToDocumentMap).ConfigureAwait(false);
}
foreach (var (project, documentMap) in projectToDocumentMap)
tasks.Add(Task.Run(() => ProcessProjectAsync(project, documentMap), _cancellationToken));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Task.Run here just to pass cancellationToken?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to ensure that the work is truly parallel and the main lol can get to the next item to add to the work queue. If there work doesn't actually ever yield, and if entirely synchronous, then not doing this could have no benefit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to ensure that the work is truly parallel and the main lol can get to the next item to add to the work queue.

I looked back here to read your answer again. But I think that's not what I was asking.

I was wondering why not: tasks.Add(ProcesssProjectAsync(...))? Is cancellationToken the only reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why not: tasks.Add(ProcesssProjectAsync(...))? Is cancellationToken the only reason?
@CyrusNajmabadi

No. It's so the work is actually parallel. Adding lots of async work doesn't make anything acutally parallel. Consider something as simple as this:

public Task ExpensiveAsync()
{
    Thread.Sleep(10000);
    return Task.CompletedTask;
}

if we go in a loop 10 times and attempt to add ExpensiveAsync to an array of tasks, it will take 100 seconds. By doing Task.Run, we explicitly move this work to a threadpool thread, and allow the work to operate entirely concurrently, allowing it all to finish in 10 seconds.

This is obviously an extreme case (total thread blocking, no actual asynchrony), but the general idea still holds. Putting a lot of async work into a queue doesn't make it necessary concurrent. Doing the Task.Run does ensure this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obviously an extreme case (total thread blocking, no actual asynchrony), but the general idea still holds.

Got it. Just so that I understand, it does depend on the ProcesssProjectAsync implementation to be possibly CPU-bound/Blocking, and even for IO-bound scenarios, Task.Run will ensure any "additional work" that is being done as part of it to run in parallel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. tasks.Add(ProcesssProjectAsync(...)) is possibly concurrent. tasks.Add(Task.Run(() is definitely concurrent.


await Task.WhenAll(tasks).ConfigureAwait(false);
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,69 +12,9 @@
namespace Microsoft.CodeAnalysis.FindSymbols
{
using DocumentMap = MultiDictionary<Document, (ISymbol symbol, IReferenceFinder finder)>;
using ProjectToDocumentMap = Dictionary<Project, MultiDictionary<Document, (ISymbol symbol, IReferenceFinder finder)>>;

internal partial class FindReferencesSearchEngine
{
private async Task ProcessProjectsAsync(
IEnumerable<ProjectId> connectedProjectSet,
ProjectToDocumentMap projectToDocumentMap)
{
var visitedProjects = new HashSet<ProjectId>();

// Make sure we process each project in the set. Process each project in depth first
// order. That way when we process a project, the compilations for all projects that it
// depends on will have been created already.
foreach (var projectId in connectedProjectSet)
{
_cancellationToken.ThrowIfCancellationRequested();

await ProcessProjectAsync(
projectId, projectToDocumentMap, visitedProjects).ConfigureAwait(false);
}
}

private async Task ProcessProjectAsync(
ProjectId projectId,
ProjectToDocumentMap projectToDocumentMap,
HashSet<ProjectId> visitedProjects)
{
// Don't visit projects more than once.
if (visitedProjects.Add(projectId))
{
var project = _solution.GetProject(projectId);

// Visit dependencies first. That way the compilation for a project that we depend
// on is already ready for us when we need it.
foreach (var dependent in project.ProjectReferences)
{
_cancellationToken.ThrowIfCancellationRequested();

await ProcessProjectAsync(
dependent.ProjectId, projectToDocumentMap, visitedProjects).ConfigureAwait(false);
}

await ProcessProjectAsync(project, projectToDocumentMap).ConfigureAwait(false);
}
}

private async Task ProcessProjectAsync(
Project project,
ProjectToDocumentMap projectToDocumentMap)
{
if (!projectToDocumentMap.TryGetValue(project, out var documentMap))
{
// No files in this project to process. We can bail here. We'll have cached our
// compilation if there are any projects left to process that depend on us.
return;
}

projectToDocumentMap.Remove(project);

// Now actually process the project.
await ProcessProjectAsync(project, documentMap).ConfigureAwait(false);
}

private async Task ProcessProjectAsync(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only bit of code in this file, could probably just be moved to FindReferencesSearchEngine.cs now that it's much simpler

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. can clean up in my continuing followup PRs :)

Project project,
DocumentMap documentMap)
Expand Down