Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update graph partitioning scheme #1443

Merged
merged 80 commits into from
Apr 6, 2021

Conversation

seunghwak
Copy link
Contributor

@seunghwak seunghwak commented Mar 8, 2021

Partially addresses Issue #1442

Update graph partitioning scheme to better control memory footprint vs concurrency trade-offs for large-scale graph processing in large clusters. This new partitioning scheme also simplifies communication patterns among GPUs which can potentially improve scalability.

@seunghwak seunghwak added 2 - In Progress DO NOT MERGE Hold off on merging; see PR for details improvement Improvement / enhancement to an existing function and removed Fix labels Mar 8, 2021
@BradReesWork BradReesWork added this to the 0.19 milestone Mar 10, 2021
@seunghwak seunghwak changed the title [WIP] Update graph partitioning scheme Update graph partitioning scheme Apr 4, 2021
@codecov-io
Copy link

codecov-io commented Apr 4, 2021

Codecov Report

Merging #1443 (ac15619) into branch-0.19 (1f0f14e) will increase coverage by 2.01%.
The diff coverage is 45.11%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #1443      +/-   ##
===============================================
+ Coverage        58.24%   60.26%   +2.01%     
===============================================
  Files               71       70       -1     
  Lines             3281     3153     -128     
===============================================
- Hits              1911     1900      -11     
+ Misses            1370     1253     -117     
Impacted Files Coverage Δ
python/cugraph/dask/common/input_utils.py 22.32% <12.50%> (-0.76%) ⬇️
python/cugraph/dask/centrality/katz_centrality.py 29.16% <25.00%> (-5.62%) ⬇️
python/cugraph/dask/community/louvain.py 29.03% <25.00%> (-4.31%) ⬇️
python/cugraph/dask/link_analysis/pagerank.py 21.87% <25.00%> (-3.94%) ⬇️
python/cugraph/dask/traversal/bfs.py 27.58% <25.00%> (-4.56%) ⬇️
python/cugraph/dask/traversal/sssp.py 27.58% <25.00%> (-4.56%) ⬇️
python/cugraph/structure/number_map.py 63.82% <51.42%> (+4.61%) ⬆️
...ython/cugraph/centrality/betweenness_centrality.py 95.00% <0.00%> (-5.00%) ⬇️
python/cugraph/_version.py 44.80% <0.00%> (+0.39%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 48bf058...ac15619. Read the comment docs.

@seunghwak seunghwak removed the DO NOT MERGE Hold off on merging; see PR for details label Apr 5, 2021
@seunghwak
Copy link
Contributor Author

rerun tests

row_comm.sync_stream(handle.get_stream()); // this is neessary as local_degrees will become
// out-of-scope once this function returns.
}
// FIXME: is this necessary?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments here are puzzling. The FIXME questions if this is necessary. The comments describe why this is necessary. Is the FIXME erroneous (as explained by the other comment), or is the FIXME challenging the assertion in the comment below about local_degrees going out-of-scope.

It is certainly true that local_degrees will go out of scope. Because the comms operations are asynchronous, it seems like the assertion in the comment is likely true.

@@ -334,8 +309,6 @@ class graph_view_t<vertex_t,
bool sorted_by_global_degree_within_vertex_partition,
bool do_expensive_check = false);

bool is_weighted() const { return adj_matrix_partition_weights_.size() > 0; }

// FIXME: this should be removed once MNMG Louvain is updated to use graph primitives
partition_t<vertex_t> get_partition() const { return partition_; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed now? Or is it needed elsewhere and the FIXME is out of date or no longer required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be removed, and I will remove this.

Copy link
Member

@afender afender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of code in there. It's great but a bit challenging to review. Moving forward it would be better to have a finer PR granularity and a high-level description of the concrete steps taken to improve a given feature. We also need to start measuring improvements when it comes to performance and scalability optimizations.

cpp/include/experimental/detail/graph_utils.cuh Outdated Show resolved Hide resolved
@seunghwak
Copy link
Contributor Author

Lots of code in there. It's great but a bit challenging to review. Moving forward it would be better to have a finer PR granularity and a high-level description of the concrete steps taken to improve a given feature. We also need to start measuring improvements when it comes to performance and scalability optimizations.

Yeah... sorry for creating a huge PR. This was somewhat unavoidable as this changes the fundamental underlying data structure and multiple places get affected by this change. And I should fix bugs in multiple places to make this PR work. Yeah... but next time, I will try to plan ahead to avoid creating a huge PR; I will encounter a similar challenge when I need to bring DCSR/DCSC, and I will think about bringing this in multiple steps.

@seunghwak
Copy link
Contributor Author

I think I addressed all the comments and let me know if you want me to make additional changes. Once this PR gets merged, PR #1447 will become reviewable.

@seunghwak
Copy link
Contributor Author

rerun tests

@BradReesWork
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 9a1ab09 into rapidsai:branch-0.19 Apr 6, 2021
@seunghwak seunghwak deleted the enh_partitioning branch June 24, 2021 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants