Update graph partitioning scheme #1443

seunghwak · 2021-03-08T16:44:15Z

Partially addresses Issue #1442

Update the graph partitioning scheme to better control the trade-off between memory footprint and concurrency when processing large-scale graphs on large clusters. The new partitioning scheme also simplifies communication patterns among GPUs, which can potentially improve scalability.
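
One consequence recorded in this PR's commit log is that a GPU can own more than one partition, which is why `num_partition_edges` was renamed to `num_local_edges`. A minimal sketch of that distinction follows. The type `local_graph_sketch_t` and its field are hypothetical and are not cuGraph's actual data structures; the sketch only assumes that each GPU keeps one edge count per partition it owns.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one GPU's view of a partitioned graph.
// Not cuGraph's graph_view_t; the names here are illustrative only.
struct local_graph_sketch_t {
  // Assumed layout: one edge count per partition owned by this GPU.
  std::vector<std::size_t> partition_edge_counts;

  // Number of partitions stored on this GPU (may be greater than one).
  std::size_t num_local_partitions() const { return partition_edge_counts.size(); }

  // Edges local to this GPU: the sum over all partitions it owns,
  // which differs from a single partition's edge count.
  std::size_t num_local_edges() const {
    return std::accumulate(
      partition_edge_counts.begin(), partition_edge_counts.end(), std::size_t{0});
  }
};
```

With a single partition per GPU the two quantities coincide, which may be why the older name was adequate before this change.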

codecov-io · 2021-04-04T07:29:12Z

Codecov Report

Merging #1443 (ac15619) into branch-0.19 (1f0f14e) will increase coverage by 2.01%.
The diff coverage is 45.11%.

@@               Coverage Diff               @@
##           branch-0.19    #1443      +/-   ##
===============================================
+ Coverage        58.24%   60.26%   +2.01%     
===============================================
  Files               71       70       -1     
  Lines             3281     3153     -128     
===============================================
- Hits              1911     1900      -11     
+ Misses            1370     1253     -117

Impacted Files	Coverage Δ
python/cugraph/dask/common/input_utils.py	`22.32% <12.50%> (-0.76%)`	⬇️
python/cugraph/dask/centrality/katz_centrality.py	`29.16% <25.00%> (-5.62%)`	⬇️
python/cugraph/dask/community/louvain.py	`29.03% <25.00%> (-4.31%)`	⬇️
python/cugraph/dask/link_analysis/pagerank.py	`21.87% <25.00%> (-3.94%)`	⬇️
python/cugraph/dask/traversal/bfs.py	`27.58% <25.00%> (-4.56%)`	⬇️
python/cugraph/dask/traversal/sssp.py	`27.58% <25.00%> (-4.56%)`	⬇️
python/cugraph/structure/number_map.py	`63.82% <51.42%> (+4.61%)`	⬆️
...ython/cugraph/centrality/betweenness_centrality.py	`95.00% <0.00%> (-5.00%)`	⬇️
python/cugraph/_version.py	`44.80% <0.00%> (+0.39%)`	⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 48bf058...ac15619.

seunghwak · 2021-04-05T13:23:49Z

rerun tests

ChuckHastings · 2021-04-05T17:19:46Z

cpp/include/experimental/detail/graph_utils.cuh

-      row_comm.sync_stream(handle.get_stream());  // this is neessary as local_degrees will become
-                                                  // out-of-scope once this function returns.
-  }
+  // FIXME: is this necessary?


The comments here are puzzling. The FIXME questions whether this is necessary, while the other comment describes why it is necessary. Is the FIXME erroneous (as explained by the other comment), or is the FIXME challenging that comment's assertion about local_degrees going out of scope?

It is certainly true that local_degrees will go out of scope. Because the comms operations are asynchronous, it seems like the assertion in the comment is likely true.
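
The concern in this thread can be illustrated with a small model. The sketch below is plain C++ rather than CUDA or the comms API used in `graph_utils.cuh`: `stream_sketch_t` is a hypothetical single-threaded stand-in for a stream, where work is queued first and executed later. It only shows the general hazard of queued work referring to a local buffer; whether the real code needs the synchronization depends on how the buffer is allocated and freed, and a later commit in this PR is titled "remove unnecessary synchronization".

```cpp
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical single-threaded model of a stream: operations are queued now
// and only run when sync() is called.
struct stream_sketch_t {
  std::queue<std::function<void()>> pending;

  void enqueue(std::function<void()> op) { pending.push(std::move(op)); }

  // Runs every queued operation in order, then returns.
  void sync() {
    while (!pending.empty()) {
      pending.front()();
      pending.pop();
    }
  }
};

// Queues a reduction over a local buffer and synchronizes before returning.
inline void sum_local_degrees_sketch(stream_sketch_t& stream, long& result) {
  std::vector<int> local_degrees{3, 1, 4, 1, 5};  // destroyed when this function returns

  // The queued operation holds only a reference to local_degrees.
  stream.enqueue([&local_degrees, &result] {
    result = std::accumulate(local_degrees.begin(), local_degrees.end(), 0L);
  });

  // Without this call, the queued operation could run after local_degrees
  // has been destroyed and would read a dangling reference.
  stream.sync();
}
```

In this model, dropping the `sync()` call and draining the queue from the caller would be undefined behavior, which mirrors the assertion in the removed comment.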

ChuckHastings · 2021-04-05T17:25:22Z

cpp/include/experimental/graph_view.hpp

@@ -334,8 +309,6 @@ class graph_view_t<vertex_t,
               bool sorted_by_global_degree_within_vertex_partition,
               bool do_expensive_check = false);

-  bool is_weighted() const { return adj_matrix_partition_weights_.size() > 0; }
-
  // FIXME: this should be removed once MNMG Louvain is updated to use graph primitives
  partition_t<vertex_t> get_partition() const { return partition_; }


Should this be removed now? Or is it needed elsewhere and the FIXME is out of date or no longer required?

seunghwak · 2021-04-05

Yes, this should be removed, and I will remove it.

afender

Lots of code in there. It's great but a bit challenging to review. Moving forward it would be better to have a finer PR granularity and a high-level description of the concrete steps taken to improve a given feature. We also need to start measuring improvements when it comes to performance and scalability optimizations.

cpp/include/experimental/detail/graph_utils.cuh

seunghwak · 2021-04-05T21:06:37Z

> Lots of code in there. It's great but a bit challenging to review. Moving forward it would be better to have a finer PR granularity and a high-level description of the concrete steps taken to improve a given feature. We also need to start measuring improvements when it comes to performance and scalability optimizations.

Yeah... sorry for creating a huge PR. This was somewhat unavoidable, as it changes the fundamental underlying data structure and affects multiple places. I also had to fix bugs in multiple places to make this PR work. Next time, I will try to plan ahead to avoid creating a huge PR; I will encounter a similar challenge when I need to bring in DCSR/DCSC, and I will think about doing that in multiple steps.

seunghwak · 2021-04-05T21:16:19Z

I think I have addressed all the comments; let me know if you want me to make additional changes. Once this PR gets merged, PR #1447 will become reviewable.

seunghwak · 2021-04-06T00:14:49Z

rerun tests

…rtitioning

BradReesWork · 2021-04-06T14:00:31Z

@gpucibot merge

seunghwak added 15 commits February 23, 2021 15:29

switch the graph partitioning scheme in graph(_view)_t

026387e

switch the graph partitioning scheme in pattern accelerator headers and…

41a238b

… graph_utils.cuh

function renaming

187a5f9

add additional utility functions to graph_view_t

cfe54ce

cosmetic updates

efc58c6

switch the graph partitioning scheme in graph functions

2e5e45c

Merge branch 'branch-0.19' of github.com:rapidsai/cugraph into enh_pa…

dbc83eb

…rtitioning

compile error fixes

8c30949

Merge branch 'branch-0.19' of github.com:rapidsai/cugraph into enh_pa…

110a287

…rtitioning

Merge branch 'branch-0.19' of github.com:rapidsai/cugraph into enh_pa…

1062b73

…rtitioning

Merge branch 'branch-0.19' of github.com:rapidsai/cugraph into enh_pa…

b11fe16

…rtitioning

Merge branch 'branch-0.19' of github.com:rapidsai/cugraph into enh_pa…

764900a

…rtitioning

fix compile errors

3ddff4c

refactor groupby and count based on key_to_id_op

a71b045

fix compile error due to recent API changes

d5d9a17

seunghwak added the 2 - In Progress, DO NOT MERGE (Hold off on merging; see PR for details), and improvement (Improvement / enhancement to an existing function) labels and removed the Fix label Mar 8, 2021

BradReesWork added this to the 0.19 milestone Mar 10, 2021

seunghwak added 2 commits March 10, 2021 22:08

Merge branch 'branch-0.19' of github.com:rapidsai/cugraph into enh_pa…

42fafb4

…rtitioning

fix variable naming inconsistencies

e0efcd5

seunghwak mentioned this pull request Mar 11, 2021

Improve graph primitives performance on graphs with widely varying vertex degrees #1447

Merged

seunghwak added 4 commits March 15, 2021 12:58

resolve merge conflicts

0656ef7

minor cosmetic updates

5098224

update python binding (C++ part) to accommodate new partitioning scheme

7b36f8a

bug fixes

e5c17f3

seunghwak added 8 commits April 1, 2021 21:25

python binding bug fix

a7b6c8e

rename num_partition_edges to num_local_edges as with the new partiti…

496ff03

…oning scheme, there can be more than one partition per GPU

python binding bug fix in handling weights

156aa3d

Merge branch 'branch-0.19' of github.com:rapidsai/cugraph into enh_pa…

7a4153e

…rtitioning

bug fix

4d77b84

bug fix for a corner case in expensive check for renumber_edgelist

fdb309a

workaround for cuco static_map kernel launch with 0 grid size

d40320c

Merge branch 'branch-0.19' of github.com:rapidsai/cugraph into enh_pa…

ac15619

…rtitioning

seunghwak changed the title ~~[WIP] Update graph partitioning scheme~~ Update graph partitioning scheme Apr 4, 2021

seunghwak added 3 - Ready for Review and removed 2 - In Progress labels Apr 4, 2021

seunghwak removed the DO NOT MERGE (Hold off on merging; see PR for details) label Apr 5, 2021

BradReesWork requested review from aschaffer, afender and ChuckHastings April 5, 2021 17:02

ChuckHastings approved these changes Apr 5, 2021

afender approved these changes Apr 5, 2021

cpp/include/experimental/detail/graph_utils.cuh (outdated, resolved)

aschaffer approved these changes Apr 5, 2021

seunghwak added 2 commits April 5, 2021 17:13

remove unnecessary code

6f2a8d6

remove unnecessary synchronization

462331e

Merge branch 'branch-0.19' of github.com:rapidsai/cugraph into enh_pa…

3f311af

…rtitioning

rapids-bot bot merged commit 9a1ab09 into rapidsai:branch-0.19 Apr 6, 2021

seunghwak deleted the enh_partitioning branch June 24, 2021 19:03
