Bipartite R-mat graph generation. #3512

seunghwak · 2023-04-25T18:42:27Z

Addresses #2075
Closes #3532

This function will generate (source, destination) vertex ID pairs. Source vertex IDs will have values in [0, 2^src_scale) and destination vertex IDs will have values in [0, 2^dst_scale).

Additionally,

The scramble_vertex_ids function had unused input parameters and was internally setting scale incorrectly. Fixed this bug.
Rmat_Usecase was ignoring the scramble_vertex_ids flag. Fixed this bug.
Added a scramble_vertex_ids variant that takes just a single vertex list (instead of a src, dst pair).
Updated scramble_vertex_ids to take input vectors as R-values and return the scrambled vectors (instead of taking in/out parameters).
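
The last item is mainly a change of calling convention. A minimal host-side sketch of that shape, assuming std::vector in place of device vectors: `scramble_ids` and `reverse_low_bits` are hypothetical names, and the bit reversal is only a placeholder permutation on [0, 2^scale), not the hash cuGraph applies.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using vertex_t = std::int64_t;

// Placeholder bijection on [0, 2^scale): reverse the low `scale` bits.
// This is NOT the hash cuGraph uses; it only stands in for "some permutation".
vertex_t reverse_low_bits(vertex_t v, std::size_t scale)
{
  vertex_t out{0};
  for (std::size_t i = 0; i < scale; ++i) {
    out = (out << 1) | ((v >> i) & vertex_t{1});
  }
  return out;
}

// Hypothetical host-side analogue of the new interface: the caller moves the
// vector in, and the scrambled vector is returned (no in/out parameter).
std::vector<vertex_t> scramble_ids(std::vector<vertex_t>&& vertices, std::size_t scale)
{
  for (auto& v : vertices) { v = reverse_low_bits(v, scale); }
  return std::move(vertices);
}
```

A caller would write `ids = scramble_ids(std::move(ids), scale);`, which makes the transfer of ownership explicit at the call site.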

seunghwak · 2023-04-25T18:44:02Z

@BradReesWork Will this API be sufficient to address the issue #2075? This will take separate src_scale and dst_scale instead of just taking a single scale.

One may need to add 2^src_scale to generate destination vertex IDs if one wants vertex set U and vertex set V (in the bi-partite graph with two sets U & V) to have non-overlapping vertex IDs.

One may need to call this function twice after swapping src_scale and dst_scale to have edges from set U to set V and also the edges from set V to set U.

BradReesWork · 2023-04-25T22:34:27Z

@seunghwak the num_edges will be the number of edges between the sets? Also, how are you enforcing that edges are only between sets and not within a set?

seunghwak · 2023-04-25T22:42:08Z

the num_edges will be the number of edges between the sets?

Yes.

Also, how are you enforcing that edges are only between sets and not within a set?

Here, the expectation is that source vertices belong to one set and destination vertices belong to the other set.

Say there are two sets U and V.

Say U has 2^u_scale vertices and V has 2^v_scale vertices.

To generate edges from U to V, you call this function with src_scale = u_scale and dst_scale = v_scale.

To generate edges from V to U, you call this function with src_scale = v_scale and dst_scale = u_scale.

And if the caller wants vertex IDs for U to be in [0, 2^u_scale) and V to be in [2^u_scale, 2^u_scale + 2^v_scale), they need to add 2^u_scale to the vertex IDs that belong to V (destinations of U-to-V edges, sources of V-to-U edges) after calling this function.
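
A host-side sketch of that post-processing, assuming the two generator calls have already produced set-local IDs in plain std::vector containers. `shift_ids` and `combine_bipartite_edges` are hypothetical helpers for illustration, not cuGraph functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using vertex_t    = std::int64_t;
using edge_list_t = std::pair<std::vector<vertex_t>, std::vector<vertex_t>>;

// Hypothetical helper (not part of cuGraph): shift every ID in `ids` by `offset`.
void shift_ids(std::vector<vertex_t>& ids, vertex_t offset)
{
  for (auto& v : ids) { v += offset; }
}

// Combine U->V edges and V->U edges into one edge list in which
// U occupies [0, 2^u_scale) and V occupies [2^u_scale, 2^u_scale + 2^v_scale).
// Both inputs hold IDs local to their own set, as the generator returns them.
edge_list_t combine_bipartite_edges(edge_list_t u_to_v, edge_list_t v_to_u, std::size_t u_scale)
{
  vertex_t const v_offset = vertex_t{1} << u_scale;
  shift_ids(u_to_v.second, v_offset);  // destinations of U->V edges are in V
  shift_ids(v_to_u.first, v_offset);   // sources of V->U edges are in V

  u_to_v.first.insert(u_to_v.first.end(), v_to_u.first.begin(), v_to_u.first.end());
  u_to_v.second.insert(u_to_v.second.end(), v_to_u.second.begin(), v_to_u.second.end());
  return u_to_v;
}
```

Only the V side is shifted in each list, so an edge never ends up with both endpoints in the same ID range.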

naimnv · 2023-04-26T13:03:42Z

And if the caller wants vertex IDs for U to be in [0, 2^u_scale) and V to be in [2^u_scale, 2^u_scale + 2^v_scale), they need to add 2^u_scale to the vertex IDs that belong to V (destinations of U-to-V edges, sources of V-to-U edges) after calling this function.

Does the caller have any option other than adding |U| to the destination vertex IDs (or adding |V| to the source vertex IDs) if the output edge lists are used to create a graph using cugraph::create_graph_from_edgelist?

ajschmidt8 · 2023-04-26T14:32:08Z

@seunghwak, just an FYI. adding [skip-ci] to the PR title doesn't do anything for GitHub Actions.

please review the page below to figure out how to skip CI for GitHub Actions.

https://docs.rapids.ai/resources/github-actions/#skipping-ci-for-commits

seunghwak · 2023-04-26T16:26:26Z

Does the caller have any option other than adding |U| to the destination vertex IDs (or adding |V| to the source vertex IDs) if the output edge lists are used to create a graph using cugraph::create_graph_from_edgelist?

What do you mean? To explicitly create a bipartite graph? We currently do not store bipartite graphs in a different format, but in the future, if it turns out to be hugely beneficial, we may consider taking (vertex ID within a set, set number) pairs instead of simple vertex IDs for k-partite graphs. We are not considering this at the moment.

seunghwak · 2023-04-26T17:07:03Z

@ChuckHastings Can we delete the deprecated R-mat generators (which take a seed), or do we still need to keep them?

ChuckHastings · 2023-04-26T18:03:39Z

@ChuckHastings Can we delete the deprecated R-mat generators (which take a seed), or do we still need to keep them?

I believe we need to keep them until the Python code is modified to use the new C API approach. This work is tracked in #3480.

naimnv · 2023-04-26T19:13:27Z

cpp/src/generators/generate_rmat_edgelist.cu

+    thrust::transform(
+      handle.get_thrust_policy(),
+      thrust::make_counting_iterator(size_t{0}),
+      thrust::make_counting_iterator(num_edges_to_generate),
+      pair_first,
+      // if a + b == 0.0, a_norm is irrelevant, if (1.0 - (a+b)) == 0.0, c_norm is irrelevant
+      [src_scale,
+       dst_scale,
+       rands    = rands.data(),
+       a_plus_b = a + b,
+       a_plus_c = a + c,
+       a_norm   = (a + b) > 0.0 ? a / (a + b) : 0.0,
+       c_norm   = (1.0 - (a + b)) > 0.0 ? c / (1.0 - (a + b)) : 0.0] __device__(auto i) {
+        vertex_t src{0};
+        vertex_t dst{0};
+        size_t rand_offset = i * (src_scale + dst_scale);
+        for (int level = 0; level < static_cast<int>(std::max(src_scale, dst_scale)); ++level) {
+          auto dst_threshold = a_plus_c;
+          if (level < src_scale) {
+            auto r = rands[rand_offset++];
+            auto src_bit_set = r > a_plus_b;
+            src += src_bit_set ? static_cast<vertex_t>(vertex_t{1} << (src_scale - (level + 1))) : 0;
+            dst_threshold = src_bit_set ? c_norm : a_norm;
+          }
+          if (level < dst_scale) {
+            auto r = rands[rand_offset++];
+            auto dst_bit_set = r > dst_threshold;
+            dst += dst_bit_set ? static_cast<vertex_t>(vertex_t{1} << (dst_scale - (level + 1))) : 0;
+          }
+        }
+        return thrust::make_tuple(src, dst);
+      });
+    num_edges_generated += num_edges_to_generate;
+  }
+
+  return std::make_tuple(std::move(srcs), std::move(dsts));
+}


Just wondering, is this the algorithm used here?
https://www.cambridge.org/core/journals/network-science/article/linear-work-generation-of-rmat-graphs/68A0DDA58A7B84E9B3ACA2DBB123A16C

Possibly, but this code basically follows the algorithm used in the Graph 500 reference code.

Maybe not... I just skimmed the abstract, and it seems the paper claims that edge generation time is a function only of the number of edges to generate and is independent of scale.

Here, it depends on scale. I am not sure whether the algorithm in the paper would actually be faster, but here scale is pretty much limited, and R-mat graph generation is fast enough for our use cases.
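
To make the scale dependence concrete, here is a host-side sketch of the per-edge loop from the diff above, assuming a standard C++ random engine in place of the pre-generated `rands` buffer. `generate_bipartite_rmat_edge` is a hypothetical name, and this is an illustration, not the cuGraph implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>

using vertex_t = std::int64_t;

// Draws one (src, dst) pair with src in [0, 2^src_scale) and dst in [0, 2^dst_scale).
// a, b, c are the usual R-mat quadrant probabilities (d = 1 - a - b - c).
template <typename Rng>
std::pair<vertex_t, vertex_t> generate_bipartite_rmat_edge(
  Rng& rng, std::size_t src_scale, std::size_t dst_scale, double a, double b, double c)
{
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  double const a_plus_b = a + b;
  double const a_plus_c = a + c;
  double const a_norm   = a_plus_b > 0.0 ? a / a_plus_b : 0.0;
  double const c_norm   = (1.0 - a_plus_b) > 0.0 ? c / (1.0 - a_plus_b) : 0.0;

  vertex_t src{0};
  vertex_t dst{0};
  for (std::size_t level = 0; level < std::max(src_scale, dst_scale); ++level) {
    // Marginal threshold, used when no source bit is drawn at this level.
    double dst_threshold = a_plus_c;
    if (level < src_scale) {
      bool const src_bit_set = uniform(rng) > a_plus_b;
      if (src_bit_set) { src += vertex_t{1} << (src_scale - (level + 1)); }
      // Condition the destination bit on the source bit just drawn.
      dst_threshold = src_bit_set ? c_norm : a_norm;
    }
    if (level < dst_scale) {
      bool const dst_bit_set = uniform(rng) > dst_threshold;
      if (dst_bit_set) { dst += vertex_t{1} << (dst_scale - (level + 1)); }
    }
  }
  return {src, dst};
}
```

Each edge consumes src_scale + dst_scale random draws, one per generated bit, which is why the cost grows with scale as well as with the number of edges.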

naimnv · 2023-04-26T20:32:05Z

What do you mean?

Yeah, right, the vertex IDs in U+V have to be unique.

BradReesWork · 2023-05-01T17:05:40Z

/merge

bi-partite R-mat graph generation API

6ebc067

seunghwak requested a review from a team as a code owner April 25, 2023 18:42

seunghwak self-assigned this Apr 25, 2023

seunghwak added the feature request and non-breaking labels Apr 25, 2023

seunghwak added this to the 23.06 milestone Apr 25, 2023

seunghwak requested review from BradReesWork, ChuckHastings, jnke2016 and naimnv April 25, 2023 18:43

seunghwak changed the title ~~[API] bi-partite R-mat graph generation.~~ [API][skip-ci] bi-partite R-mat graph generation. Apr 25, 2023

naimnv approved these changes Apr 26, 2023

seunghwak added 2 commits April 26, 2023 09:28

Merge branch 'branch-23.06' of github.com:rapidsai/cugraph into fea_bipartite_rmat

b7bc790

[no ci] initial implementation of bipartite R-mat generator

124702c

seunghwak changed the title ~~[API][skip-ci] bi-partite R-mat graph generation.~~ [skip-ci] bi-partite R-mat graph generation. Apr 26, 2023

seunghwak added 2 commits April 26, 2023 11:04

[no ci] fix copmile error

0c4cbad

remove unused seed from sramble_vertex_ids()

a6a4cc3

naimnv approved these changes Apr 26, 2023

seunghwak added 3 commits April 26, 2023 13:57

[no ci] delete inaccurate input parameter check

f7c2c73

update scramble_vertex_ids

5181f99

[no ci] update scramble_vertex_ids to take an R-value input and return the object after scrambling

ebfc7e3

seunghwak added 2 commits April 26, 2023 15:11

[no ci] test R-mat generator was ignoring scramble_vertex_ids input parameter, fix it

a930f6c

[no ci] cleanup R-mat generator test

ca3572d

naimnv self-requested a review April 26, 2023 22:29

seunghwak added 3 commits April 26, 2023 17:50

create a separate file for bipartite R-mat grpah generator

0cdccab

cleanup R-mat test

79d5325

add bipartite R-mat generator code

c257d71

seunghwak requested a review from a team as a code owner April 27, 2023 00:51

seunghwak added 2 commits April 26, 2023 17:51

Merge branch 'branch-23.06' of github.com:rapidsai/cugraph into fea_bipartite_rmat

69c037e

copyright year

b364ca8

seunghwak changed the title ~~[skip-ci] bi-partite R-mat graph generation.~~ bi-partite R-mat graph generation. Apr 27, 2023

seunghwak changed the title ~~bi-partite R-mat graph generation.~~ Bipartite R-mat graph generation. Apr 27, 2023

seunghwak added 3 commits April 26, 2023 18:05

bi-partite to bipartite

8fea6f0

Merge branch 'branch-23.06' of github.com:rapidsai/cugraph into fea_bipartite_rmat

a5b9fd1

Merge branch 'branch-23.06' of github.com:rapidsai/cugraph into fea_bipartite_rmat

b9b900e

ChuckHastings approved these changes May 1, 2023

naimnv approved these changes May 1, 2023

BradReesWork approved these changes May 1, 2023

rapids-bot bot merged commit e271bad into rapidsai:branch-23.06 May 1, 2023

seunghwak deleted the fea_bipartite_rmat branch May 5, 2023 23:47
