Bipartite R-mat graph generation. #3512

Merged
merged 18 commits into from
May 1, 2023

Conversation

seunghwak
Contributor

@seunghwak seunghwak commented Apr 25, 2023

Addresses #2075
Closes #3532

This function will generate (source, destination) vertex ID pairs. Source vertex IDs will have values in [0, 2^src_scale) and destination vertex IDs will have values in [0, 2^dst_scale).

Additionally,

  • scramble_vertex_ids had unused input parameters and internally set scale to an incorrect value. Fixed this bug.

  • Rmat_Usecase was ignoring the scramble_vertex_ids flag. Fixed this bug.

  • Added a scramble_vertex_ids variant that takes just a single vertex list (instead of a src, dst pair).

  • Updated scramble_vertex_ids to take input vectors as R-values and return the scrambled vectors (instead of taking in/out parameters).
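
The calling convention in the last bullet (take the vector as an R-value, return the scrambled vector) can be sketched on the host as below. This is only an illustration: scramble_ids_sketch is a hypothetical name, std::vector stands in for device memory, and the odd-multiplier permutation is an assumption chosen because it is easy to verify, not the scrambling cuGraph actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch (not the cuGraph API): consume the input vector and
// return it with every ID permuted within [0, 2^scale).
template <typename vertex_t>
std::vector<vertex_t> scramble_ids_sketch(std::vector<vertex_t>&& vertices, std::size_t scale)
{
  assert(scale < 64);
  std::uint64_t const mask = (std::uint64_t{1} << scale) - 1;
  // Multiplying by an odd constant modulo 2^scale is a bijection on
  // [0, 2^scale), so no two input IDs map to the same output ID.
  constexpr std::uint64_t odd_multiplier = 0x9E3779B97F4A7C15ull;
  for (auto& v : vertices) {
    v = static_cast<vertex_t>((static_cast<std::uint64_t>(v) * odd_multiplier) & mask);
  }
  // The caller's buffer is reused and handed back; no in/out parameter is needed.
  return std::move(vertices);
}
```

A caller would write something like ids = scramble_ids_sketch(std::move(ids), scale), which makes the ownership transfer explicit at the call site.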

@seunghwak seunghwak requested a review from a team as a code owner April 25, 2023 18:42
@seunghwak seunghwak self-assigned this Apr 25, 2023
@seunghwak seunghwak added feature request New feature or request non-breaking Non-breaking change labels Apr 25, 2023
@seunghwak seunghwak added this to the 23.06 milestone Apr 25, 2023
@seunghwak
Contributor Author

seunghwak commented Apr 25, 2023

@BradReesWork Will this API be sufficient to address the issue #2075? This will take separate src_scale and dst_scale instead of just taking a single scale.

One may need to add 2^src_scale to generate destination vertex IDs if one wants vertex set U and vertex set V (in the bi-partite graph with two sets U & V) to have non-overlapping vertex IDs.

One may need to call this function twice after swapping src_scale and dst_scale to have edges from set U to set V and also the edges from set V to set U.

@seunghwak seunghwak changed the title [API] bi-partite R-mat graph generation. [API][skip-ci] bi-partite R-mat graph generation. Apr 25, 2023
@BradReesWork
Member

@seunghwak the num_edges will be the number of edges between the sets? Also, how are you enforcing that edges are only between sets and not within a set?

@seunghwak
Contributor Author

seunghwak commented Apr 25, 2023

the num_edges will be the number of edges between the sets?

Yes.

Also, how are you enforcing that edges are only between sets and not within a set?

Here, the expectation is that source vertices belong to one set and destination vertices belong to the other set.

Say there are two sets U and V.

Say U has 2^u_scale vertices and V has 2^v_scale vertices.

To generate edges from U to V, you call this function with src_scale = u_scale and dst_scale = v_scale.

To generate edges from V to U, you call this function with src_scale = v_scale and dst_scale = u_scale.

And if the caller wants vertex IDs for U to be in [0, 2^u_scale) and V to be in [2^u_scale, 2^u_scale + 2^v_scale), they need to add 2^u_scale to the vertex IDs that belong to V (the destination IDs in the U-to-V call, the source IDs in the V-to-U call) after calling this function.
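
A minimal host-side sketch of that ID offset, assuming the generated IDs have been copied into a std::vector. offset_v_ids is a hypothetical helper written for this illustration, not a cuGraph function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of cuGraph): shift the IDs of set V so that
// U occupies [0, 2^u_scale) and V occupies [2^u_scale, 2^u_scale + 2^v_scale).
template <typename vertex_t>
void offset_v_ids(std::vector<vertex_t>& v_ids, std::size_t u_scale)
{
  // The shifted IDs must still fit in vertex_t.
  assert(u_scale < sizeof(vertex_t) * 8 - 1);
  auto const offset = static_cast<vertex_t>(vertex_t{1} << u_scale);
  for (auto& v : v_ids) { v += offset; }
}
```

For edges from U to V, the helper would be applied to the destination IDs. For edges from V to U (generated with src_scale = v_scale and dst_scale = u_scale), it would be applied to the source IDs instead.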

@naimnv
Contributor

naimnv commented Apr 26, 2023

And if the caller wants vertex IDs for U to be in [0, 2^u_scale) and V to be in [2^u_scale, 2^u_scale + 2^v_scale), they need to add 2^u_scale to the vertex IDs that belong to V after calling this function.

Can the caller have any other options than adding |U| to destination vertexIDs (or adding |V| to source vertexIds) if the output edge-lists are used to create graph using cugraph::create_graph_from_edgelist?

@ajschmidt8
Member

@seunghwak, just an FYI. adding [skip-ci] to the PR title doesn't do anything for GitHub Actions.

please review the page below to figure out how to skip CI for GitHub Actions.

@seunghwak
Contributor Author

Can the caller have any other options than adding |U| to destination vertexIDs (or adding |V| to source vertexIds) if the output edge-lists are used to create graph using cugraph::create_graph_from_edgelist?

What do you mean? To explicitly create a bipartite graph? We currently do not store bipartite graphs in a different format. In the future, if it turns out to be hugely beneficial, we may consider taking (vertex ID within a set, set number) pairs instead of simple vertex IDs for k-partite graphs. We are not considering this at the moment.

@seunghwak seunghwak changed the title [API][skip-ci] bi-partite R-mat graph generation. [skip-ci] bi-partite R-mat graph generation. Apr 26, 2023
@seunghwak
Contributor Author

@ChuckHastings Can we delete the deprecated R-mat generators (which take a seed) or do we still need to keep them?

@ChuckHastings
Collaborator

@ChuckHastings Can we delete the deprecated R-mat generators (which take a seed) or do we still need to keep them?

I believe we need to keep them until the Python code is modified to use the new C API approach. This work is tracked in #3480.

Comment on lines 255 to 291
    thrust::transform(
      handle.get_thrust_policy(),
      thrust::make_counting_iterator(size_t{0}),
      thrust::make_counting_iterator(num_edges_to_generate),
      pair_first,
      // if a + b == 0.0, a_norm is irrelevant, if (1.0 - (a+b)) == 0.0, c_norm is irrelevant
      [src_scale,
       dst_scale,
       rands    = rands.data(),
       a_plus_b = a + b,
       a_plus_c = a + c,
       a_norm   = (a + b) > 0.0 ? a / (a + b) : 0.0,
       c_norm   = (1.0 - (a + b)) > 0.0 ? c / (1.0 - (a + b)) : 0.0] __device__(auto i) {
        vertex_t src{0};
        vertex_t dst{0};
        size_t rand_offset = i * (src_scale + dst_scale);
        for (int level = 0; level < static_cast<int>(std::max(src_scale, dst_scale)); ++level) {
          auto dst_threshold = a_plus_c;
          if (level < src_scale) {
            auto r           = rands[rand_offset++];
            auto src_bit_set = r > a_plus_b;
            src += src_bit_set ? static_cast<vertex_t>(vertex_t{1} << (src_scale - (level + 1))) : 0;
            dst_threshold = src_bit_set ? c_norm : a_norm;
          }
          if (level < dst_scale) {
            auto r           = rands[rand_offset++];
            auto dst_bit_set = r > dst_threshold;
            dst += dst_bit_set ? static_cast<vertex_t>(vertex_t{1} << (dst_scale - (level + 1))) : 0;
          }
        }
        return thrust::make_tuple(src, dst);
      });
    num_edges_generated += num_edges_to_generate;
  }

  return std::make_tuple(std::move(srcs), std::move(dsts));
}
Contributor

Contributor Author

Possibly, but this code basically follows the algorithm used in the Graph 500 reference code.

Contributor Author

Maybe not... I just skimmed the abstract, and it seems the paper claims that edge generation time is just a function of the number of edges to generate and is independent of scale.

Here, it is dependent on scale. I am not sure whether the algorithm in the paper would actually be faster, but here, scale is pretty much limited, and R-mat graph generation is fast enough for our use cases.
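
To make the scale dependence concrete, here is a host-only sketch of the per-edge loop from the snippet under review. It is an illustration under assumptions: generate_bipartite_rmat_edge is a hypothetical name, and std::mt19937_64 stands in for the pre-generated device random numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>

// Hypothetical host-side sketch (not the cuGraph implementation). Each edge
// draws src_scale + dst_scale random numbers, so per-edge work grows with scale.
template <typename vertex_t>
std::pair<vertex_t, vertex_t> generate_bipartite_rmat_edge(
  std::mt19937_64& rng, std::size_t src_scale, std::size_t dst_scale, double a, double b, double c)
{
  assert(a >= 0.0 && b >= 0.0 && c >= 0.0 && a + b + c <= 1.0);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  double const a_plus_b = a + b;
  double const a_plus_c = a + c;
  // Conditional thresholds for the destination bit given the source bit.
  double const a_norm = a_plus_b > 0.0 ? a / a_plus_b : 0.0;
  double const c_norm = (1.0 - a_plus_b) > 0.0 ? c / (1.0 - a_plus_b) : 0.0;

  vertex_t src{0};
  vertex_t dst{0};
  for (std::size_t level = 0; level < std::max(src_scale, dst_scale); ++level) {
    // Used when the source bits are exhausted but destination bits remain.
    double dst_threshold = a_plus_c;
    if (level < src_scale) {
      bool const src_bit_set = uniform(rng) > a_plus_b;
      if (src_bit_set) { src += static_cast<vertex_t>(vertex_t{1} << (src_scale - (level + 1))); }
      dst_threshold = src_bit_set ? c_norm : a_norm;
    }
    if (level < dst_scale) {
      bool const dst_bit_set = uniform(rng) > dst_threshold;
      if (dst_bit_set) { dst += static_cast<vertex_t>(vertex_t{1} << (dst_scale - (level + 1))); }
    }
  }
  return {src, dst};
}
```

The loop runs max(src_scale, dst_scale) times per edge, which is why generation time here depends on scale as well as on the number of edges.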

@naimnv
Contributor

naimnv commented Apr 26, 2023

What do you mean?

Yeah, right, the vertex IDs in U+V have to be unique.

@naimnv naimnv self-requested a review April 26, 2023 22:29
@seunghwak seunghwak requested a review from a team as a code owner April 27, 2023 00:51
@seunghwak seunghwak changed the title [skip-ci] bi-partite R-mat graph generation. bi-partite R-mat graph generation. Apr 27, 2023
@seunghwak seunghwak changed the title bi-partite R-mat graph generation. Bipartite R-mat graph generation. Apr 27, 2023
@BradReesWork
Member

/merge

@rapids-bot rapids-bot bot merged commit e271bad into rapidsai:branch-23.06 May 1, 2023
@seunghwak seunghwak deleted the fea_bipartite_rmat branch May 5, 2023 23:47
Labels
feature request New feature or request non-breaking Non-breaking change
Development

Successfully merging this pull request may close these issues.

Implement C++ API for Bipartite RMAT graph generation
5 participants