Bipartite R-mat graph generation. #3512
@BradReesWork Will this API be sufficient to address the issue #2075? This will take separate One may need to add One may need to call this function twice after swapping |
@seunghwak the num_edges will be the number of edges between the sets? Also, how are you enforcing that edges are only between sets and not within a set?
Yes.
Here, the expectation is that source vertices belong to one set and destination vertices belong to the other. Say there are two sets U and V, where U has 2^u_scale vertices and V has 2^v_scale vertices. To generate edges from U to V, call this function with src_scale = u_scale and dst_scale = v_scale. To generate edges from V to U, call it with src_scale = v_scale and dst_scale = u_scale. If the caller wants vertex IDs for U to be in [0, 2^u_scale) and vertex IDs for V to be in [2^u_scale, 2^u_scale + 2^v_scale), they need to add 2^u_scale to whichever of the source or destination vertex IDs belong to V after calling this function.
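The ID shift described in this comment can be sketched on the host as follows. This is a minimal illustration, not cuGraph code: `offset_v_side` is a hypothetical helper, the edge list is a plain `std::vector` of pairs rather than device vectors, and it assumes the edges came from a U -> V call so that the destinations are the V side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of cuGraph): shift the V-side IDs of a U -> V edge list
// so that U occupies [0, 2^u_scale) and V occupies [2^u_scale, 2^u_scale + 2^v_scale).
std::vector<std::pair<std::int64_t, std::int64_t>> offset_v_side(
  std::vector<std::pair<std::int64_t, std::int64_t>> edges, std::size_t u_scale)
{
  assert(u_scale < 63);  // keep the shift within the range of a signed 64-bit ID
  auto const v_offset = std::int64_t{1} << u_scale;  // |U| = 2^u_scale
  for (auto& e : edges) {
    e.second += v_offset;  // destinations are the V side in a U -> V call
  }
  return edges;
}
```

For edges generated by a V -> U call, the same offset would be added to the source IDs instead.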
Can the caller have any other options than adding |U| to destination vertexIDs (or adding |V| to source vertexIds) if the output edge-lists are used to create graph using |
@seunghwak, just an FYI. adding please review the page below to figure out how to skip CI for GitHub Actions. |
What do you mean? To explicitly create a bipartite graph? We currently do not store bipartite graphs in a different format, but in the future, if it turns out to be hugely beneficial, we may consider taking (vertex ID within a set, set number) pairs instead of simple vertex IDs for k-partite graphs. We are not considering this right now.
@ChuckHastings Can we delete the deprecated R-mat generators (which take a seed) or do we still need to keep them?
I believe we need to keep them until the python code is modified to use the new C API approach. This work is tracked in #3480 |
    thrust::transform(
      handle.get_thrust_policy(),
      thrust::make_counting_iterator(size_t{0}),
      thrust::make_counting_iterator(num_edges_to_generate),
      pair_first,
      // if a + b == 0.0, a_norm is irrelevant, if (1.0 - (a+b)) == 0.0, c_norm is irrelevant
      [src_scale,
       dst_scale,
       rands = rands.data(),
       a_plus_b = a + b,
       a_plus_c = a + c,
       a_norm = (a + b) > 0.0 ? a / (a + b) : 0.0,
       c_norm = (1.0 - (a + b)) > 0.0 ? c / (1.0 - (a + b)) : 0.0] __device__(auto i) {
        vertex_t src{0};
        vertex_t dst{0};
        size_t rand_offset = i * (src_scale + dst_scale);
        for (int level = 0; level < static_cast<int>(std::max(src_scale, dst_scale)); ++level) {
          auto dst_threshold = a_plus_c;
          if (level < src_scale) {
            auto r = rands[rand_offset++];
            auto src_bit_set = r > a_plus_b;
            src += src_bit_set ? static_cast<vertex_t>(vertex_t{1} << (src_scale - (level + 1))) : 0;
            dst_threshold = src_bit_set ? c_norm : a_norm;
          }
          if (level < dst_scale) {
            auto r = rands[rand_offset++];
            auto dst_bit_set = r > dst_threshold;
            dst += dst_bit_set ? static_cast<vertex_t>(vertex_t{1} << (dst_scale - (level + 1))) : 0;
          }
        }
        return thrust::make_tuple(src, dst);
      });
    num_edges_generated += num_edges_to_generate;
  }

  return std::make_tuple(std::move(srcs), std::move(dsts));
}
Just wondering, is it the algorithm used here?
https://www.cambridge.org/core/journals/network-science/article/linear-work-generation-of-rmat-graphs/68A0DDA58A7B84E9B3ACA2DBB123A16C
Possibly, but this code basically follows the algorithm used in the Graph 500 reference code.
Maybe not... I just skimmed the abstract, and it seems the paper claims that edge generation time is a function of only the number of edges to generate and is independent of scale.
Here, it depends on scale. I am not sure whether the algorithm in the paper would actually be faster, but here scale is pretty much limited, and R-mat graph generation is fast enough for our use cases.
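To make the scale dependence concrete, here is a host-side sketch that mirrors the per-edge loop in the diff above. It is an illustration under assumptions, not the cuGraph implementation: `rmat_edge_sketch` is a hypothetical name, random numbers are drawn on the fly from a standard-library generator instead of a pre-filled device buffer, and `a + b + c <= 1` is assumed. Each edge consumes `src_scale + dst_scale` uniform draws, one per generated bit, which is why the cost grows with scale.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>

// Hypothetical host-side sketch of the per-edge loop; one uniform draw per generated bit.
// Assumes a + b + c <= 1 (the fourth probability d = 1 - a - b - c is implicit).
template <typename Rng>
std::pair<std::int64_t, std::int64_t> rmat_edge_sketch(
  Rng& rng, std::size_t src_scale, std::size_t dst_scale, double a, double b, double c)
{
  assert(src_scale < 63 && dst_scale < 63);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  double const a_plus_b = a + b;
  double const a_plus_c = a + c;
  // Conditional thresholds for the destination bit once the source bit is known.
  double const a_norm = a_plus_b > 0.0 ? a / a_plus_b : 0.0;
  double const c_norm = (1.0 - a_plus_b) > 0.0 ? c / (1.0 - a_plus_b) : 0.0;
  std::int64_t src{0};
  std::int64_t dst{0};
  for (std::size_t level = 0; level < std::max(src_scale, dst_scale); ++level) {
    double dst_threshold = a_plus_c;  // used when no source bit is drawn at this level
    if (level < src_scale) {
      bool const src_bit_set = uniform(rng) > a_plus_b;
      if (src_bit_set) { src += std::int64_t{1} << (src_scale - (level + 1)); }
      dst_threshold = src_bit_set ? c_norm : a_norm;
    }
    if (level < dst_scale) {
      bool const dst_bit_set = uniform(rng) > dst_threshold;
      if (dst_bit_set) { dst += std::int64_t{1} << (dst_scale - (level + 1)); }
    }
  }
  return {src, dst};
}
```

Calling it with, say, a `std::mt19937_64` generator and `src_scale = 3`, `dst_scale = 5` yields sources in [0, 8) and destinations in [0, 32).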
Yeah, right, the vertex IDs in U+V have to be unique.
/merge |
Addresses #2075
Closes #3532
This function will generate (source, destination) vertex ID pairs. Source vertex IDs will have values in [0, 2^src_scale) and destination vertex IDs will have values in [0, 2^dst_scale).
Additionally:
The scramble_vertex_ids function had unused input parameters and was internally setting scale erroneously. Fixed this bug.
Rmat_Usecase was ignoring the scramble_vertex_ids flag. Fixed this bug.
Added a scramble_vertex_ids overload that takes just a single vertex list (instead of a src, dst pair).
Updated scramble_vertex_ids to take input vectors as R-values and return the scrambled vectors (instead of taking in/out parameters).
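The calling convention in the last item (consume the input as an R-value, return the result) can be sketched as below. This is only an illustration of the signature style: `scramble_sketch` is a hypothetical name, it operates on a host `std::vector` rather than a device vector, and the permutation is a plain bit reversal chosen because it is an obvious bijection on [0, 2^scale), not the scrambling cuGraph actually performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in: takes ownership of the input vector (R-value) and returns the
// permuted IDs, instead of modifying an in/out parameter.
std::vector<std::int64_t> scramble_sketch(std::vector<std::int64_t>&& vertices, std::size_t scale)
{
  assert(scale < 63);
  for (auto& v : vertices) {
    std::int64_t reversed{0};
    // Reverse the low `scale` bits of v (placeholder permutation, see the note above).
    for (std::size_t bit = 0; bit < scale; ++bit) {
      if ((v >> bit) & 1) { reversed |= std::int64_t{1} << (scale - 1 - bit); }
    }
    v = reversed;
  }
  return std::move(vertices);  // reuse the caller's storage for the result
}
```

A caller would write `ids = scramble_sketch(std::move(ids), scale);`, which makes the transfer of ownership explicit at the call site.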