-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove edge renumber map from the homogeneous sampling API #4775
Remove edge renumber map from the homogeneous sampling API #4775
Conversation
cpp/src/c_api/neighbor_sampling.cpp
Outdated
@@ -1011,6 +1012,8 @@ struct neighbor_sampling_functor : public cugraph::c_api::abstract_functor { | |||
options_.with_replacement_}, | |||
do_expensive_check_); | |||
} else { | |||
raft::print_device_vector( | |||
"labels", (*start_vertex_labels).data(), (*start_vertex_labels).size(), std::cout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to keep this? If this is just for debugging, better delete.
if (result_srcs.size() == 0) { | ||
// Update the 'edgelist_label_offsets' array to be proportional to the | ||
// number of labels | ||
result_offsets->resize(num_labels + 1, handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any guarantee that if results_srcs.size() != 0, result_offsets.size() == num_labels + 1? Say, we have 100 labels but only one label has a sampled edge, will this still work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say, we have 100 labels but only one label has a sampled edge, will this still work?
This is a valid edge case to consider. I believe the current implementation of neighborhood sampling only keeps track of vertices that have outgoing edges. @ChuckHastings any thought?
…2_remove-edge-renumber-map
@@ -105,6 +107,29 @@ neighbor_sample_impl(raft::handle_t const& handle, | |||
graph_view_t<vertex_t, edge_t, false, multi_gpu> modified_graph_view = graph_view; | |||
edge_masks_vector.reserve(num_edge_types); | |||
|
|||
label_t num_unique_labels = 0; | |||
|
|||
std::optional<rmm::device_uvector<label_t>> cp_starting_vertex_labels{std::nullopt}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it seems like you are not using this array after computing num_unique_labels
. You can define this variable inside the if statement below.
if (edge_types) { (*level_result_edge_type_vectors).push_back(std::move(*edge_types)); } | ||
if (labels) { (*level_result_label_vectors).push_back(std::move(*labels)); } | ||
level_result_src.resize( | ||
std::reduce(level_sizes_edge_types.begin(), level_sizes_edge_types.end()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested this with more than one hops? You're defining level_sizes_edge_types
outside the for (size_t hop = 0; hop < num_hops; ++hop)
loop. In hop, this vector will includes the sizes from hop 0. Better minimize the scope of variables, and you may better define this inside the outermost loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And can't this just be
auto old_size = level_result_src.size(); <-old_size for copying.
level_result_src.resizse(old_size + srcs.size())
? Should we really create level_sizes_edge_types
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer needed since I am now using old_size
handle.get_stream()); | ||
level_result_dst.resize( | ||
std::reduce(level_sizes_edge_types.begin(), level_sizes_edge_types.end()), | ||
handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
level_result_dst.resize(level_result_src.size(), handle.get_stream());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using old_size + srcs.size()
std::reduce(level_sizes_edge_types.begin(), level_sizes_edge_types.end() - 1), | ||
dsts.begin(), | ||
srcs.size(), | ||
handle.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar for all the copies and resizes here.
Also note that frequent resizes can be expensive (when # edge types is large). In this case, we may better create a large vector after iterating over the entire set of edge types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resize is basically create a new array and copying the current elements. This copy part can become quite costly if we are resizing many times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did the same for all copies
@@ -339,7 +457,7 @@ neighbor_sample_impl(raft::handle_t const& handle, | |||
if (return_hops) { | |||
result_hops = rmm::device_uvector<int32_t>(result_size, handle.get_stream()); | |||
output_offset = 0; | |||
for (size_t i = 0; i < fan_out.size(); ++i) { | |||
for (size_t i = 0; i < num_hops; ++i) { // FIXME: replace this by the number of hops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer relevant
thrust::inclusive_scan(handle.get_thrust_policy(), | ||
result_offsets->begin() + 1, | ||
result_offsets->end(), | ||
result_offsets->begin() + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess result_offsets[0]
can hold a garbage value here.
std::move(result_labels), | ||
label_to_output_comm_rank); | ||
|
||
if (result_labels && (result_offsets->size() != num_unique_labels + 1)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure the code in this if block is correct.
num_unique_labels
is computed using starting_vertex_labels
. starting vertex labels
may include only a subset of the global labels.
Think about what will happen if unique labels are 0, 7, 13. Do you think this code will still work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand this edge case. Aren't we specifying the labels for only the starting vertices? And starting vertex labels offsets is being passed from the CAPI hence the labels are contiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By global do you mean the multi-gpu case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think about the case label_to_output_comm_rank
is defined.
In the multi-GPU case,
The unique labels in starting_vertex_labels
and the labels mapped to this GPU (in label_to_output_comm_rank
) may not coincide (I don't see any documentation about this).
What happens if the unique labels in starting_vertex_labels
are 0, 7, 13 and the labels mapped to this GPU are 1, 4, 5?
Do you think this code will still work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -105,6 +107,29 @@ neighbor_sample_impl(raft::handle_t const& handle, | |||
graph_view_t<vertex_t, edge_t, false, multi_gpu> modified_graph_view = graph_view; | |||
edge_masks_vector.reserve(num_edge_types); | |||
|
|||
label_t num_unique_labels = 0; | |||
|
|||
std::optional<rmm::device_uvector<label_t>> cp_starting_vertex_labels{std::nullopt}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd better reduce the scope of this variable.
You are using num_unique_labels
or cp_starting_vertex_labels
only after invoking shuffle_and_organize_output
.
The input vertex labels are const
and won't be modified in this function (even though you're updating starting_vertices
to point different arrays).
No reason to pre-allocate memory here and also define this variable way before it is actually used. This goes against minimizing memory footprint & code readability.
|
||
if (num_edge_types > 1) { modified_graph_view.clear_edge_mask(); } | ||
} | ||
|
||
level_sizes.push_back(level_result_src.size()); | ||
level_result_src_vectors.push_back(std::move(level_result_src)); | ||
level_result_dst_vectors.push_back(std::move(level_result_dst)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After calling std::move, the variable is in valid but arbitrary state. You should not use re-use the moved-out variable assuming that it's size is reset to 0. And to minimize the scope of per-level (per-hop) variables, we should better define these variables inside the for (auto hop = 0; hop < num_hops; hop++) { ... }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input vertex labels are const and won't be modified in this function (even though you're updating starting_vertices to point different arrays).
As you mentioned, vertex labels are constant hence it can't be sorted and I need the initial labels downstream. I reduced the scope of its copy
std::move(result_edge_types), | ||
std::move(result_hops), | ||
std::move(result_labels), | ||
std::move(result_offsets)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result_offsets
here stores result label offsets, right? Shouldn't we better rename to result_label_offsets
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. It is more descriptive
…2_remove-edge-renumber-map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for minor cosmetic issues.
auto num_hops = ((fan_out.size() % num_edge_types) == 0) | ||
? (fan_out.size() / num_edge_types) | ||
: ((fan_out.size() / num_edge_types) + 1); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may better define this before defining level_result_src_vectors
and level_result_dst_vectors
. So, we can define all level_resutl_..._vectors
in one code block.
@@ -252,11 +337,6 @@ neighbor_sample_impl(raft::handle_t const& handle, | |||
|
|||
starting_vertices = | |||
raft::device_span<vertex_t const>(frontier_vertices.data(), frontier_vertices.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we don't need this code anymore. We are directly creating a span from frontier_vertices
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I initialize frontier_vertices
to starting_vertices
?
/merge |
Adds support for heterogeneous distributed sampling to the cuGraph distributed sampler. Prerequisite for exposing this functionality to cuGraph-PyG. Has been initially tested with cuGraph-PyG. Updates the distributed sampler to use the new sampling API. Merge after #4775, #4827, #4820 Closes #4773 Closes #4401 Authors: - Alex Barghi (https://github.com/alexbarghi-nv) - Joseph Nke (https://github.com/jnke2016) - Ralph Liu (https://github.com/nv-rliu) Approvers: - Rick Ratzel (https://github.com/rlratzel) URL: #4795
edge renumbering is only supported in heterogeneous neighborhood sampling hence trying to extract it from the PLC API leads to a segmentation fault
This PR :