Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove edge renumber map from the homogeneous sampling API #4775

Conversation

jnke2016
Copy link
Contributor

@jnke2016 jnke2016 commented Nov 19, 2024

edge renumbering is only supported in heterogeneous neighborhood sampling hence trying to extract it from the PLC API leads to a segmentation fault

This PR :

  1. fixes a bug in homogeneous neighborhood sampling (both uniform and bias).
  2. properly handle vertex_type_offsets when performing heterogeneous renumbering
  3. fixes a typo in the homogeneous neighborhood sampling docstrings

@jnke2016 jnke2016 requested a review from a team as a code owner November 19, 2024 23:05
@jnke2016 jnke2016 changed the title Branch 24.12 remove edge renumber map Remove edge renumber map from the homogeneous sampling API Nov 19, 2024
@jnke2016 jnke2016 requested a review from a team as a code owner November 20, 2024 21:03
@ChuckHastings ChuckHastings added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 20, 2024
@ChuckHastings ChuckHastings added this to the 24.12 milestone Nov 20, 2024
@@ -1011,6 +1012,8 @@ struct neighbor_sampling_functor : public cugraph::c_api::abstract_functor {
options_.with_replacement_},
do_expensive_check_);
} else {
raft::print_device_vector(
"labels", (*start_vertex_labels).data(), (*start_vertex_labels).size(), std::cout);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to keep this? If this is just for debugging, better delete.

if (result_srcs.size() == 0) {
// Update the 'edgelist_label_offsets' array to be proportional to the
// number of labels
result_offsets->resize(num_labels + 1, handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any guarantee that if results_srcs.size() != 0, result_offsets.size() == num_labels + 1? Say, we have 100 labels but only one label has a sampled edge, will this still work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say, we have 100 labels but only one label has a sampled edge, will this still work?

This is a valid edge case to consider. I believe the current implementation of neighborhood sampling only keeps track of vertices that have outgoing edges. @ChuckHastings any thought?

@@ -105,6 +107,29 @@ neighbor_sample_impl(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, false, multi_gpu> modified_graph_view = graph_view;
edge_masks_vector.reserve(num_edge_types);

label_t num_unique_labels = 0;

std::optional<rmm::device_uvector<label_t>> cp_starting_vertex_labels{std::nullopt};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it seems like you are not using this array after computing num_unique_labels. You can define this variable inside the if statement below.

if (edge_types) { (*level_result_edge_type_vectors).push_back(std::move(*edge_types)); }
if (labels) { (*level_result_label_vectors).push_back(std::move(*labels)); }
level_result_src.resize(
std::reduce(level_sizes_edge_types.begin(), level_sizes_edge_types.end()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested this with more than one hops? You're defining level_sizes_edge_types outside the for (size_t hop = 0; hop < num_hops; ++hop) loop. In hop, this vector will includes the sizes from hop 0. Better minimize the scope of variables, and you may better define this inside the outermost loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And can't this just be

auto old_size = level_result_src.size(); <-old_size for copying.
level_result_src.resizse(old_size + srcs.size())

? Should we really create level_sizes_edge_types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed since I am now using old_size

handle.get_stream());
level_result_dst.resize(
std::reduce(level_sizes_edge_types.begin(), level_sizes_edge_types.end()),
handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

level_result_dst.resize(level_result_src.size(), handle.get_stream());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using old_size + srcs.size()

cpp/src/sampling/neighbor_sampling_impl.hpp Outdated Show resolved Hide resolved
std::reduce(level_sizes_edge_types.begin(), level_sizes_edge_types.end() - 1),
dsts.begin(),
srcs.size(),
handle.get_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar for all the copies and resizes here.

Also note that frequent resizes can be expensive (when # edge types is large). In this case, we may better create a large vector after iterating over the entire set of edge types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resize is basically create a new array and copying the current elements. This copy part can become quite costly if we are resizing many times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the same for all copies

@@ -339,7 +457,7 @@ neighbor_sample_impl(raft::handle_t const& handle,
if (return_hops) {
result_hops = rmm::device_uvector<int32_t>(result_size, handle.get_stream());
output_offset = 0;
for (size_t i = 0; i < fan_out.size(); ++i) {
for (size_t i = 0; i < num_hops; ++i) { // FIXME: replace this by the number of hops
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still relevant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer relevant

thrust::inclusive_scan(handle.get_thrust_policy(),
result_offsets->begin() + 1,
result_offsets->end(),
result_offsets->begin() + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess result_offsets[0] can hold a garbage value here.

std::move(result_labels),
label_to_output_comm_rank);

if (result_labels && (result_offsets->size() != num_unique_labels + 1)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the code in this if block is correct.

num_unique_labels is computed using starting_vertex_labels. starting vertex labels may include only a subset of the global labels.

Think about what will happen if unique labels are 0, 7, 13. Do you think this code will still work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand this edge case. Aren't we specifying the labels for only the starting vertices? And starting vertex labels offsets is being passed from the CAPI hence the labels are contiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By global do you mean the multi-gpu case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think about the case label_to_output_comm_rank is defined.

In the multi-GPU case,

The unique labels in starting_vertex_labels and the labels mapped to this GPU (in label_to_output_comm_rank) may not coincide (I don't see any documentation about this).

What happens if the unique labels in starting_vertex_labels are 0, 7, 13 and the labels mapped to this GPU are 1, 4, 5?

Do you think this code will still work?

Copy link
Member

@alexbarghi-nv alexbarghi-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -105,6 +107,29 @@ neighbor_sample_impl(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, false, multi_gpu> modified_graph_view = graph_view;
edge_masks_vector.reserve(num_edge_types);

label_t num_unique_labels = 0;

std::optional<rmm::device_uvector<label_t>> cp_starting_vertex_labels{std::nullopt};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd better reduce the scope of this variable.

You are using num_unique_labels or cp_starting_vertex_labels only after invoking shuffle_and_organize_output.

The input vertex labels are const and won't be modified in this function (even though you're updating starting_vertices to point different arrays).

No reason to pre-allocate memory here and also define this variable way before it is actually used. This goes against minimizing memory footprint & code readability.


if (num_edge_types > 1) { modified_graph_view.clear_edge_mask(); }
}

level_sizes.push_back(level_result_src.size());
level_result_src_vectors.push_back(std::move(level_result_src));
level_result_dst_vectors.push_back(std::move(level_result_dst));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After calling std::move, the variable is in valid but arbitrary state. You should not use re-use the moved-out variable assuming that it's size is reset to 0. And to minimize the scope of per-level (per-hop) variables, we should better define these variables inside the for (auto hop = 0; hop < num_hops; hop++) { ... }.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input vertex labels are const and won't be modified in this function (even though you're updating starting_vertices to point different arrays).

As you mentioned, vertex labels are constant hence it can't be sorted and I need the initial labels downstream. I reduced the scope of its copy

std::move(result_edge_types),
std::move(result_hops),
std::move(result_labels),
std::move(result_offsets));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result_offsets here stores result label offsets, right? Shouldn't we better rename to result_label_offsets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It is more descriptive

Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for minor cosmetic issues.

auto num_hops = ((fan_out.size() % num_edge_types) == 0)
? (fan_out.size() / num_edge_types)
: ((fan_out.size() / num_edge_types) + 1);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may better define this before defining level_result_src_vectors and level_result_dst_vectors. So, we can define all level_resutl_..._vectors in one code block.

@@ -252,11 +337,6 @@ neighbor_sample_impl(raft::handle_t const& handle,

starting_vertices =
raft::device_span<vertex_t const>(frontier_vertices.data(), frontier_vertices.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we don't need this code anymore. We are directly creating a span from frontier_vertices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I initialize frontier_vertices to starting_vertices?

@ChuckHastings
Copy link
Collaborator

/merge

@rapids-bot rapids-bot bot merged commit 50a11b1 into rapidsai:branch-24.12 Dec 4, 2024
73 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jan 13, 2025
Adds support for heterogeneous distributed sampling to the cuGraph distributed sampler.  Prerequisite for exposing this functionality to cuGraph-PyG.  Has been initially tested with cuGraph-PyG.

Updates the distributed sampler to use the new sampling API.

Merge after #4775, #4827, #4820

Closes #4773 
Closes #4401

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Joseph Nke (https://github.com/jnke2016)
  - Ralph Liu (https://github.com/nv-rliu)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4795
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuGraph improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants