Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Support Seed Retention for Sampling with Renumbering #4355

Merged
merged 40 commits into from
May 14, 2024

Conversation

alexbarghi-nv
Copy link
Member

@alexbarghi-nv alexbarghi-nv commented Apr 17, 2024

Exposes the ability to retain seeds even if they have no outgoing edges (and therefore are not sampled). Required to fix the current bug in cuGraph-PyG involving batch size and dropping seeds.

Currently, this functionality can't be exposed through the MG Python API (#4358) but exposing it through the pylibcugraph API is sufficient to resolve this issue. This PR does expose it through the SG Python API.

@alexbarghi-nv alexbarghi-nv added feature request New feature or request non-breaking Non-breaking change labels Apr 17, 2024
@alexbarghi-nv alexbarghi-nv added this to the 24.06 milestone Apr 17, 2024
@alexbarghi-nv alexbarghi-nv self-assigned this Apr 17, 2024
@alexbarghi-nv alexbarghi-nv marked this pull request as ready for review April 19, 2024 17:19
@alexbarghi-nv alexbarghi-nv requested review from a team as code owners April 19, 2024 17:19
Copy link
Contributor

@naimnv naimnv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Shouldn't we add C++ test for the added option?

Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (C/C++ part)

@ChuckHastings
Copy link
Collaborator

LGTM Shouldn't we add C++ test for the added option?

This is not an added C++ option, it already existed. It was never exposed/used through the C API. I believe (didn't exhaustively look) that it is already tested in sampling_post_processing_test.cu.

@alexbarghi-nv
Copy link
Member Author

/merge

Copy link
Contributor

@rlratzel rlratzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just reviewed the python changes. I had one question and one optional suggestion, but neither should hold up approval.

Comment on lines +151 to +154
label_offsets: integer sequence, optional (default=None)
Offsets of each label within the start vertex list.
Only used if retain_seeds is True. Required if retain_seeds
is True.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion: maybe change this so that if this is specified, the retain_seeds=True behavior is assumed and then drop the retain_seeds arg completely? If you decide to do that, then be sure to also include the description of the retain_seeds behavior in this description.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a reasonable suggestion. I'll add that to the new PR.

@@ -342,13 +353,15 @@ def uniform_neighbor_sample(
else None,
h_fan_out=fanout_vals,
with_replacement=with_replacement,
do_expensive_check=False,
do_expensive_check=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not intentional, whoops. We need to fix that.

@rapids-bot rapids-bot bot merged commit 45371cb into rapidsai:branch-24.06 May 14, 2024
130 of 131 checks passed
@alexbarghi-nv alexbarghi-nv deleted the support-seed-retention branch May 14, 2024 14:32
rapids-bot bot pushed a commit that referenced this pull request May 15, 2024
…ample (#4421)

#4355 accidentally added an expensive check to SG uniform neighbor sample.  This PR reverts that addition.

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4421
rapids-bot bot pushed a commit that referenced this pull request May 30, 2024
Distributed sampling in cuGraph-PyG.  Also renames the existing API to clarify that it is dask based.
Adds a dependency on `tensordict` for `cuGraph-PyG` which supports the new `TensorDictFeatureStore`.
Also no longer installs `torch-cluster` and `torch-spline-conv` in CI for testing since that results in an `ImportError` and neither of those packages are needed.

Requires PyG 2.5.  Should be merged after #4335 

Merge after #4355 

Closes #4248 
Closes #4249 
Closes #3383 
Closes #3942 
Closes #3836 
Closes #4202 
Closes #4051 
Closes #4326 
Closes #4252 
Partially addresses #3805

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Seunghwa Kang (https://github.com/seunghwak)
  - Tingyu Wang (https://github.com/tingyu66)
  - Ralph Liu (https://github.com/nv-rliu)

Approvers:
  - Tingyu Wang (https://github.com/tingyu66)
  - Brad Rees (https://github.com/BradReesWork)
  - Jake Awe (https://github.com/AyodeAwe)

URL: #4384
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuGraph feature request New feature or request non-breaking Non-breaking change python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants