[BUG]: MG `ego_graph` Different from SG #4190
Comments
What's the minimum scale (i.e. # GPUs) to reproduce this? Can you reproduce this with 2 GPUs?
It's reproducible with 2-GPUs. This was the result from running on a lab machine.
After looking at a trend of 1 Node 8-GPU runs across multiple days, it appears that the failure is transient.
You mean this is a heisenbug? Sounds worse but let me try to reproduce this first.
How often can you reproduce this? (Say run test_egonet_mg.py 10 times, how many times you see at least one failure?) I am running this on my local system with 2 GPUs, and I can't reproduce the test failure. Let me try this on a DGX node as well.
Never mind, I reproduced this.
Closes #4190
cc: @jnke2016
This PR adds a function to `comms.py` which returns a mapping of workers to ranks. This is then sorted in `part_utils.py` before being used to submit jobs to `dask`. This should fix a bug in MG `ego_graph` where induced subgraphs were being returned in seemingly random orders (while the results are correct).
Authors:
- Ralph Liu (https://github.com/nv-rliu)
Approvers:
- Joseph Nke (https://github.com/jnke2016)
- Don Acosta (https://github.com/acostadon)
- Rick Ratzel (https://github.com/rlratzel)
- Ray Douglass (https://github.com/raydouglass)
URL: #4262
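To illustrate the idea behind the fix, here is a minimal pure-Python sketch of sorting workers by rank before dispatching per-worker tasks, so that results come back in a stable order. It is not the actual code from `comms.py` or `part_utils.py`: the names `order_workers_by_rank` and `run_per_worker`, the worker addresses, and the rank values are all hypothetical, and the tasks run in a plain loop rather than through `dask`.

```python
from typing import Callable, Dict, List


def order_workers_by_rank(worker_to_rank: Dict[str, int]) -> List[str]:
    """Return worker addresses sorted by ascending rank (hypothetical helper)."""
    return sorted(worker_to_rank, key=lambda worker: worker_to_rank[worker])


def run_per_worker(
    worker_to_rank: Dict[str, int], task: Callable[[str], str]
) -> List[str]:
    """Run one task per worker in rank order and collect results in that order."""
    return [task(worker) for worker in order_workers_by_rank(worker_to_rank)]


# Hypothetical mapping: dict insertion order does not follow rank order.
worker_to_rank = {
    "tcp://10.0.0.2:4000": 1,
    "tcp://10.0.0.1:4000": 0,
    "tcp://10.0.0.3:4000": 2,
}

# The same mapping with a different insertion order.
shuffled = {
    "tcp://10.0.0.3:4000": 2,
    "tcp://10.0.0.2:4000": 1,
    "tcp://10.0.0.1:4000": 0,
}

ordered = order_workers_by_rank(worker_to_rank)
results_a = run_per_worker(worker_to_rank, lambda w: f"subgraph from {w}")
results_b = run_per_worker(shuffled, lambda w: f"subgraph from {w}")
```

Because both calls sort by rank first, `results_a` and `results_b` are identical even though the two mappings were built in different orders. Without the sort, the output order would follow whatever order the workers happened to be enumerated in.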
Version
24.04
Which installation method(s) does this occur on?
Source
Describe the bug.
Currently, the MG implementation of `ego_graph` returns a value that differs from the SG implementation when passed multiple `n` values, aka `seeds`.
Minimum reproducible example
Relevant log output
Environment details
Other/Misc.
No response