AGX orin with nccl #1561

Open
zuyulong opened this issue Jan 3, 2025 · 0 comments

zuyulong commented Jan 3, 2025

machine:
2 × Jetson AGX Orin 64GB

env:
JetPack 5.1.1
Python 3.8.10
NCCL 2.11.4+cuda11.4
PyTorch v1.11.0

The PyTorch build I am using is the one provided by NVIDIA (PyTorch for Jetson).
I am trying to set up a distributed development environment on two AGX Orin devices that communicate using NCCL.
I previously tried PyTorch v2.1, but that build does not seem to provide the distributed module (torch.distributed).

# PyTorch v2.1.0

import torch
torch.distributed.is_available()
False

I then switched to v1.11.0, but ran into the following problem:

# PyTorch v1.11.0

import torch
torch.distributed.is_available()
True
torch.distributed.is_nccl_available()
False
torch.cuda.nccl.is_available(torch.randn(1).cuda())
/usr/local/lib/python3.8/dist-packages/torch/cuda/nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn('PyTorch is not compiled with NCCL support')
False
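The checks above can be gathered into one script, which makes it easier to compare different PyTorch wheels on the same device. This is only a sketch: the function name `probe_distributed_support` and the report keys are made up for illustration. The backend checks are looked up defensively because a build with `torch.distributed.is_available() == False` may not expose them at all.

```python
import importlib


def probe_distributed_support():
    """Collect the facts relevant to 'can this PyTorch build use NCCL?'.

    Hypothetical helper: the name and the report keys are illustrative only.
    """
    report = {"torch_installed": False}
    try:
        torch = importlib.import_module("torch")
        dist = importlib.import_module("torch.distributed")
    except ImportError:
        # PyTorch is not installed in this interpreter; nothing more to probe.
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["distributed"] = bool(dist.is_available())

    # Backend checks may be missing when the distributed package is not built,
    # so fetch them with getattr instead of calling them directly.
    for backend in ("nccl", "gloo", "mpi"):
        check = getattr(dist, f"is_{backend}_available", None)
        report[backend] = bool(check()) if report["distributed"] and callable(check) else False
    return report


if __name__ == "__main__":
    for key, value in probe_distributed_support().items():
        print(f"{key}: {value}")
```

On the v1.11.0 build described here, the output shown above suggests this would report `distributed: True` and `nccl: False`.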

I would like to know: does the Orin support NCCL? If so, how can I get PyTorch to use NCCL on it? Thanks!
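For reference, a minimal sketch of the kind of two-device setup described above, using the `env://` rendezvous of `torch.distributed`. The helper names, the address `192.168.1.10` and the port `29500` are assumptions for illustration, not values from this setup.

```python
import os


def rendezvous_env(master_addr, master_port, rank, world_size=2):
    """Build the environment variables read by the env:// init method.

    Hypothetical helper; world_size defaults to 2 for the two Orin devices.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside [0, {world_size})")
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "RANK": str(rank),
        "WORLD_SIZE": str(world_size),
    }


def init_two_node_group(master_addr, master_port, rank, backend="nccl"):
    """Join the process group; run once per device with rank 0 and rank 1."""
    # Imported lazily so rendezvous_env stays usable without PyTorch installed.
    import torch.distributed as dist

    os.environ.update(rendezvous_env(master_addr, master_port, rank))
    dist.init_process_group(backend=backend, init_method="env://")
    return dist


if __name__ == "__main__":
    # Assumed address and port of the rank-0 device; adjust for the real network.
    print(rendezvous_env("192.168.1.10", 29500, rank=0))
```

With a build where `torch.distributed.is_nccl_available()` returns `False`, calling `init_two_node_group` with `backend="nccl"` would be expected to fail at `init_process_group`, which is the situation this issue is asking about.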
