I wanted to follow up on this issue. Upon further investigation, I realized that the problem was not with the project code but with my local environment. Therefore, I am closing this issue.
For anyone encountering similar issues, I found the cause and solution related to the environment in this discussion: NVIDIA/nccl#976.
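For anyone narrowing down a similar failure that depends on the process count, here is a minimal sketch of how the two runs could be repeated with NCCL logging turned on, so the logs can be compared side by side. It assumes Open MPI's mpirun (whose -x option exports an environment variable to every rank) and NCCL's NCCL_DEBUG environment variable. The helper name build_mpirun_cmd is hypothetical and not part of the project.

```python
import shlex


def build_mpirun_cmd(nprocs, script="train_graphcast.py", nccl_debug="INFO"):
    """Return the mpirun argument list for `nprocs` ranks with NCCL logging enabled.

    Assumes Open MPI, where `-x VAR=value` exports VAR to all launched ranks.
    Other MPI launchers use different options for this.
    """
    return [
        "mpirun", "--allow-run-as-root",
        "-np", str(nprocs),
        "-x", f"NCCL_DEBUG={nccl_debug}",  # ask NCCL to print setup and error details
        "python", script,
    ]


if __name__ == "__main__":
    # Print the working (2 ranks) and failing (3 ranks) invocations for copy-paste.
    for n in (2, 3):
        print(shlex.join(build_mpirun_cmd(n)))
```

Running each printed command and comparing the NCCL lines from the 2-process and 3-process runs may show whether the failure comes from the environment, such as the GPU topology or container settings, rather than from the training script.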
Version
0.5.0
On which installation method(s) does this occur?
Docker
Describe the issue
When I run the GraphCast model with mpirun --allow-run-as-root -np 3 python train_graphcast.py, I encounter an error. However, when I use mpirun --allow-run-as-root -np 2 python train_graphcast.py, the model runs without any issues.
I am seeking help to identify the potential cause of this problem. Below is the output log from my program:
Minimum reproducible example
Relevant log output
Environment details
No response