OK, thanks. We often see two-node Tree BusBW figures that exceed the theoretical limit of the network cards because of the way BusBW is calculated. But you're only getting 60GB/s from Ring, which seems a little low for 8x100Gbit/s NICs.
On an NVIDIA DGX A100 system we have 8x200Gbit/s IB cards and we measure 192GB/s AllReduce BusBW.
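To make the arithmetic behind these figures concrete, here is a minimal Python sketch. It assumes the BusBW definition documented in nccl-tests for AllReduce (algbw * 2*(n-1)/n, where n is the number of ranks). That factor models the traffic of a ring, so a Tree run, which may move less data over the network, can report a BusBW above the NIC line rate. The helper names and the 16-rank example (2 nodes x 8 GPUs) are illustrative assumptions, not taken from the logs in this thread.

```python
def nic_limit_gbytes(num_nics: int, gbits_per_nic: float) -> float:
    """Aggregate NIC line rate of one node in GB/s (8 bits per byte)."""
    return num_nics * gbits_per_nic / 8.0


def allreduce_busbw(algbw_gbytes: float, num_ranks: int) -> float:
    """BusBW as nccl-tests defines it for AllReduce: algbw * 2*(n-1)/n."""
    return algbw_gbytes * 2.0 * (num_ranks - 1) / num_ranks


# Line-rate limits for the two systems mentioned above.
limit_100g = nic_limit_gbytes(8, 100)  # 8x100Gbit/s NICs
limit_200g = nic_limit_gbytes(8, 200)  # 8x200Gbit/s IB cards

# Fraction of line rate reached by the reported BusBW figures.
ring_fraction = 60.0 / limit_100g
dgx_fraction = 192.0 / limit_200g

# Assumed example: 16 ranks (2 nodes x 8 GPUs) and a hypothetical algbw.
example_busbw = allreduce_busbw(32.0, 16)

print(f"8x100Gbit/s limit: {limit_100g:.0f} GB/s, Ring at {ring_fraction:.0%}")
print(f"8x200Gbit/s limit: {limit_200g:.0f} GB/s, DGX at {dgx_fraction:.0%}")
print(f"BusBW for algbw=32 GB/s on 16 ranks: {example_busbw:.1f} GB/s")
```

Because BusBW is derived from the measured algbw rather than from bytes actually counted on the wire, it is only comparable to the hardware limit when the algorithm's traffic pattern matches the ring model.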
Do you have GPUDirect RDMA (GDRDMA) enabled? Is this RoCE or InfiniBand? Does this system have NVSwitches?
Could you share an NCCL_DEBUG=INFO log and an NCCL_TOPO_DUMP_FILE=system.xml dump so we can check the configuration?
Hello, we found that BusBW exceeded the theoretical limit of the network card when running a 2-node Tree-based AllReduce, and we saw your reply in another issue. Could you give us more details? Thanks.
Originally posted by @AddyLaddy in #812