I am running the 8B model described in Table 1 of the ZeRO paper on 8 GPUs.

I notice that the `contiguous_gradients` setting in the config seems to control whether `Reduce` or `AllReduce` is used for gradient reduction in the backward pass. That is, I see the following in the NCCL debug log only when `contiguous_gradients` is `true`:

I see this referenced in #264, but it wasn't clear why `contiguous_gradients` should control the communication pattern. As the answer in #264 noted, it should only "defragment the memory during backward propagation".
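For reference, the setting I'm toggling lives under `zero_optimization` in the DeepSpeed config JSON. A minimal sketch of the relevant fragment (the other values are placeholders from my run, not a recommendation):

```json
{
  "train_batch_size": 64,
  "zero_optimization": {
    "stage": 1,
    "contiguous_gradients": true
  }
}
```

Flipping `contiguous_gradients` between `true` and `false` here, with `NCCL_DEBUG=INFO` set, is how I observed the change in the collective used during the backward pass.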