
[distributed] NCCL Backend doesn't support torch.bool data type #24137

Closed
apsdehal opened this issue Aug 10, 2019 · 9 comments
Assignees
Labels
enhancement Not as big of a feature, but technically not a bug. Should be easy to fix · module: boolean tensor · oncall: distributed Add this issue/PR to distributed oncall triage queue · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@apsdehal

apsdehal commented Aug 10, 2019

🐛 Bug

In version 1.2.0, the NCCL backend doesn't support the torch.bool data type. Broadcasting a tensor of this type throws the error "RuntimeError: Unsupported data type for NCCL process group".

To Reproduce

Steps to reproduce the behavior:

Create a file test.py with the following contents:

import torch
import argparse
from torch import distributed as dist


parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int)

args = parser.parse_args()

torch.distributed.init_process_group("nccl")

local_rank = args.local_rank

device = torch.device(local_rank)

if local_rank == 0:
    element = False
else:
    element = True


def broadcast_scalar(scalar, src=0, device="cpu"):
    scalar_tensor = torch.tensor(scalar).to(device)
    with torch.no_grad():
        # dist.broadcast modifies scalar_tensor in place; it does not return a tensor.
        dist.broadcast(scalar_tensor, src)
    return scalar_tensor.item()


broadcast_scalar(element, src=0, device=device)

Run it with the following command:
python -u -m torch.distributed.launch --nproc_per_node 2 test.py

This has been tested on 2 GPUs.

Expected behavior

The NCCL backend should support the bool data type.
Current workaround: cast the tensor to an integer type with .long() before broadcasting.
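
A minimal sketch of the workaround (the helper name broadcast_bool is illustrative, not from the issue; it assumes the process group has already been initialized as in the repro above):

import torch
from torch import distributed as dist


def broadcast_bool(flag, src=0, device="cuda"):
    # Upcast: NCCL has no native bool type, so ship the value as int64.
    tensor = torch.tensor(flag, dtype=torch.long, device=device)
    dist.broadcast(tensor, src)  # in-place broadcast from rank `src`
    # Downcast back to a Python bool on every rank.
    return bool(tensor.item())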

Environment

Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 410.79
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.16.3
[pip] torch==1.2.0
[pip] torchtext==0.3.1
[pip] torchvision==0.2.2
[conda] torch                     1.2.0                     <pip>
[conda] torchtext                 0.3.1                     <pip>
[conda] torchvision               0.2.2                     <pip>
@pytorchbot pytorchbot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Aug 10, 2019
@mrshenli mrshenli added enhancement Not as big of a feature, but technically not a bug. Should be easy to fix module: boolean tensor triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Aug 12, 2019
@williamFalcon
Contributor

And you get a warning saying .uint8 isn't supported so we should switch to .bool... but then NCCL doesn't support it.

@soumith
Member

soumith commented Aug 28, 2019

cc: @mrshenli @pietern we should fix this with upcast + transfer + downcast

@rohan-varma rohan-varma self-assigned this Sep 19, 2019
@rohan-varma
Member

@soumith, I'm working on this bug. Could you explain what you mean by "upcast + transfer + downcast"? It appears that the error is coming from here: https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupNCCL.cpp#L45-L60, and seems to happen because ncclDataType_t doesn't have a bool type

@soumith
Member

soumith commented Sep 20, 2019

@rohan-varma I mean that we should cast the buffer from bool to uint8, then all-reduce, then cast it back to bool on the other side.

@pietern
Contributor

pietern commented Sep 20, 2019

Things to keep in mind:

  • Can only use uint8_t with up to 255 processes in the process group. We rely on every process contributing an integer equal to 1 if the equivalent boolean entry is set. With 256 processes we would overflow an 8-bit unsigned integer and get the wrong result. The change should either 1) assert that the process group size is small enough, or 2) implement a separate code path that uses a 16-bit unsigned integer for larger process groups.
  • Semantics of the different reduction ops (each of which could use SUM as the underlying reduction; see the sketch after this list):
    • ReduceOp.SUM -- boolean OR, so the boolean output is output != 0
    • ReduceOp.PRODUCT -- boolean AND, so the boolean output is output == pg->size
    • ReduceOp.MIN -- boolean AND (see above)
    • ReduceOp.MAX -- boolean OR (see above)
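
A minimal sketch of these semantics (the helper name all_reduce_bool is illustrative, not an existing API; it assumes an initialized NCCL process group and CUDA input tensors):

import torch
from torch import distributed as dist


def all_reduce_bool(mask, op=dist.ReduceOp.SUM):
    # Guard against the uint8 overflow described above.
    assert dist.get_world_size() <= 255, "uint8 SUM would overflow with 256+ ranks"
    # Every rank contributes 1 wherever its boolean entry is set; the
    # underlying collective is always a SUM over uint8.
    buf = mask.to(torch.uint8)
    dist.all_reduce(buf, op=dist.ReduceOp.SUM)
    if op in (dist.ReduceOp.SUM, dist.ReduceOp.MAX):
        return buf != 0  # boolean OR across ranks
    if op in (dist.ReduceOp.PRODUCT, dist.ReduceOp.MIN):
        return buf == dist.get_world_size()  # boolean AND across ranks
    raise ValueError("unsupported reduction op for bool tensors")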

@ekmb

ekmb commented Dec 3, 2019

@rohan-varma Is it fixed?

@mrcslws

mrcslws commented Jul 10, 2020

I would be really happy to see this fixed. We have lots of modules that store masks in buffers. Because of this issue, these modules are forced to use float16 or float32 mask buffers rather than bool buffers.
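
A sketch of that use case (MaskedLinear is a hypothetical module; DistributedDataParallel over NCCL broadcasts registered buffers, which is where the bool limitation bites and forces the float fallback described above):

import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # What we'd like: a bool mask buffer. Before the fix, DDP over NCCL
        # cannot broadcast it, so users fall back to float16/float32 buffers.
        self.register_buffer("mask", torch.ones(out_features, in_features, dtype=torch.bool))

    def forward(self, x):
        # The bool mask is promoted to float during the multiplication.
        return nn.functional.linear(x, self.linear.weight * self.mask, self.linear.bias)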

rohan-varma added a commit that referenced this issue Jul 12, 2020
Closes #24137. Since bool is not supported as a native ncclDataType_t, we add some upcasting + downcasting logic to support it.

Differential Revision: [D22496604](https://our.internmc.facebook.com/intern/diff/D22496604/)

rohan-varma added a commit that referenced this issue Jul 13, 2020
Closes #24137. 
This PR adds support for the `torch.bool` tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since `bool` is not supported as a native `ncclDataType_t`, we add the following logic:
1) Detect if input tensors are of bool type. If so, cast inputs & outputs to int tensors. 
2) Run the specified reduction.
3) If we had to cast, cast the outputs back to boolean tensors. If this collective does not operate in-place, then re-cast the inputs back to bool so that they are not modified as a result of the op.

The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise or
product, min = bitwise and

Note that this PR doesn't add support for BAND/BOR/BXOR. That is because these reduction ops currently are not supported by NCCL backend, see #41362

Tests are added to ensure that the reductions work as expected. 
Differential Revision: [D22496604](https://our.internmc.facebook.com/intern/diff/D22496604/)

rohan-varma added a commit that referenced this issue Jul 21, 2020
Closes #24137. 
This PR adds support for the `torch.bool` tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since `bool` is not supported as a native `ncclDataType_t`, we add the following logic:
1) Map `at::kBool` to `ncclUint8`
2) During reduction (allreduce, for example), if the operation is SUM, we instead override it to a MAX to avoid overflow issues. The rest of the operations work with no changes. In the boolean case, changing sum to max makes no correctness difference since they both function as a bitwise OR.

The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise or
product, min = bitwise and

Note that this PR doesn't add support for BAND/BOR/BXOR. That is because these reduction ops currently are not supported by NCCL backend, see #41362

Tests are added to ensure that the reductions work as expected. 
Differential Revision: [D22496604](https://our.internmc.facebook.com/intern/diff/D22496604/)

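Once this lands (targeted for PyTorch 1.7), bool collectives should work directly. A rough sketch of the expected behavior on two ranks (illustrative only, not a test from the PR; assumes one GPU per rank and a launch via torch.distributed.launch):

import torch
from torch import distributed as dist

dist.init_process_group("nccl")
rank = dist.get_rank()
flag = torch.tensor([rank == 0], dtype=torch.bool, device=f"cuda:{rank}")

t = flag.clone()
dist.all_reduce(t, op=dist.ReduceOp.SUM)      # acts as logical OR -> True on both ranks
u = flag.clone()
dist.all_reduce(u, op=dist.ReduceOp.PRODUCT)  # acts as logical AND -> False on both ranks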
rohan-varma added a commit that referenced this issue Jul 21, 2020
Pull Request resolved: #41318

@rohan-varma rohan-varma reopened this Jul 23, 2020
@sajadn

sajadn commented Aug 28, 2020

On PyTorch 1.5.1 I still have this problem. Is it going to be fixed?

@rohan-varma
Member

Hi @sajadn, this was landed ~1 month ago so it should be part of the next release, PT 1.7. Until then, you can try out the nightly build (see instructions at https://pytorch.org/) where this is fixed.


10 participants