horovod bug: conflict of group argument in all_gather function #7853

Closed
marsggbo opened this issue Jun 7, 2021 · 3 comments
Labels
bug Something isn't working help wanted Open to be worked on
marsggbo commented Jun 7, 2021

🐛 Bug

...
from pytorch_lightning.utilities.distributed import group, rank_zero_only, ReduceOp
...

class HorovodPlugin(ParallelPlugin):
    ....

    def all_gather(
        self,
        result: Union[torch.Tensor],
        group: Optional[Any] = group.WORLD,
        sync_grads: bool = False
    ) -> torch.Tensor:
        if group is not None and group != group.WORLD:
            raise ValueError(
                "Horovod does not support allgather using a subcommunicator at this time. "
                "Unset `group`."
            )

        if len(result.shape) == 0:
            # Convert scalars to single dimension tensors
            result = result.reshape(1)

        # sync and gather all
        self.join()
        gathered = hvd.allgather(result)
        gathered_result = list(gathered.split(1, dim=0))
        return gathered_result

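As we can see, the input argument group conflicts with the imported name group: inside the function body, group refers to the argument, so group.WORLD is looked up on the passed value rather than on the import.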
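To make the name conflict concrete, here is a minimal sketch that reproduces it without Horovod or PyTorch Lightning. The `group` class below is a hypothetical stand-in for the imported name, and `all_gather_shadowed` is an illustrative function, not the plugin's real method. How the real `group.WORLD` object behaves on attribute lookup may differ; the point is only that the parameter hides the import.

```python
class group:
    """Hypothetical stand-in for the imported `group` name (assumption)."""
    WORLD = object()


def all_gather_shadowed(result, group=group.WORLD):
    # The default above is evaluated at definition time, so it still sees the
    # stand-in class. Inside the body, however, `group` is the parameter, so
    # `group.WORLD` is an attribute lookup on whatever value was passed in.
    if group is not None and group != group.WORLD:
        raise ValueError("subcommunicator not supported")
    return result


try:
    all_gather_shadowed(1)  # uses the default value
    shadowing_error = None
except AttributeError as exc:
    # The plain object used as WORLD here has no `WORLD` attribute.
    shadowing_error = exc

# Passing None short-circuits the check, so the conflict goes unnoticed.
none_result = all_gather_shadowed(1, group=None)
```

With this stand-in, the default call fails with an `AttributeError` instead of performing the intended comparison, while `group=None` silently skips the check.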

marsggbo added the bug (Something isn't working) and help wanted (Open to be worked on) labels Jun 7, 2021
marsggbo changed the title from "horovod bug: overload" to "horovod bug: conflict of group argument in all_gather function" Jun 7, 2021
tchaton commented Jun 7, 2021

Dear @marsggbo,

Thanks for reporting this issue. Would you mind opening a PR?

Best,
T.C

marsggbo commented Jun 7, 2021

PR #7840 solves this problem. I simply replaced the imported group with GROUP, so it no longer clashes with the group argument.
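The exact diff of the PR is not reproduced in this thread, so the following is only a sketch of the renaming idea under stated assumptions. `GROUP` is a hypothetical stand-in class defined locally, and `all_gather_fixed` is an illustrative function rather than the plugin's actual method. Once the outer name differs from the parameter, the comparison inside the body reaches the intended constant.

```python
class GROUP:
    """Hypothetical stand-in for the import after renaming it (assumption)."""
    WORLD = object()


def all_gather_fixed(result, group=GROUP.WORLD):
    # `group` is the parameter and `GROUP` is the outer name, so the two
    # no longer collide and the check compares against the real constant.
    if group is not None and group != GROUP.WORLD:
        raise ValueError(
            "Horovod does not support allgather using a subcommunicator at this time. "
            "Unset `group`."
        )
    return result


default_result = all_gather_fixed(1)  # default group passes the check

try:
    all_gather_fixed(1, group=object())  # any other group is rejected
    rejected = False
except ValueError:
    rejected = True
```

Keeping the parameter name `group` unchanged leaves the public signature intact, which is presumably why the import was renamed rather than the argument.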

@awaelchli

@marsggbo thanks for your help!
