
Disable the unit test test_dist_mnist_hallreduce temporarily #28129

Merged


@sandyhouse commented Oct 20, 2020

PR types

Others

PR changes

Others

Describe

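Temporarily disable the unit test test_dist_mnist_hallreduce.
The reason is that the CI machines have only 2 GPUs, while this test creates 4 NCCL ranks (4 processes). As a result, more than one rank ends up on the same GPU. Newer versions of NCCL do not support this: "Using the same CUDA device multiple times as different ranks of the same NCCL communicator is not supported and may lead to hangs."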

PR that upgrades the CI Docker image: #27589
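This PR simply disables the test. The sketch below does not reproduce what the PR does. It shows the constraint behind the change: with 4 ranks and 2 GPUs, a round-robin rank-to-device mapping forces two ranks onto each device. It also shows a hypothetical skip guard a test could use instead. The constants, the helper names (`has_one_gpu_per_rank`, `device_for_rank`) and the class `TestDistMnistHAllreduceSketch` are all illustrative assumptions, not Paddle APIs. In a real test, the GPU count would come from the framework or from the CI environment.

```python
import unittest

# Hypothetical values mirroring this PR: the test launches 4 processes
# (one NCCL rank each), but the CI machine only has 2 GPUs.
NCCL_RANKS = 4
AVAILABLE_GPUS = 2


def has_one_gpu_per_rank(nranks: int, ngpus: int) -> bool:
    """Return True if every NCCL rank can get its own CUDA device.

    Newer NCCL versions do not support several ranks of the same
    communicator sharing one device, so nranks must not exceed ngpus.
    """
    return nranks <= ngpus


def device_for_rank(rank: int, ngpus: int) -> int:
    """Hypothetical round-robin rank -> device mapping.

    Once rank >= ngpus, ranks start colliding on the same device.
    """
    return rank % ngpus


@unittest.skipIf(
    not has_one_gpu_per_rank(NCCL_RANKS, AVAILABLE_GPUS),
    "needs at least one GPU per NCCL rank",
)
class TestDistMnistHAllreduceSketch(unittest.TestCase):
    def test_placeholder(self):
        # The real distributed-training check would run here.
        self.assertTrue(True)
```

With 2 GPUs, `[device_for_rank(r, 2) for r in range(4)]` gives `[0, 1, 0, 1]`, so ranks 0 and 2 share GPU 0. This is exactly the layout that newer NCCL rejects. A guard like this skips the test only on under-provisioned machines, rather than disabling it everywhere.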

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


@gongweibao left a comment


LGTM

@sandyhouse merged commit cd37244 into PaddlePaddle:develop on Oct 20, 2020
@sandyhouse deleted the disable_test_dist_mnist_hallreduce branch on October 20, 2020 at 14:45