Temporarily disable the unit test test_dist_mnist_hallreduce #28129
Merged
PR types
Others
PR changes
Others
Describe
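Temporarily disable the unit test test_dist_mnist_hallreduce.
Reason: the CI machines have only 2 GPUs, but this unit test creates 4 NCCL ranks (4 processes), so multiple ranks end up on a single GPU. Newer NCCL versions do not support this: "Using the same CUDA device multiple times as different ranks of the same NCCL communicator is not supported and may lead to hangs."
PR that upgrades the CI Docker image: #27589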
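The rank-versus-GPU arithmetic behind this failure can be sketched as below. This is only an illustration, not how this PR disables the test: the helper names `max_ranks_per_device` and `needs_skip`, the constants, and the round-robin placement are all assumptions made for the example.

```python
import unittest


def max_ranks_per_device(num_ranks: int, num_devices: int) -> int:
    """Largest number of ranks sharing one device, assuming round-robin placement."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    # Ceiling division: 4 ranks on 2 devices puts 2 ranks on each device.
    return -(-num_ranks // num_devices)


def needs_skip(num_ranks: int, num_devices: int) -> bool:
    """True when at least one device would have to host more than one rank."""
    if num_devices <= 0:
        return True
    return max_ranks_per_device(num_ranks, num_devices) > 1


# Hypothetical values mirroring the description: 4 ranks, 2 GPUs on the CI machine.
REQUIRED_RANKS = 4
AVAILABLE_GPUS = 2


@unittest.skipIf(
    needs_skip(REQUIRED_RANKS, AVAILABLE_GPUS),
    "fewer GPUs than NCCL ranks; sharing a device between ranks may hang",
)
class HypotheticalHallreduceTest(unittest.TestCase):
    def test_placeholder(self):
        # Placeholder body; the real test launches the distributed job.
        self.assertTrue(True)
```

A conditional skip like this would let the test run again on machines with at least 4 GPUs, whereas the PR as described disables it unconditionally for now.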