Reapply "defaults: make dmabuf opt-in" #739

Merged
aws-nslick merged 1 commit into aws:master from revert-dmabuf-again
Dec 5, 2024

aws-nslick
Contributor

This reverts commit 224593f.

Our shared development cluster seems to have issues with dmabuf when
running NCCL tests in a handful of niche situations, e.g., two nodes,
with MPI_Comm_split equal to the number of GPUs, at 16GB+. Other
environments do not seem to have issues with the same workload, but out
of an abundance of caution, and because no root cause has been found,
this is being reverted again.

Signed-off-by: Nicholas Sielicki <[email protected]>

@aws-nslick aws-nslick requested a review from a team as a code owner December 5, 2024 21:36
@aws-nslick aws-nslick merged commit 1a46a67 into aws:master Dec 5, 2024
22 of 23 checks passed
@aws-nslick aws-nslick deleted the revert-dmabuf-again branch December 5, 2024 22:09
3 participants