Remove memory access flags from cuda_async_memory_resource #1754

abellina · 2024-12-09T21:10:17Z

Description

Closes #1753
This is a follow-up to #1743.

I would like rapidsai/cudf#17553 to merge first so that I don't break the build.

I've learned that I was using cudaMemPoolSetAccess incorrectly. This API should only be called from a peer device, not from the device that created the pool. This is why calling cudaMemPoolSetAccess with none throws an error, as documented in #1753. I have tested that I can still export the fabric handles and import them using UCX on a peer device with the default access that the pool owner device gets (read+write is the default). Note that this default read+write access cannot be revoked from the owner, as it wouldn't make sense to have memory that nobody can access. Peers, however, can call cudaMemPoolSetAccess to gain read+write access to a peer's pool memory, or to stop accessing it (none).

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Signed-off-by: Alessandro Bellina <[email protected]>

copy-pr-bot · 2024-12-09T21:10:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

abellina · 2024-12-09T21:12:05Z

I've labeled this as breaking, but the API has existed for less than a day and is only used from cuDF JNI. rapidsai/cudf#17553 should remove the only user that I am aware of.

abellina · 2024-12-09T21:12:27Z

/ok to test

bdice

This seems right. Thanks for the extensive descriptions in #1753; they make it much easier to approve. 👍

abellina · 2024-12-09T23:04:14Z

/merge

abellina · 2024-12-09T23:04:50Z

Merging this. Thanks @bdice for the review. If anyone else has questions or concerns, please comment here or on the issues. I am sorry I am not waiting for more +1s; I just want to remove this from the API as soon as possible to avoid confusion.

Remove memory access flags from cuda_async_memory_resource.

78b4ecd

Signed-off-by: Alessandro Bellina <[email protected]>

abellina requested a review from a team as a code owner December 9, 2024 21:10

abellina requested review from harrism and miscco December 9, 2024 21:10

github-actions bot added the cpp (Pertains to C++ code) label Dec 9, 2024

abellina added the bug (Something isn't working) and breaking (Breaking change) labels Dec 9, 2024

bdice approved these changes Dec 9, 2024

rapids-bot bot merged commit 8d41610 into rapidsai:branch-25.02 Dec 9, 2024
62 of 63 checks passed

abellina deleted the simplify_allow_fabric_handles branch December 9, 2024 23:04
