Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove memory access flags from cuda_async_memory_resource #1754

Merged

Conversation

abellina
Copy link
Contributor

@abellina abellina commented Dec 9, 2024

Description

Closes #1753
It is a follow up from #1743

I would like for rapidsai/cudf#17553 to merge first, that way I don't break the build.

I've learned that I was using cudaMemPoolSetAccess incorrectly. This API should only be used from a peer device, not from the device that created the pool. This is the reason why calling cudaMemPoolSetAccess with none throws an error as documented here #1753. I have tested that I can still export the fabric handles and import them using UCX in a peer device with the default access that pool owner device gets (read+write is the default). Note that this read+write default access cannot be revoked from the owner, as it wouldn't make sense to have memory that nobody has access to, but peers can call cudaMemPoolSetAccess to gain read+write access or to stop accessing (none) a peer's pool memory.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@abellina abellina requested a review from a team as a code owner December 9, 2024 21:10
@abellina abellina requested review from harrism and miscco December 9, 2024 21:10
Copy link

copy-pr-bot bot commented Dec 9, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp Pertains to C++ code label Dec 9, 2024
@abellina abellina added bug Something isn't working breaking Breaking change labels Dec 9, 2024
@abellina
Copy link
Contributor Author

abellina commented Dec 9, 2024

I've set this to breaking, but has lived for < 1 day and only used from cuDF JNI. This PR: rapidsai/cudf#17553 should remove the only user that I am aware of.

@abellina
Copy link
Contributor Author

abellina commented Dec 9, 2024

/ok to test

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems right. Thanks for the extensive descriptions in #1753, it makes it much easier to approve. 👍

@abellina
Copy link
Contributor Author

abellina commented Dec 9, 2024

/merge

@rapids-bot rapids-bot bot merged commit 8d41610 into rapidsai:branch-25.02 Dec 9, 2024
62 of 63 checks passed
@abellina
Copy link
Contributor Author

abellina commented Dec 9, 2024

Merging this, thanks @bdice for the review. If anyone else has questions or concerns, please comment here or the issues. I am sorry I am not waiting for more +1s, I just want to remove this from the api asap to remove confusion.

@abellina abellina deleted the simplify_allow_fabric_handles branch December 9, 2024 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change bug Something isn't working cpp Pertains to C++ code
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] cuda async pool cannot be initialized with access_flags::none
2 participants