[FEA] Python interface for arena_memory_resource #830
Comments
I might not need the feature right now. As suggested by @jrhemstad, the issue in my code is caused by pageable host memory. I switched to pinned memory, but its allocation cost is now the bottleneck.
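One way to amortize that pinned-allocation cost is to serve staging buffers from a pool of already-pinned host memory rather than calling cudaHostAlloc per transfer. Below is a minimal sketch using CuPy's pinned memory pool; this is an illustration outside of rmm's Python API, and the buffer size and helper name are assumptions.

```python
# Sketch: amortize cudaHostAlloc cost by serving staging buffers from a pinned
# host memory pool. Uses CuPy as an illustration; not part of rmm's Python API.
import numpy as np
import cupy

# Route pinned-host allocations through a caching pool so repeated requests
# reuse already-pinned pages instead of calling cudaHostAlloc each time.
pinned_pool = cupy.cuda.PinnedMemoryPool()
cupy.cuda.set_pinned_memory_allocator(pinned_pool.malloc)

def pinned_empty(n, dtype=np.float32):
    """Return a NumPy array backed by pooled pinned host memory (hypothetical helper)."""
    mem = cupy.cuda.alloc_pinned_memory(n * np.dtype(dtype).itemsize)
    return np.frombuffer(mem, dtype=dtype, count=n)

# The first call pins new pages; later same-sized requests are served from the
# pool, so the per-prefetch allocation cost drops after warm-up.
staging = pinned_empty(1 << 20)
```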
Two other points:
Close #830.
- Add the arena allocator to the public Python interface.
- Small changes to the logger initialization to avoid exposing spdlog in the shared objects.

Authors:
- Jiaming Yuan (https://github.com/trivialfis)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #1711
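For reference, a minimal sketch of how the arena allocator exposed in #1711 might be used from Python. The ArenaMemoryResource name mirrors the C++ arena_memory_resource; the exact constructor arguments are an assumption here, so only the upstream resource is passed.

```python
# Sketch: route device allocations through the arena allocator from the
# Python interface added in #1711 (constructor arguments are assumptions).
import rmm

# The arena resource sub-allocates from an upstream resource; here, plain
# cudaMalloc/cudaFree via CudaMemoryResource.
upstream = rmm.mr.CudaMemoryResource()

# Per-thread arenas reduce lock contention between threads that allocate and
# free concurrently on different streams.
arena = rmm.mr.ArenaMemoryResource(upstream)

# Make the arena the default resource for the current device, so DeviceBuffer
# and libraries built on rmm allocate through it.
rmm.mr.set_current_device_resource(arena)

buf = rmm.DeviceBuffer(size=1 << 20)  # allocated via the arena
del buf                               # returned to the arena, not cudaFree'd
```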
Is your feature request related to a problem? Please describe.
Hi, I'm trying to use rmm in a multi-threaded application where each thread pre-fetches some data using a CUDA stream taken from a pool. The data is fetched in a child thread and used in the main thread on the same stream, and it is released after use in the main thread. During profiling, I found that with the pool memory resource, allocations and deallocations from different threads are not performed in parallel. I later found the arena_memory_resource in C++ and want to try it out.
Describe the solution you'd like
So the feature request is a two-part question. First, does arena_memory_resource help with such use cases? If so, is there any plan to expose it in the Python interface?
Describe alternatives you've considered
I tried the pool memory resource and the CUDA async memory resource; the performance is similar. From Nsight Systems, the pool memory resource seems to manage memory under a lock, which prevents parallel malloc and free. The cudaEvent usage in rmm also seems to involve locks, but I'm not sure what effect that has on performance.
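To make the contention pattern concrete, here is a rough sketch of the kind of multi-threaded allocate/free workload described above, using rmm's Python API. The thread count, iteration count, and buffer sizes are illustrative assumptions, and the buffers are allocated and freed in the same worker thread for simplicity.

```python
# Sketch: many threads allocating and freeing concurrently, to compare how
# different device memory resources behave under contention (illustrative).
import threading
import rmm

def make_pool():
    # Pool suballocator on top of cudaMalloc; concurrent allocations from
    # different threads serialize on the pool's internal lock.
    return rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource(),
                                     initial_pool_size=1 << 30)

def make_async():
    # cudaMallocAsync-backed resource (stream-ordered allocation).
    return rmm.mr.CudaAsyncMemoryResource()

def worker(n_iters=1000, nbytes=1 << 20):
    # Each thread repeatedly allocates and frees, mimicking a prefetch thread
    # that hands buffers to the main thread and releases them after use.
    for _ in range(n_iters):
        buf = rmm.DeviceBuffer(size=nbytes)
        del buf

rmm.mr.set_current_device_resource(make_pool())  # or make_async()

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```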
Additional context
Feel free to ping me if you need the profiling results from Nsight Systems.