[FEA] Python interface for arena_memory_resource #830

Closed
trivialfis opened this issue Jul 21, 2021 · 5 comments · Fixed by #1711
Labels
? - Needs Triage Need team to review and classify feature request New feature or request inactive-30d inactive-90d

Comments

@trivialfis
Member

trivialfis commented Jul 21, 2021

Is your feature request related to a problem? Please describe.

Hi, I'm trying to use RMM in a multi-threaded application where each thread pre-fetches some data using a CUDA stream taken from a pool. The data is fetched in a child thread, used in the main thread with the same CUDA stream, and released in the main thread after use. During profiling, I found that memory allocation and deallocation are not performed in parallel with the pool memory resource. Later I found arena_memory_resource in the C++ API and would like to try it out.

Describe the solution you'd like
So the feature request is a two-part question. First, does arena_memory_resource help with such use cases? If so, is there any plan to expose it in the Python interface?

Describe alternatives you've considered
I tried the pool memory resource and the CUDA async memory resource; their performance is similar. From the Nsight Systems profile, the pool memory resource seems to manage memory with locks, which prevents malloc and free from running in parallel. The cudaEvent used in RMM also seems to create locks, but I'm not sure what its effect on performance is.
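
To make the locking concern concrete, here is a toy, CPU-only sketch. It is not RMM code, and the class names `GlobalPool` and `PerThreadArenas` are hypothetical. It only counts how often a shared lock is taken when every allocation goes through one lock versus when each thread keeps a private free list and touches the shared lock only to refill. For simplicity each thread frees its own blocks, unlike the pattern described above where the main thread frees blocks allocated by a child thread.

```python
import threading


class GlobalPool:
    """Toy pool: every allocate/deallocate takes the same shared lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.lock_acquisitions = 0
        self._next_id = 0
        self._free = []

    def allocate(self):
        with self._lock:
            self.lock_acquisitions += 1
            if self._free:
                return self._free.pop()
            self._next_id += 1
            return self._next_id

    def deallocate(self, block):
        with self._lock:
            self.lock_acquisitions += 1
            self._free.append(block)


class PerThreadArenas:
    """Toy arena scheme: a private free list per thread, shared lock only on refill."""

    def __init__(self, refill=8):
        self._lock = threading.Lock()
        self.lock_acquisitions = 0
        self._next_id = 0
        self._refill = refill
        self._local = threading.local()

    def _free_list(self):
        if not hasattr(self._local, "free"):
            self._local.free = []
        return self._local.free

    def allocate(self):
        free = self._free_list()
        if not free:
            # Reserve a batch of block ids from the shared range.
            with self._lock:
                self.lock_acquisitions += 1
                start = self._next_id
                self._next_id += self._refill
            free.extend(range(start + 1, start + self._refill + 1))
        return free.pop()

    def deallocate(self, block):
        # Returned blocks stay with the freeing thread; no shared lock needed.
        self._free_list().append(block)


def run(pool, n_threads=4, iters=1000):
    """Run allocate/deallocate pairs on several threads; return shared-lock count."""

    def worker():
        for _ in range(iters):
            pool.deallocate(pool.allocate())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.lock_acquisitions
```

With four threads and 1000 iterations each, the single-lock pool takes the shared lock on every call, while the per-thread scheme takes it once per thread. Whether the real arena_memory_resource shows a comparable benefit for this workload would have to be measured.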

Additional context
Feel free to ping me if you need the profile results from Nsight Systems.

@trivialfis trivialfis added ? - Needs Triage Need team to review and classify feature request New feature or request labels Jul 21, 2021
@rongou
Contributor

rongou commented Jul 21, 2021

We use arena_memory_resource for Spark from Java/Scala, so we haven't needed a Python wrapper. We probably won't have time to work on this in the near future. @trivialfis feel free to contribute. :)

@trivialfis
Member Author

I might not need the feature right now. As @jrhemstad suggested, the issue in my code is caused by pageable host memory. I switched to pinned memory, but its allocation cost is now the bottleneck.

@harrism
Member

harrism commented Jul 22, 2021

Two other points:

  1. I believe arena_memory_resource uses separate read and write locks, which may enable more concurrency between host threads. We could try something similar in pool_memory_resource.

  2. I just discovered today that cycling through buffers across multiple streams can result in over-synchronization in stream_ordered_memory_resource. I think this can be improved by using an LRU cache or something similar to choose which stream to "steal" blocks from. This may also benefit multi-threaded use cases where each thread has its own stream (per-thread default stream).
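
The LRU idea in the second point could look roughly like the toy sketch below. It is only an illustration under simplifying assumptions, not the stream_ordered_memory_resource implementation: the class name `StreamFreeLists` is hypothetical, blocks are represented by their size alone and matched exactly, and no CUDA synchronization is performed. The intuition being modelled is that the stream whose free list was touched longest ago is the one most likely to have finished its work, so stealing from it should need the least waiting.

```python
from collections import OrderedDict


class StreamFreeLists:
    """Toy per-stream free lists with least-recently-used steal order."""

    def __init__(self):
        # stream id -> list of free block sizes.
        # Iteration order runs from least to most recently freed-to stream.
        self._free = OrderedDict()

    def deallocate(self, stream, size):
        self._free.setdefault(stream, []).append(size)
        self._free.move_to_end(stream)  # mark this stream as most recently used

    def allocate(self, stream, size):
        """Return (size, source_stream).

        source_stream == stream: reused without cross-stream synchronization.
        source_stream is another id: a steal, which would need a sync in real code.
        source_stream is None: nothing cached, would fall through to upstream.
        """
        own = self._free.get(stream, [])
        if size in own:
            own.remove(size)
            return size, stream
        # Walk the other streams starting from the least recently used one.
        for other, blocks in self._free.items():
            if other != stream and size in blocks:
                blocks.remove(size)
                return size, other
        return size, None
```

For example, after freeing a 256-byte block on stream 1, then on stream 2, then again on stream 1, an allocation on stream 3 steals from stream 2 first, because stream 1 was used more recently.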

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@rapids-bot rapids-bot bot closed this as completed in #1711 Nov 1, 2024
rapids-bot bot pushed a commit that referenced this issue Nov 1, 2024
Close #830.

- Add the arena allocator to the public Python interface.
- Small changes to the logger initialization to avoid exposing spdlog in the shared objects.

Authors:
  - Jiaming Yuan (https://github.com/trivialfis)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #1711
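
For readers who want to try the result of this change, a minimal usage sketch follows. It assumes the Python class is exposed as `rmm.mr.ArenaMemoryResource` and accepts an upstream resource as its first argument, mirroring the C++ arena_memory_resource; check the documentation of your installed RMM version before relying on this. The helper name `use_arena_resource` is hypothetical. The import is deferred so the snippet can be loaded on a machine without RMM or a GPU.

```python
def use_arena_resource(nbytes=1 << 20):
    """Allocate one device buffer through an arena resource and return its size.

    Requires RMM and a CUDA-capable GPU at call time.
    """
    import rmm  # deferred import: only needed when the function is called

    upstream = rmm.mr.CudaMemoryResource()
    # Assumption: class name and positional upstream argument as described above.
    arena = rmm.mr.ArenaMemoryResource(upstream)
    rmm.mr.set_current_device_resource(arena)

    buf = rmm.DeviceBuffer(size=nbytes)  # served by the arena resource
    return buf.size
```

Calling `use_arena_resource()` from each worker thread would be one way to check whether allocations overlap better than with the pool resource in a profiler.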
Projects
Status: Done

3 participants