Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Consider creating a pinned host memory resource that can leverage all of the allocation strategies #618

Closed
felipeblazing opened this issue Oct 29, 2020 · 20 comments · Fixed by #1392
Assignees

Comments

@felipeblazing
Copy link

Is your feature request related to a problem? Please describe.
I wish I could allocate a pool of pinned host memory for allocations that will often times be short lived. An example of when we would use this is when sending information between two nodes using the tcp protocol. We have to copy the memory to host in order to send it over the wire. Having a pinned memory buffer would make this more efficient.

Describe the solution you'd like
Allow the memory providers to either specify specific low level allocators or some way of being able to do this for just the specific case of pinned memory.

Describe alternatives you've considered
We currently have a fixed size pinned memory pool that we use for this situation.

@felipeblazing felipeblazing added ? - Needs Triage Need team to review and classify feature request New feature or request labels Oct 29, 2020
@harrism
Copy link
Member

harrism commented Oct 29, 2020

Since pinned host memory is accessible as device memory, I think we should create a pinned_memory_resource that is a device_memory_resource. Then it can be used as upstream of any other resource. The only sticking point I can think of is naming, since we already have a pinned_memory_resource that is a host_memory_resource.

@github-actions
Copy link

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions
Copy link

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@harrism
Copy link
Member

harrism commented Feb 16, 2021

This is still relevant, but now depends on NVIDIA/libcudacxx#105

@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@leofang
Copy link
Member

leofang commented Mar 19, 2021

Maybe relevant to cupy/cupy#4892?

@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions
Copy link

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@rongou rongou self-assigned this Jan 19, 2022
@rongou
Copy link
Contributor

rongou commented Jan 19, 2022

I'll take a crack at this since we also need it for Spark.

@harrism
Copy link
Member

harrism commented Jan 19, 2022

If you proceed ahead of cuda::memory_resource this will just create more for us to refactor.

@rongou
Copy link
Contributor

rongou commented Jan 19, 2022

Is there an ETA on cuda::memory_resource?

@harrism
Copy link
Member

harrism commented Jan 19, 2022

I thought we were close before Christmas. But we keep running into more design issues.

@jakirkham
Copy link
Member

Admittedly there may not be a clear answer to this, but are there ways for others interested in this to help pull cuda::memory_resource across the finish line?

@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@rongou
Copy link
Contributor

rongou commented Mar 10, 2022

Chatted with @jrhemstad offline, one option is to move the current pinned_memory_resource from host to device, then we can wrap it with the existing pools.

@abellina

@rongou rongou removed their assignment Mar 10, 2022
@jrhemstad
Copy link
Contributor

Chatted with @jrhemstad offline, one option is to move the current pinned_memory_resource from host to device, then we can wrap it with the existing pools.

No need to move it, just make a new one that inherits from device_memory_resource instead of host_memory_resource. I think we could even make the existing inherit from both since the functions have different signatures in host vs device.

@github-actions
Copy link

github-actions bot commented Apr 9, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions
Copy link

github-actions bot commented Jul 8, 2022

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@abellina
Copy link
Contributor

abellina commented Oct 23, 2023

Chatted with @jrhemstad offline, one option is to move the current pinned_memory_resource from host to device, then we can wrap it with the existing pools.

No need to move it, just make a new one that inherits from device_memory_resource instead of host_memory_resource. I think we could even make the existing inherit from both since the functions have different signatures in host vs device.

I have been playing with this for a test rapidsai/cudf#14314 in cuDF. I think what @jrhemstad said here around the APIs being different between host and device allocations is important. Actually, why would we want allocate(size, stream) and deallocate(size, stream) and why would we want to maintain the stream ordered behavior of pool_memory_resource for the host?

Just for reference, what I did was I hacked out the streams and had a single free_list for the whole pool, then just had an allocate(size) and it did the same pool logic: allocate from blocks in the list, split, merge, etc just with the single pool.

I do think we want to invest time on this given the results I am seeing in rapidsai/cudf#14314, so I'd be curious to know if anyone has cycles to take a look at this.

@harrism
Copy link
Member

harrism commented Oct 24, 2023

Actually, why would we want allocate(size, stream) and deallocate(size, stream) and why would we want to maintain the stream ordered behavior of pool_memory_resource for the host?

Because pinned host memory is accessed by device copies and kernels, which are stream ordered. Any non-stream-ordered allocation would have to synchronize, losing a lot of the benefits of stream-ordered allocation. I think it would be a mistake not to provide a stream-ordered pinned host memory pool.

so I'd be curious to know if anyone has cycles to take a look at this.

I am planning to work on this because I am also finding it is needed to improve spilling performance for Python. Current plan is to do it once we adopt cuda::memory_resource, because it turns the kind of memory allocated by the MR into a (template) parameter, rather than having separate host_mr and device_mr types. This enables us to use any machinery built for allocating one kind of memory (e.g. device memory) for any other memory kind. cuda::memory_resource also separates synchronous and stream-ordered allocation APIs.

This said, we should also look into timing and cost (including opportunity cost) and

  1. whether a temporary stop-gap like described above is worth committing to RMM
  2. whether we should just wait for cuda::memory_resource to be merged and then build it "the right way"
  3. whether we should provide a stop-gap in a non-release branch of RMM and then do (2).

@harrism harrism removed the ? - Needs Triage Need team to review and classify label Nov 28, 2023
@harrism harrism self-assigned this Nov 28, 2023
rapids-bot bot pushed a commit that referenced this issue Jan 18, 2024
…ool_memory_resource`. (#1392)

Depends on #1417

Adds a new `host_pinned_memory_resource` that implements the new `cuda::mr::memory_resource` and `cuda::mr::async_memory_resource` concepts which makes it usable as an upstream MR for `rmm::mr::device_memory_resource`. 

Also tests a pool made with this new MR as the upstream.

Note that the tests explicitly set the initial and maximum pool sizes as using the defaults does not currently work. See #1388 .

Closes #618

Authors:
  - Mark Harris (https://github.com/harrism)
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Michael Schellenberger Costa (https://github.com/miscco)
  - Alessandro Bellina (https://github.com/abellina)
  - Lawrence Mitchell (https://github.com/wence-)
  - Jake Hemstad (https://github.com/jrhemstad)
  - Bradley Dice (https://github.com/bdice)

URL: #1392
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

7 participants