-
Notifications
You must be signed in to change notification settings - Fork 932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] pack
/unpack
functions to merge/split (multiple) device_buffer
(s)
#9726
Comments
As to |
This functionality is kind of outside the scope of RMM. The direction we'd like to go with RMM is really to just be as simple as possible by providing a set of resources, the tools to get/set the default resource, and containers like This kind of pack or concatenate functionality is really more the wheelhouse of a consumer of RMM like cuDF. In cuDF, this is just a |
That imposes a lot of challenges in order to construct your data in a way to allow using concatenate and then being able to unpack it cleanly. Maybe it's a new API that could live in cuDF, but really we're looking for some API that takes This is along similar lines as #3793 but is more generalized than |
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. |
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d. |
Seems like this issue should be moved to libcudf. |
I think we had this in RMM as opposed to libcudf because we wanted a place more general purpose than libcudf. I.E. Dask/Distributed would possibly be interested in using this for packing/unpacking buffers in communication, but cuDF is way too bulky of a dependency for them. |
Sure, but RMM isn't a catch-all for stuff we don't want to put into libcudf. It muddies the single responsibility principle to start putting random kernels into an allocator and container library (which currently has no kernels). |
Yup agreed. This felt like it was in the gray area of somewhat related to memory management so the issue was raised here, but happy to defer it to somewhere else, but cuDF is too large of a dependency unfortunately. |
I think |
This issue has been labeled |
This issue has been labeled |
NVIDIA/cub#359 would be the right way to do this now. |
FYI PR ( NVIDIA/cub#359 ) landed. Looks like this will be part of CUB 2.1.0 |
Is your feature request related to a problem? Please describe.
It would be useful to have a
pack
function to merge multipledevice_buffer
s into a singledevice_buffer
. This is helpful in situations where having one largedevice_buffer
to read from is more performant. However it ultimately consists of many smaller data segments that would need to be merged together. Example use cases include sending data with UCX and spilling data from device to host.Similarly it would be useful to have an
unpack
function to split adevice_buffer
into multipledevice_buffer
s. This is helpful in situations where having one largedevice_buffer
to write into is more performant. However it ultimately consists of many smaller data segments that may need to be freed at different times. Example use cases include receiving data with UCX and unspilling data from host to device.Describe the solution you'd like
For
pack
it would be nice if it simply takes severaldevice_buffer
s invector
form and return a single one. Additionally it would be nice ifpack
could recognize whendevice_buffer
s are contiguous in memory and avoid a copy. Though admittedly this last part is tricky (maybe less so ifunpack
is used regularly?). If we allowpack
to change the order (to benefit from contiguous memory for example), we may want additional information about where the data segments live in the largerdevice_buffer
.For
unpack
it would be nice if it takes a singledevice_buffer
andsize_t
s invector
form to split and return avector
of multipledevice_buffer
s. Additionally it would be nice ifunpack
did not perform any copies. Hopefully that is straightforward, but there may be things I'm not understanding.Describe alternatives you've considered
One might consider using variadics in C++ for the arguments. While nice at the C++ level, this seems tricky to use from the Cython and Python levels. Hence the suggestion to just use
vector
.pack
itself could be implemented by a user simply allocating a larger buffer and copying over. Would be nice to avoid the extra allocation when possible though (which may require knowledge that RMM has about the allocations).Additional context
Having
unpack
in particular would be helpful for aggregated receives. A natural extension of this would be to havepack
for aggregated sends. All-in-all this should allow transmitting a larger amount of data at once with UCX and thus benefiting from this use case it is more honed for. PR ( dask/distributed#3453 ) provides a WIP implementation of aggregated receives for context.Also having
pack
would be useful when spilling severaldevice_buffer
s from device to host as it would allow us to pack them into onedevice_buffer
before transferring ( rapidsai/dask-cuda#250 ). Havingunpack
would help us break up the allocation whenever the object is unspilled.This need has also come up in downstream contexts ( #3793 ). Maybe they would benefit from an upstream solution as well?
The text was updated successfully, but these errors were encountered: