Is your feature request related to a problem? Please describe.
The underlying allocator may have sufficient alignment, but aligned_allocate always overallocates to guarantee the alignment, even when that is not necessary (see Umpire/src/umpire/strategy/mixins/AlignedAllocation.inl, line 25 in 45159e8). This wastes (a bit of) memory, and may cause performance issues with some MPI libraries.
Describe the solution you'd like
Avoid overallocation if the underlying allocator provides sufficiently aligned allocations.
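As a rough illustration of what I have in mind, here is a minimal sketch, not Umpire's actual implementation. The names `AlignedBlock`, `is_aligned` and `aligned_allocate_lazy` are hypothetical, and the backing allocator is passed in as a pair of callables. The idea is to request the exact size first and only fall back to padding when the returned pointer turns out to be misaligned:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical result type: the pointer handed to the user, plus the pointer
// and size that were actually requested from the backing allocator.
struct AlignedBlock {
  void* user_ptr;
  void* base_ptr;
  std::size_t allocated_bytes;
};

// True if p is a multiple of alignment (alignment must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Try an exact-size allocation first. Pad by `alignment` bytes only if the
// backing allocator returned a pointer that is not sufficiently aligned.
template <typename Alloc, typename Dealloc>
AlignedBlock aligned_allocate_lazy(std::size_t bytes, std::size_t alignment,
                                   Alloc alloc, Dealloc dealloc) {
  void* p = alloc(bytes);
  if (p == nullptr) return {nullptr, nullptr, 0};
  if (is_aligned(p, alignment)) return {p, p, bytes};  // no overallocation

  // Fallback: release the misaligned block and overallocate.
  dealloc(p);
  const std::size_t padded = bytes + alignment;
  void* base = alloc(padded);
  if (base == nullptr) return {nullptr, nullptr, 0};
  const auto addr = reinterpret_cast<std::uintptr_t>(base);
  const auto mask = static_cast<std::uintptr_t>(alignment) - 1;
  const auto aligned = (addr + mask) & ~mask;  // round up to next multiple
  return {reinterpret_cast<void*>(aligned), base, padded};
}
```

The extra allocate/free round trip in the fallback path is a cost of this particular sketch. An alternative would be to let each backing allocator advertise a guaranteed minimum alignment so the check can happen before allocating at all.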
Describe alternatives you've considered
Allow controlling alignment of backing buffers separately from alignment of user-facing allocations. The latter should probably never be larger than the former.
Additional context
This is really a feature request that comes from investigating what may be a bug in Cray MPICH, but I wanted to report it here as well since I think Umpire could in some situations do a better job (or I'm simply unaware of the knobs that Umpire has for controlling this, so looking for input in any case).
In our application we use Umpire's QuickPool to pool allocations of GPU buffers. QuickPool will use aligned_allocate to allocate backing buffers from e.g. CUDA, but if I ask for a 1 GiB buffer, QuickPool will allocate 1 GiB plus the alignment (16 by default) to guarantee that the allocation is aligned. It turns out that with GPU-aware MPI, communicating a buffer whose size isn't page-aligned (I think this is the requirement, but I'm still looking into the details) causes performance to drop considerably. I'm separately reporting this issue to HPE.
I could set the alignment of the QuickPool to the page size to get an appropriately sized backing buffer, but if I understand correctly, all allocations on top of that would then also have page-sized alignment, which is excessive for small allocations and can end up wasting a lot of memory. From what I can tell, DynamicPoolList behaves the same as QuickPool (is there a reason to prefer one or the other, by the way?).
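To make the trade-off concrete, here is a small sketch of the arithmetic, assuming a 4096-byte page size and power-of-two alignments. `PoolAlignment`, `round_up` and the member names are hypothetical and only illustrate the "two alignment parameters" idea; they are not existing Umpire options:

```cpp
#include <cassert>
#include <cstddef>

// Round bytes up to the next multiple of alignment (a power of two).
constexpr std::size_t round_up(std::size_t bytes, std::size_t alignment) {
  return (bytes + alignment - 1) & ~(alignment - 1);
}

// Hypothetical pair of alignments: one for user-facing allocations, one for
// the backing buffers requested from the underlying allocator.
struct PoolAlignment {
  std::size_t allocation_alignment;
  std::size_t buffer_alignment;

  // The user-facing alignment should never exceed the buffer alignment.
  constexpr bool valid() const {
    return allocation_alignment <= buffer_alignment;
  }
  constexpr std::size_t user_size(std::size_t bytes) const {
    return round_up(bytes, allocation_alignment);
  }
  constexpr std::size_t backing_size(std::size_t bytes) const {
    return round_up(bytes, buffer_alignment);
  }
};

constexpr std::size_t GiB = std::size_t{1} << 30;
constexpr std::size_t page = 4096;  // assumed page size

// 1 GiB plus 16 bytes of padding is no longer a multiple of the page size.
static_assert((GiB + 16) % page != 0, "padded size is not page-sized");

// Separate alignments: page-sized backing buffer, cheap small allocations.
static_assert(PoolAlignment{16, page}.valid(), "16 <= 4096");
static_assert(PoolAlignment{16, page}.backing_size(GiB) == GiB, "no padding");
static_assert(PoolAlignment{16, page}.user_size(24) == 32, "small stays small");

// Single page-sized alignment: a 24-byte request occupies a whole page.
static_assert(PoolAlignment{page, page}.user_size(24) == page, "wasteful");
```

With one shared alignment, every small request is rounded up to a full page. With two, only the backing buffers pay for page granularity.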
Is there a way already to control the alignment of the backing buffers and "real" allocations on top of it separately? Is there another pool that we could use to get the behaviour we want?
Just out of curiosity, since I couldn't find it: where is the code for ensuring that a QuickPool allocation starts at the correct alignment? I see the size is adjusted here: Umpire/src/umpire/strategy/QuickPool.cpp, line 46 in 45159e8. Edit: I realize this probably happens by construction. If the backing buffers have sufficient alignment and all the allocations have aligned sizes, they'll be guaranteed to start aligned as well.
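The "by construction" argument can be checked with a toy model. This is a simplified bump-style carving of one backing buffer, not QuickPool's actual chunk management, and `carve_offsets` is a hypothetical helper. If the buffer's base address is a multiple of the alignment and every size is rounded up to a multiple of the alignment, every offset (and therefore every start address) is a multiple of the alignment too:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round bytes up to the next multiple of alignment (a power of two).
inline std::size_t round_up(std::size_t bytes, std::size_t alignment) {
  return (bytes + alignment - 1) & ~(alignment - 1);
}

// Carve consecutive allocations out of one backing buffer and return the
// offset of each allocation from the (aligned) start of that buffer.
inline std::vector<std::size_t> carve_offsets(
    const std::vector<std::size_t>& requests, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  std::vector<std::size_t> offsets;
  std::size_t cursor = 0;  // the buffer start itself is assumed aligned
  for (std::size_t bytes : requests) {
    offsets.push_back(cursor);
    cursor += round_up(bytes, alignment);  // sizes are multiples of alignment
  }
  return offsets;
}
```

Since a sum of multiples of the alignment is again a multiple of the alignment, no per-allocation pointer adjustment is needed. The same reasoning applies when freed chunks are split or merged, as long as chunk sizes stay multiples of the alignment.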
Thanks for your help!
Ping. Just checking if this is something interesting to you? I may be inclined to attempt implementing one of the options above if it sounds good to you. We're currently still stuck with the workaround where we have to overallocate all allocations by 2 MiB (the large page size on Grace CPUs).
@msimberg we would definitely be interested in a fix to avoid over-allocating the underlying buffers, and if it would be useful, adjusting the pool to take two alignment parameters (one for the allocations, and one for the buffers). Thanks!