[FEA] Make better use of pinned memory with Spark shuffle #4516
Labels
cudf_dependency
An issue or PR with this label depends on a new feature in cudf
improve
performance
A performance related task/issue
Is your feature request related to a problem? Please describe.
With the RAPIDS shuffle manager (UCX) turned off, we don't use much pinned memory, and profiling shows a lot of cudaMemcpy calls done with pageable memory.
Describe the solution you'd like
Make better use of pinned memory so that most, if not all, cudaMemcpy calls take advantage of pinned memory.
Describe alternatives you've considered
With UCX enabled, increasing the size of the pinned memory pool seems to greatly boost performance. We may want to consider allowing the RAPIDS shuffle manager without UCX, or creating an external shuffle manager that caches GPU memory buffers.
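To illustrate why the pool size might matter so much, here is a rough toy model, not the plugin's actual allocator. It assumes a fixed-size pinned pool and that any request that does not fit falls back to pageable memory. The class, the function, and all sizes below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PinnedPoolModel:
    """Toy model of a fixed-size pinned host memory pool (hypothetical, not plugin code)."""
    capacity: int             # total pinned bytes in the pool
    in_use: int = 0           # pinned bytes currently handed out
    pinned_allocs: int = 0    # requests served from the pool
    pageable_allocs: int = 0  # requests that fell back to pageable memory

    def allocate(self, size: int) -> str:
        """Return "pinned" if the request fits in the pool, else "pageable"."""
        if self.in_use + size <= self.capacity:
            self.in_use += size
            self.pinned_allocs += 1
            return "pinned"
        # Assumption: an exhausted pool means the caller uses pageable memory instead.
        self.pageable_allocs += 1
        return "pageable"

    def free(self, size: int, kind: str) -> None:
        """Return a buffer; only pinned buffers give capacity back to the pool."""
        if kind == "pinned":
            self.in_use -= size


def pinned_fraction(capacity: int, buffer_size: int, concurrent: int, rounds: int) -> float:
    """Fraction of staging buffers served from the pinned pool.

    Each round allocates `concurrent` buffers of `buffer_size` bytes,
    then frees all of them before the next round starts.
    """
    pool = PinnedPoolModel(capacity)
    for _ in range(rounds):
        held = [pool.allocate(buffer_size) for _ in range(concurrent)]
        for kind in held:
            pool.free(buffer_size, kind)
    total = pool.pinned_allocs + pool.pageable_allocs
    return pool.pinned_allocs / total


MiB = 1024 * 1024
small = pinned_fraction(capacity=64 * MiB, buffer_size=32 * MiB, concurrent=8, rounds=10)
large = pinned_fraction(capacity=256 * MiB, buffer_size=32 * MiB, concurrent=8, rounds=10)
print(f"64 MiB pool: {small:.0%} pinned; 256 MiB pool: {large:.0%} pinned")
```

In this model, eight concurrent 32 MiB buffers get only a quarter of their allocations from a 64 MiB pool, while a 256 MiB pool serves all of them. Real shuffle traffic is less regular than this, so the numbers only show the direction of the effect.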
Additional context
PR for improving pinned memory usage with UCX enabled: #4497