[BUG]: GH200 DeviceReduce performance: 14x (<1 GiB) and 2x (>1 GiB) lower than SOL #437
Open
2 of 5 tasks
Labels
bug
Is this a duplicate?
Type of Bug
Performance
Component
Thrust
Describe the bug
Improve the performance of thrust::reduce and transform_reduce by 14x for < 1 GiB input sizes, and by 2x for > 1 GiB sizes. This requires fixing the following bugs:
DeviceReduce.
How to Reproduce
Run DeviceReduce and compare it against SOL throughput.
Expected behavior
DeviceReduce should not be more than an order of magnitude slower than SOL.
Reproduction link
Internal link available.
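Since the reproduction link is internal, here is a minimal host-side sketch of how a measured reduction time could be compared against SOL. It is not taken from the internal benchmark. The helper names achieved_gb_per_s and slowdown_vs_sol are hypothetical, and the SOL bandwidth is a caller-supplied parameter (for example from a datasheet or a device-to-device copy measurement), not a value stated in this issue. The sketch assumes the reduction is memory-bound and reads each input byte once.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Thrust or CUB): achieved throughput in GB/s
// for a reduction that read `bytes_read` bytes in `elapsed_seconds`.
// Assumes input traffic dominates, since a reduction writes only one result.
inline double achieved_gb_per_s(std::size_t bytes_read, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return static_cast<double>(bytes_read) / elapsed_seconds / 1e9;
}

// Hypothetical helper: how many times slower than SOL the run was.
// `sol_gb_per_s` must be supplied by the caller; no hardware value is assumed.
inline double slowdown_vs_sol(double achieved, double sol_gb_per_s) {
    assert(achieved > 0.0 && sol_gb_per_s > 0.0);
    return sol_gb_per_s / achieved;
}
```

For example, with purely illustrative numbers (not measurements), slowdown_vs_sol(250.0, 3500.0) returns 14.0, which is the kind of gap the title describes for inputs under 1 GiB. The elapsed time itself would come from timing the DeviceReduce call on the device, for instance with CUDA events.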
Operating System
Linux.
nvidia-smi output
GH200
NVCC version
Any.