Skip to content

Commit

Permalink
[ROCM] Navi21 Enablement 7: Sparse kernels
Browse files Browse the repository at this point in the history
This PR is a follow up to the following prs.
pytorch#69942
pytorch#72682
pytorch#72809
pytorch#73543
pytorch#73545
pytorch#73546

We are adding support to Navi21 GPUs which have a warpsize of 32. We cannot rely on a constant so we have to dynamically look up the warpsize when launching the kernel on the host side. Inside device functions this is not needed and the compiler can correctly detect the correct warpsize to replace the C10_WARP_SIZE constant.
Pull Request resolved: pytorch#73548
Approved by: https://github.com/ngimel
  • Loading branch information
micmelesse authored and pytorchmergebot committed Mar 25, 2022
1 parent c0491c9 commit cd929f4
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 3 deletions.
3 changes: 2 additions & 1 deletion aten/src/ATen/native/sparse/cuda/SparseCUDAApplyUtils.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

#include <ATen/cuda/detail/TensorInfo.cuh>
#include <ATen/cuda/CUDAApplyUtils.cuh>
#include <ATen/native/cuda/thread_constants.h>
#include <c10/macros/Macros.h>

namespace at { namespace native {
Expand Down Expand Up @@ -304,7 +305,7 @@ __global__ void indexSparseIntersectionKernel(
// }

template <typename Dtype, typename Acctype>
C10_LAUNCH_BOUNDS_1(C10_WARP_SIZE*4)
C10_LAUNCH_BOUNDS_1(num_threads())
__global__ void coalesceValuesKernel(
int64_t *segment_offsets, int64_t *value_indices,
Dtype *values, Dtype *newValues,
Expand Down
5 changes: 3 additions & 2 deletions aten/src/ATen/native/sparse/cuda/SparseCUDATensor.cu
Original file line number Diff line number Diff line change
Expand Up @@ -142,8 +142,9 @@ SparseTensor _coalesce_sparse_cuda(const SparseTensor& self) {
const int SZ = 4;
values = values.contiguous();
int64_t stride = c10::multiply_integers(values.sizes().slice(1));
dim3 grid(ceil_div(newNnz, (int64_t) SZ), ceil_div(stride, (int64_t) C10_WARP_SIZE*SZ));
dim3 block(C10_WARP_SIZE, SZ);
int warp_size = at::cuda::warp_size();
dim3 grid(ceil_div(newNnz, (int64_t) SZ), ceil_div(stride, (int64_t) warp_size*SZ));
dim3 block(warp_size, SZ);
AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(
at::ScalarType::Half, at::ScalarType::BFloat16, at::ScalarType::Bool, values.scalar_type(), "coalesce_sparse_cuda", [&] {
using cuda_accscalar_t = acc_type<scalar_t, /* is_cuda */ true>;
Expand Down

0 comments on commit cd929f4

Please sign in to comment.