Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

SPMV Invalid Configuration Argument #280

Closed
jdwapman opened this issue Apr 2, 2021 · 2 comments · Fixed by #163
Closed

SPMV Invalid Configuration Argument #280

jdwapman opened this issue Apr 2, 2021 · 2 comments · Fixed by #163
Assignees
Labels
type: bug: functional Does not work as intended.
Milestone

Comments

@jdwapman
Copy link

jdwapman commented Apr 2, 2021

Summary: The cub::DeviceSpmv::CsrMV function frequently fails with error: invalid configuration argument (9) due to incorrect kernel launch bounds.

Example Program:

#include <thrust/device_vector.h>
#include <iostream>
#include <cub/cub.cuh>  // or equivalently <cub/device/device_spmv.cuh>

#define CHECK_CUDA(func)                                                   \
  {                                                                        \
    cudaError_t status = (func);                                           \
    if (status != cudaSuccess) {                                           \
      printf("CUDA API failed at line %d with error: %s (%d)\n", __LINE__, \
             cudaGetErrorString(status), status);                          \
      return EXIT_FAILURE;                                                 \
    }                                                                      \
  }

int main(int argc, char** argv) {
  // Determine temporary device storage requirements
  void* d_temp_storage = NULL;
  size_t temp_storage_bytes = 0;

  int num_rows = 1000;
  int num_cols = 1000;
  int num_nonzeros = 20;

  thrust::device_vector<float> d_values(num_nonzeros, 0);
  thrust::sequence(d_values.begin(), d_values.end()); // 0,1,2,...

  // float* d_values;  // e.g., [1, 1, 1, 1, 1, 1, 1, 1,
  //        1, 1, 1, 1, 1, 1, 1, 1,
  //        1, 1, 1, 1, 1, 1, 1, 1]

  thrust::device_vector<int> d_column_indices(num_nonzeros, 0);
  thrust::sequence(d_column_indices.begin(), d_column_indices.end());
  // int*   d_column_indices; // e.g., [1, 3, 0, 2, 4, 1, 5, 0,
  //        4, 6, 1, 3, 5, 7, 2, 4,
  //        8, 3, 7, 4, 6, 8, 5, 7]

  thrust::device_vector<int> d_row_offsets(num_rows + 1, 0);
  thrust::sequence(d_row_offsets.begin(), d_row_offsets.end()); // 0,1,2,...
  // int*   d_row_offsets;    // e.g., [0, 2, 5, 7, 10, 14, 17, 19, 22, 24]

  thrust::device_vector<float> d_vector_x(num_rows, 0);
  thrust::fill(d_vector_x.begin(), d_vector_x.end(), 1);
  // float* d_vector_x;       // e.g., [1, 1, 1, 1, 1, 1, 1, 1, 1]

  thrust::device_vector<float> d_vector_y(num_rows, 0);
  // float* d_vector_y;       // e.g., [ ,  ,  ,  ,  ,  ,  ,  ,  ]

  CHECK_CUDA(cub::DeviceSpmv::CsrMV(d_temp_storage, temp_storage_bytes,
                                    d_values.data().get(), d_row_offsets.data().get(), d_column_indices.data().get(),
                                    d_vector_x.data().get(), d_vector_y.data().get(), num_rows, num_cols,
                                    num_nonzeros, 0, true));

                                    CHECK_CUDA(cudaDeviceSynchronize());

          
  // Allocate temporary storage
  CHECK_CUDA(cudaMalloc(&d_temp_storage, temp_storage_bytes));

  printf("Allocated %d bytes of temp storage\n", temp_storage_bytes);

  // Run SpMV
  CHECK_CUDA(cub::DeviceSpmv::CsrMV(d_temp_storage, temp_storage_bytes,
                                    d_values.data().get(), d_row_offsets.data().get(), d_column_indices.data().get(),
                                    d_vector_x.data().get(), d_vector_y.data().get(), num_rows, num_cols,
                                    num_nonzeros, 0, true));

  CHECK_CUDA(cudaDeviceSynchronize());

  CHECK_CUDA(cudaFree(d_temp_storage));
}

Output:

Allocated 1535 bytes of temp storage
Invoking spmv_kernel<<<{2,-1,1}, 128, 0, 0>>>(), 7 items per thread, 12 SM occupancy
CUDA API failed at line 62 with error: invalid configuration argument (9)

I believe this error occurs due to an incorrect gridDim.y calculated parameter. By varying num_rows, num_cols, and num_nonzeros in the example, it is possible to have this parameter be 1, 0, or -1.

This behavior occurs with both the CUDA 11.2 included CUB as well as the most recent CUB dependency within the Thrust library downloaded and compiled from Github. This error additionally occurs with many of the matrixes in the suitesparse collection, and is not an artifact of the toy matrix defined in the above program.

System Specs:
CPU: Intel Xeon E5-2698
GPU: V100
nvcc 11.2.152
g++ 9.3.0

@alliepiper alliepiper added the type: bug: functional Does not work as intended. label Apr 2, 2021
@alliepiper alliepiper self-assigned this Jul 30, 2021
@alliepiper alliepiper added this to the 1.14.0 milestone Jul 30, 2021
@alliepiper
Copy link
Collaborator

It looks like there are a few things going on here...

The gridDim.y issue appears to have been resolved by 63e2ad4. I consistently get {1, 1, 1} for the grid dimensions on the current main branch.

Once that was fixed, new errors were emitted from the issue described in #162. Applying #163 resolves that.

Looks like this can be closed once #163 is merged.

@alliepiper alliepiper linked a pull request Jul 30, 2021 that will close this issue
@alliepiper
Copy link
Collaborator

FYI, I've edited the repro script to fix a bug:

  thrust::device_vector<int> d_row_offsets(num_rows, 0);

  thrust::device_vector<int> d_row_offsets(num_rows + 1, 0);

(row_offsets must contain an extra element with the total number of non-zero elements)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type: bug: functional Does not work as intended.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants