HIP assert - Memory access fault by GPU node-8 on address (nil). Reason: Page not present or supervisor privilege. #2676

zjin-lcf · 2022-05-13T02:36:09Z

Running the following HIP program produces a memory access fault message. Can you reproduce the error? Thanks.

The hipcc version is 4.5.2.

/*
 * Copyright 1993-2015 NVIDIA Corporation.  All rights reserved.
 *
 * Please refer to the NVIDIA end user license agreement (EULA) associated
 * with this source code for terms and conditions that govern your use of
 * this software. Any use, reproduction, disclosure, or distribution of
 * this software and related documentation outside the terms of the EULA
 * is strictly prohibited.
 *
 */
#include <stdio.h>
#include <stdlib.h>  // exit, EXIT_SUCCESS, EXIT_FAILURE
#include <assert.h>
#include <hip/hip_runtime.h>

//! Tests assert function.
//! Threads whose id >= N will print an assertion-failed error message.
__global__ void testKernel(int N)
{
  int gtid = blockIdx.x*blockDim.x + threadIdx.x ;
  assert(gtid < N) ;
}

// Declaration, forward
bool runTest(int argc, char **argv);

int main(int argc, char **argv)
{
  bool testResult = runTest(argc, argv);

  printf("Test assert completed, returned %s\n",
      testResult ? "OK" : "ERROR!");
  exit(testResult ? EXIT_SUCCESS : EXIT_FAILURE);
}

bool runTest(int argc, char **argv)
{
  int Nblocks = 2;
  int Nthreads = 32;
  hipError_t error ;

  // Kernel configuration, where a one-dimensional
  // grid and one-dimensional blocks are configured.
  dim3 dimGrid(Nblocks);
  dim3 dimBlock(Nthreads);

  printf("Launch kernel to generate assertion failures\n");
  hipLaunchKernelGGL(testKernel, dimGrid, dimBlock, 0, 0, 60);

  //Synchronize (flushes assert output).
  printf("\n-- Begin assert output\n\n");
  error = hipDeviceSynchronize();
  printf("\n-- End assert output\n\n");

  //Check for errors and failed asserts in asynchronous kernel launch.
  if (error == hipErrorAssert)
  {
    printf("Device assert failed as expected, "
        "HIP error message is: %s\n\n",
        hipGetErrorString(error));
  }

  return (error == hipErrorAssert);
}


weihanmines · 2022-05-13T03:26:17Z

I think the issue is the assert statement in the kernel. Some threads in a warp throw an exception and some of them succeed. I don't think this is valid on a GPU. May I ask why you need the assert in the kernel? Why can't you use an if statement instead?

zjin-lcf · 2022-05-13T03:38:19Z

Thank you for explaining the cause. Developers and researchers may call 'assert' in a kernel for testing and debugging. Could the CUDA version serve as an example for adding HIP support for 'assert' in a kernel? Thanks.

weihanmines · 2022-05-13T03:43:12Z

> Thank you for explaining the cause. Developers and researchers may call 'assert' in a kernel for testing and debugging. Could the CUDA version serve as an example for adding HIP support for 'assert' in a kernel? Thanks.

Please refer to section B29 of the CUDA programming guide. The following is taken from that guide:

void assert(int expression);

stops the kernel execution if expression is equal to zero.
Any subsequent host-side synchronization calls made for the same device will return cudaErrorAssert. No more commands can be sent to this device until cudaDeviceReset() is called to reinitialize the device.

zjin-lcf · 2022-05-13T11:35:34Z

For the HIP program, the expected behavior is not a memory access fault, is it?

zjin-lcf · 2022-05-13T14:05:30Z

Whether the answer is yes or no, I added examples that may help you evaluate assert in CUDA and HIP.

https://github.com/zjin-lcf/HeCBench/tree/master/assert-hip
https://github.com/zjin-lcf/HeCBench/tree/master/assert-cuda

b-sumner · 2022-05-13T14:41:40Z

assert is already supported. I believe the issue is that the compiler's detection of the mechanism assert uses was flawed. You can probably work around it for now by adding a call to printf somewhere, e.g. in a kernel that is never called. This will be fixed in a future release.

zjin-lcf · 2022-05-13T15:02:11Z

Thanks.

zjin-lcf closed this as completed May 13, 2022

zjin-lcf mentioned this issue May 13, 2022: Support assert on HIP backend intel/llvm#5990 (Closed)

JackAKirk mentioned this issue Sep 28, 2023: [SYCL][HIP] Unresolved Assert/.* tests failures intel/llvm#7634 (Open)
