HIP assert - Memory access fault by GPU node-8 on address (nil). Reason: Page not present or supervisor privilege. #2676

Closed
zjin-lcf opened this issue May 13, 2022 · 7 comments

@zjin-lcf

Running the following HIP program produces a memory access fault message. Can you reproduce the error? Thanks.

The hipcc version is 4.5.2

/*
 * Copyright 1993-2015 NVIDIA Corporation.  All rights reserved.
 *
 * Please refer to the NVIDIA end user license agreement (EULA) associated
 * with this source code for terms and conditions that govern your use of
 * this software. Any use, reproduction, disclosure, or distribution of
 * this software and related documentation outside the terms of the EULA
 * is strictly prohibited.
 *
 */
#include <stdio.h>
#include <stdlib.h>  // exit, EXIT_SUCCESS, EXIT_FAILURE
#include <assert.h>
#include <hip/hip_runtime.h>

//! Tests assert function.
//! Threads whose id >= N will print an assertion-failed error message.
__global__ void testKernel(int N)
{
  int gtid = blockIdx.x*blockDim.x + threadIdx.x ;
  assert(gtid < N) ;
}

// Declaration, forward
bool runTest(int argc, char **argv);

int main(int argc, char **argv)
{
  bool testResult = runTest(argc, argv);

  printf("Test assert completed, returned %s\n",
      testResult ? "OK" : "ERROR!");
  exit(testResult ? EXIT_SUCCESS : EXIT_FAILURE);
}

bool runTest(int argc, char **argv)
{
  int Nblocks = 2;
  int Nthreads = 32;
  hipError_t error ;

  // Kernel configuration, where a one-dimensional
  // grid and one-dimensional blocks are configured.
  dim3 dimGrid(Nblocks);
  dim3 dimBlock(Nthreads);

  printf("Launch kernel to generate assertion failures\n");
  hipLaunchKernelGGL(testKernel, dimGrid, dimBlock, 0, 0, 60);

  //Synchronize (flushes assert output).
  printf("\n-- Begin assert output\n\n");
  error = hipDeviceSynchronize();
  printf("\n-- End assert output\n\n");

  //Check for errors and failed asserts in asynchronous kernel launch.
  if (error == hipErrorAssert)
  {
    printf("Device assert failed as expected, "
        "HIP error message is: %s\n\n",
        hipGetErrorString(error));
  }

  return (error == hipErrorAssert);
}
@weihanmines

weihanmines commented May 13, 2022

I think the issue is the assert statement in the kernel. Some threads in a warp throw an exception and some of them succeed. I don't think this is valid on a GPU. May I ask why you need the assert in the kernel? Why can't you use an if statement instead?

@zjin-lcf
Author

Thank you for explaining the cause. Developers and researchers may call 'assert' in a kernel for testing and debugging. Could the CUDA version serve as an example for adding HIP support for 'assert' in a kernel? Thanks.

@weihanmines

weihanmines commented May 13, 2022

Please refer to section B29 of the CUDA programming guide. The following is taken from that guide:
void assert(int expression);
stops the kernel execution if expression is equal to zero.
Any subsequent host-side synchronization calls made for the same device will return cudaErrorAssert. No more commands can be sent to this device until cudaDeviceReset() is called to reinitialize the device.

@zjin-lcf
Author

zjin-lcf commented May 13, 2022

For the HIP program, the expected behavior is not a memory access fault, is it?

@zjin-lcf
Author

Whether the answer is yes or no, I added examples that may be useful for evaluating assert in CUDA and HIP.

https://github.com/zjin-lcf/HeCBench/tree/master/assert-hip
https://github.com/zjin-lcf/HeCBench/tree/master/assert-cuda

@b-sumner
Contributor

assert is already supported. I believe the issue is that the compiler's detection of the mechanism assert uses was flawed. You can probably work around it for now by adding a call to printf somewhere, e.g. in a kernel that is never called. This will be fixed in a future release.

@zjin-lcf
Author

Thanks.
