Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue]: assert doesn't appear to be implemented on CUDA backend #3719

Open
neworderofjamie opened this issue Jan 13, 2025 · 8 comments
Open

Comments

@neworderofjamie
Copy link

neworderofjamie commented Jan 13, 2025

Problem Description

using assert in HIP doesn't work with the CUDA backend

Operating System

Ubuntu 20.04.6 LTS (Focal Fossa)

CPU

Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz

GPU

NVIDIA RTX A5000

ROCm Version

ROCm 6.2.4

ROCm Component

HIPCC

Steps to Reproduce

compile and run the following with CUDA:

#include <stdexcept>
#include <cassert>

__global__ void test(uint32_t rowLength)
 {
    printf("CUDA: %u < 4\n", rowLength);
    assert(rowLength < 4);
}
int main()
{
    const dim3 threads(32, 1);
    const dim3 grid(2, 1);
    
    test<<<grid, threads>>>(2);
    
    const cudaError_t error = cudaDeviceSynchronize();
    if (error != cudaSuccess) {\
        throw std::runtime_error("CUDA error " + std::to_string(error) + ": " + cudaGetErrorString(error));\
    }
}

using

nvcc -x cu -arch sm_86 -I/opt/rocm -std=c++11 cuda.cc -o cuda
./cuda

Observe that the kernel runs correctly and that it prints "CUDA 2 < 4" 32 times.

compile and run the following with HIPCC using the CUDA backend:

#include <hip/hip_runtime.h>

__global__ void test(uint32_t rowLength)
 {
    printf("HIP: %u < 4\n", rowLength);
    assert(rowLength < 4);
}

int main()
{
    const dim3 threads(32, 1);
    const dim3 grid(1, 1);
    
    test<<<grid, threads>>>(2);
    const hipError_t error = hipDeviceSynchronize();
    if (error != hipSuccess) {\
        throw std::runtime_error("HIP error " + std::to_string(error) + ": " + hipGetErrorString(error));\
    }
}

using:

hipcc -arch sm_86 -I/opt/rocm -std=c++11 hip.cc -o hip
./hip

and observe that while "HIP 2 < 4" is printed 32 times, the kernel exits with a unspecified launch failure.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

If you look at the generated PTX, the reason for this seems clear. When compiling with nvcc you see:

setp.lt.u32 	%p1, %r1, 4;
@%p1 bra 	$L__BB0_2;

mov.u64 	%rd5, $str$1;
cvta.global.u64 	%rd6, %rd5;
mov.u64 	%rd7, $str$2;
cvta.global.u64 	%rd8, %rd7;
mov.u64 	%rd9, __unnamed_1;
cvta.global.u64 	%rd10, %rd9;
{ // callseq 1, 0
.reg .b32 temp_param_reg;
.param .b64 param0;
st.param.b64 	[param0+0], %rd6;
	.param .b64 param1;
st.param.b64 	[param1+0], %rd8;
.param .b32 param2;
st.param.b32 	[param2+0], 7;
.param .b64 param3;
st.param.b64 	[param3+0], %rd10;
.param .b64 param4;
st.param.b64 	[param4+0], 1;
call.uni 
__assertfail, 
(
param0, 
param1, 
param2, 
param3, 
param4
);
} // callseq 1
$L__BB0_2:

which appears to be a branch on the condition %r1 < 4 and some sort of function call to assert. However, when compiling with hipcc, there is no conditional code, just a trap:

// begin inline asm
trap;
// end inline asm
@b-sumner
Copy link
Contributor

@neworderofjamie the HIP implementation is operating as designed where an assert by the device communicates with the host at the point of the problem and then traps much as assert works in host code.

@neworderofjamie
Copy link
Author

neworderofjamie commented Jan 13, 2025

I cannot believe that that is true as:

  1. This assert should never trigger (2 < 4)
  2. The generated PTX seems entirely incorrect
  3. Even if the assert did trigger, the return code is not the correct one for an assert

@neworderofjamie
Copy link
Author

I just found ROCm/hipother#1 - I think this is the cause of the bug.

@zichguan-amd
Copy link

Hi @neworderofjamie, you are right, if you don't include any assertion library, asserts will be expanded according to the PR that you linked. Using -E to check:

$ hipcc -std=c++11 -E hip.cc | grep -n -C4 "rowLength < 4"
56747-    
56748-# 8 "hip.cc" 3 4
56749-   { if (!
56750-# 8 "hip.cc"
56751:   rowLength < 4
56752-# 8 "hip.cc" 3 4
56753-   ) { { asm("trap;"); }; } }
56754-# 8 "hip.cc"
56755-                        ;

The condition (!rowLength < 4) get's evaluated to 1 and triggers the assert, it should indeed be if (!(rowLength < 4)).

If you add #include <cassert>:

$ hipcc -std=c++11 -E hip.cc | grep -n -C4 "rowLength < 4"
56755-    
56756-# 8 "hip.cc" 3 4
56757-   (static_cast  (
56758-# 8 "hip.cc"
56759:   rowLength < 4
56760-# 8 "hip.cc" 3 4
56761-   ) ? void (0) : __assert_fail (
56762-# 8 "hip.cc"
56763:   "rowLength < 4"
56764-# 8 "hip.cc" 3 4
56765-   , __builtin_FILE (), __builtin_LINE (), __extension__ __PRETTY_FUNCTION__))
56766-# 8 "hip.cc"
56767-                        ;

then it works the same way as your cuda code.

I'll work with the HIP team to get the PR merged.

@b-sumner
Copy link
Contributor

@neworderofjamie you're right. I misunderstood the question. And thanks for the patch...not sure why this was not observed earlier.

I do want to reiterate that behavior of asserts (when correctly triggered) from HIP kernels running on ROCm will be different than Cuda kernels running on Nvidia and that difference is by design.

@neworderofjamie
Copy link
Author

Thanks, @zichguan-amd, #include <cassert> results in much better behaviour. It might be good to clarify the documentation as it suggests that assert is a function (so I wasn't on the lookout for macro issues) and says:

Another is the device version of assert, which is implemented in hip/hip_runtime.h. Users need to include assert.h to use assert. For assert to work in both device and host functions, users need to include “hip/hip_runtime.h”.

Which, I read to mean that you only needed to include hip/hip_runtime.h for device asserts.

@b-sumner, can you clarify what you mean about the behaviour of asserts? Unlike in the CUDA documentation, the behaviour is not described so this would be good to know.

@neworderofjamie
Copy link
Author

neworderofjamie commented Jan 15, 2025

What is the intended purpose of the macro-based assert? I am now trying to apply the fix to our actual code and it turns out we are including <cassert>, just before the HIP headers which undef assert and replace it with the broken macro. It would seem much better to get rid of the macro altogether and, like CUDA, asserts only work when <cassert> is included

@zichguan-amd
Copy link

If you trace down the include headers from hip/hip_runtime.h, assert is defined as a macro in assert.h:

#  define assert(expr)							\
     (static_cast <bool> (expr)						\
      ? void (0)							\
      : __assert_fail (#expr, __ASSERT_FILE, __ASSERT_LINE,             \
                       __ASSERT_FUNCTION))

Device-side implementation implements _assert_fail here https://github.com/ROCm/clr/blob/9d8d35ae4041ef0f37430b1265e0ad60698d5b51/hipamd/include/hip/amd_detail/hip_assert.h#L44, which is function-like macro just like cassert (https://en.cppreference.com/w/cpp/error/assert). So, CUDA also goes through this macro API, HIP just includes cassert for you.

I guess the question is why in nvidia_hip_runtime.h it's designed differently to purposely not use the CUDA's device-side _assert_fail. Maybe @b-sumner have more insights.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants