[Issue]: assert doesn't appear to be implemented on CUDA backend #3719

neworderofjamie · 2025-01-13T14:44:31Z

Problem Description

using assert in HIP doesn't work with the CUDA backend

Operating System

Ubuntu 20.04.6 LTS (Focal Fossa)

CPU

Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz

GPU

NVIDIA RTX A5000

ROCm Version

ROCm 6.2.4

ROCm Component

HIPCC

Steps to Reproduce

compile and run the following with CUDA:

#include <stdexcept>
#include <cassert>

__global__ void test(uint32_t rowLength)
 {
    printf("CUDA: %u < 4\n", rowLength);
    assert(rowLength < 4);
}
int main()
{
    const dim3 threads(32, 1);
    const dim3 grid(2, 1);
    
    test<<<grid, threads>>>(2);
    
    const cudaError_t error = cudaDeviceSynchronize();
    if (error != cudaSuccess) {\
        throw std::runtime_error("CUDA error " + std::to_string(error) + ": " + cudaGetErrorString(error));\
    }
}

using

nvcc -x cu -arch sm_86 -I/opt/rocm -std=c++11 cuda.cc -o cuda
./cuda

Observe that the kernel runs correctly and that it prints "CUDA 2 < 4" 32 times.

compile and run the following with HIPCC using the CUDA backend:

#include <hip/hip_runtime.h>

__global__ void test(uint32_t rowLength)
 {
    printf("HIP: %u < 4\n", rowLength);
    assert(rowLength < 4);
}

int main()
{
    const dim3 threads(32, 1);
    const dim3 grid(1, 1);
    
    test<<<grid, threads>>>(2);
    const hipError_t error = hipDeviceSynchronize();
    if (error != hipSuccess) {\
        throw std::runtime_error("HIP error " + std::to_string(error) + ": " + hipGetErrorString(error));\
    }
}

using:

hipcc -arch sm_86 -I/opt/rocm -std=c++11 hip.cc -o hip
./hip

and observe that while "HIP 2 < 4" is printed 32 times, the kernel exits with a unspecified launch failure.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

If you look at the generated PTX, the reason for this seems clear. When compiling with nvcc you see:

setp.lt.u32 	%p1, %r1, 4;
@%p1 bra 	$L__BB0_2;

mov.u64 	%rd5, $str$1;
cvta.global.u64 	%rd6, %rd5;
mov.u64 	%rd7, $str$2;
cvta.global.u64 	%rd8, %rd7;
mov.u64 	%rd9, __unnamed_1;
cvta.global.u64 	%rd10, %rd9;
{ // callseq 1, 0
.reg .b32 temp_param_reg;
.param .b64 param0;
st.param.b64 	[param0+0], %rd6;
	.param .b64 param1;
st.param.b64 	[param1+0], %rd8;
.param .b32 param2;
st.param.b32 	[param2+0], 7;
.param .b64 param3;
st.param.b64 	[param3+0], %rd10;
.param .b64 param4;
st.param.b64 	[param4+0], 1;
call.uni 
__assertfail, 
(
param0, 
param1, 
param2, 
param3, 
param4
);
} // callseq 1
$L__BB0_2:

which appears to be a branch on the condition %r1 < 4 and some sort of function call to assert. However, when compiling with hipcc, there is no conditional code, just a trap:

// begin inline asm
trap;
// end inline asm

The text was updated successfully, but these errors were encountered:

b-sumner · 2025-01-13T15:08:57Z

@neworderofjamie the HIP implementation is operating as designed where an assert by the device communicates with the host at the point of the problem and then traps much as assert works in host code.

neworderofjamie · 2025-01-13T15:14:19Z

I cannot believe that that is true as:

This assert should never trigger (2 < 4)
The generated PTX seems entirely incorrect
Even if the assert did trigger, the return code is not the correct one for an assert

neworderofjamie · 2025-01-14T15:29:34Z

I just found ROCm/hipother#1 - I think this is the cause of the bug.

zichguan-amd · 2025-01-14T18:29:49Z

Hi @neworderofjamie, you are right, if you don't include any assertion library, asserts will be expanded according to the PR that you linked. Using -E to check:

$ hipcc -std=c++11 -E hip.cc | grep -n -C4 "rowLength < 4"
56747-    
56748-# 8 "hip.cc" 3 4
56749-   { if (!
56750-# 8 "hip.cc"
56751:   rowLength < 4
56752-# 8 "hip.cc" 3 4
56753-   ) { { asm("trap;"); }; } }
56754-# 8 "hip.cc"
56755-                        ;

The condition (!rowLength < 4) get's evaluated to 1 and triggers the assert, it should indeed be if (!(rowLength < 4)).

If you add #include <cassert>:

$ hipcc -std=c++11 -E hip.cc | grep -n -C4 "rowLength < 4"
56755-    
56756-# 8 "hip.cc" 3 4
56757-   (static_cast  (
56758-# 8 "hip.cc"
56759:   rowLength < 4
56760-# 8 "hip.cc" 3 4
56761-   ) ? void (0) : __assert_fail (
56762-# 8 "hip.cc"
56763:   "rowLength < 4"
56764-# 8 "hip.cc" 3 4
56765-   , __builtin_FILE (), __builtin_LINE (), __extension__ __PRETTY_FUNCTION__))
56766-# 8 "hip.cc"
56767-                        ;

then it works the same way as your cuda code.

I'll work with the HIP team to get the PR merged.

b-sumner · 2025-01-14T18:52:27Z

@neworderofjamie you're right. I misunderstood the question. And thanks for the patch...not sure why this was not observed earlier.

I do want to reiterate that behavior of asserts (when correctly triggered) from HIP kernels running on ROCm will be different than Cuda kernels running on Nvidia and that difference is by design.

neworderofjamie · 2025-01-15T14:21:23Z

Thanks, @zichguan-amd, #include <cassert> results in much better behaviour. It might be good to clarify the documentation as it suggests that assert is a function (so I wasn't on the lookout for macro issues) and says:

Another is the device version of assert, which is implemented in hip/hip_runtime.h. Users need to include assert.h to use assert. For assert to work in both device and host functions, users need to include “hip/hip_runtime.h”.

Which, I read to mean that you only needed to include hip/hip_runtime.h for device asserts.

@b-sumner, can you clarify what you mean about the behaviour of asserts? Unlike in the CUDA documentation, the behaviour is not described so this would be good to know.

neworderofjamie · 2025-01-15T14:42:22Z

What is the intended purpose of the macro-based assert? I am now trying to apply the fix to our actual code and it turns out we are including <cassert>, just before the HIP headers which undef assert and replace it with the broken macro. It would seem much better to get rid of the macro altogether and, like CUDA, asserts only work when <cassert> is included

zichguan-amd · 2025-01-15T15:44:48Z

If you trace down the include headers from hip/hip_runtime.h, assert is defined as a macro in assert.h:

#  define assert(expr)							\
     (static_cast <bool> (expr)						\
      ? void (0)							\
      : __assert_fail (#expr, __ASSERT_FILE, __ASSERT_LINE,             \
                       __ASSERT_FUNCTION))

Device-side implementation implements _assert_fail here https://github.com/ROCm/clr/blob/9d8d35ae4041ef0f37430b1265e0ad60698d5b51/hipamd/include/hip/amd_detail/hip_assert.h#L44, which is function-like macro just like cassert (https://en.cppreference.com/w/cpp/error/assert). So, CUDA also goes through this macro API, HIP just includes cassert for you.

I guess the question is why in nvidia_hip_runtime.h it's designed differently to purposely not use the CUDA's device-side _assert_fail. Maybe @b-sumner have more insights.

ppanchad-amd added the Under Investigation label Jan 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Issue]: assert doesn't appear to be implemented on CUDA backend #3719

[Issue]: assert doesn't appear to be implemented on CUDA backend #3719

neworderofjamie commented Jan 13, 2025 •

edited

Loading

b-sumner commented Jan 13, 2025

neworderofjamie commented Jan 13, 2025 •

edited

Loading

neworderofjamie commented Jan 14, 2025

zichguan-amd commented Jan 14, 2025

b-sumner commented Jan 14, 2025

neworderofjamie commented Jan 15, 2025

neworderofjamie commented Jan 15, 2025 •

edited

Loading

zichguan-amd commented Jan 15, 2025

[Issue]: assert doesn't appear to be implemented on CUDA backend #3719

[Issue]: assert doesn't appear to be implemented on CUDA backend #3719

Comments

neworderofjamie commented Jan 13, 2025 • edited Loading

Problem Description

Operating System

CPU

GPU

ROCm Version

ROCm Component

Steps to Reproduce

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

Additional Information

b-sumner commented Jan 13, 2025

neworderofjamie commented Jan 13, 2025 • edited Loading

neworderofjamie commented Jan 14, 2025

zichguan-amd commented Jan 14, 2025

b-sumner commented Jan 14, 2025

neworderofjamie commented Jan 15, 2025

neworderofjamie commented Jan 15, 2025 • edited Loading

zichguan-amd commented Jan 15, 2025

neworderofjamie commented Jan 13, 2025 •

edited

Loading

neworderofjamie commented Jan 13, 2025 •

edited

Loading

neworderofjamie commented Jan 15, 2025 •

edited

Loading