-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMDGPU: weird miscompilations when using __ballot HIP function #62477
Comments
@llvm/issue-subscribers-backend-amdgpu |
Another perplexing bug (maybe the same?):
|
This is the class of problem convergence tokens are intended to solve https://discourse.llvm.org/t/rfc-introduce-convergence-control-intrinsics/69613/3 |
OK, glad to see that it is a know issue and that it is being worked on. |
Your only options right now are to apply more instances of the "ockl ballot hoisting hack" that you noticed, or add in some wave barrier / subgroup barrier calls depending on the situation. By wrapping the ballot with another function, you're losing the hack in earlier control flow optimizations. |
What's a wave barrier in HIP? There doesn't seem to be the How would I apply the "ockl ballot hoisting hack" for code I write? There doesn't seem to be anything in the HIP implementation of |
By wave barrier, I mean __builtin_amdgcn_wave_barrier which is the analog of __syncwarp. Not sure if HIP currently exposes a proper subgroup API The hack is just using inline asm with a tied output constraint that copies its input, like: This is buried in OCKL, not in the hip headers |
Thanks a lot, the hoisting hack works! One last question, is it safe that at https://github.com/ROCm-Developer-Tools/clr/blob/develop/hipamd/include/hip/amd_detail/amd_device_functions.h the |
It's not really a principled enough hack to justify spreading it everywhere. As you've discovered, it's at most pushing the symptoms out one layer. We need convergence tokens to solve this in general |
Hi @arsenm , |
The base infrastructure has been committed, but clang isn't producing convergence tokens yet cc @sameerds |
Well, the short answer is "don't hold your breathe just yet" for convergence tokens. There is quite some work left in both frontend and backend to get this right. |
I think this bug no longer depends on convergence control tokens, after the following change: https://reviews.llvm.org/D144756 There have been similar recent changes like this in LLVM, that conservatively prevent optimizations around convergent operations. The control tokens will eventually improve these conservative changes to allow some optimizations in the right places. If the bug is still reproducible in the main branch, we should take closer look and just fix it without depending on tokens. |
I just tried with a very recent version of Clang from LLVM main (using HIP_CLANG_PATH in hipcc). The test passed. |
@ssahasra , happy to read that :-) . I will take your word for it (I won't try on LLVM master myself), and close the issue. |
Hi,
I am facing some weird mis-compilation for the following program when compiling for gfx90c/gfx908:
The assertion
ASSERT_EQUAL(results[9], 0xFFFFull, "partial ballot");
gets triggered.I should get the same value in
results[8]
andresults[9]
as it is basically the same expression being computed if we inline the different calls.What is really weird is that if I change
__hip_warp_ext_match_any
tothen it works. However it doesn't inspire confidence to say the least, as it should give the exact same result.
Maybe the "ockl ballot hoisting hack" that I see in the generated assembly of the non-reduced case is at fault?
Using
or
Best regards,
Epliz
The text was updated successfully, but these errors were encountered: