-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider ROCm releases with upstream llvm compatibility #1904
Comments
I was speaking with some of the compiler guys and I believe the intention is to move towards using stable LLVM or at least allow for compilation against stable LLVM. Unfortunately, there's no timeline and it seems like a bit of an involved task due to how their code flows processes work. For now, I've been just maintaining my own branch of compiler support: It's mostly reverts or cherry-picks to get it to work. It's not perfect, but it gets the Fedora packages moving. Some downstream ROCm components might expect a newer LLVM, but this component is definitely the most sensitive. I suspect when I eventually get around to packaging HIP in Fedora, the package set will be more sensitive to the LLVM version. |
Hope stable releases of downstream projects could be synced with LLVM. It would be better if they got upstream (especially rocm-device-libs)
From my experience rocm-device-libs and rocm-compilersupport has the closest relationship with LLVM. The first one provides bitcodes, and the second currently contains comgr, in my sense a library that calls clang driver (using C++ API) to compile and inspect GPU code objects. HIP is a runtime API + hipcc the perl wrapper (for AOT) + hiprtc (the wrapper for comgr? For JIT), and it's relationship with llvm/clang is somehow weaker compared to the previous two. |
llvm/llvm-project#60313 & https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45 4 of comgr test case use a feature to call amdgpu kernel within a amdgpu kernel, but the feature is not well implemented and not upstreamed. Last time a check fedora has patched out these 4 tests. In Gentoo I patched the test to avoid calling other kernels inside a kernel, so the compilations are OK with upstream clang. This strips out the test failures, and keeps the purpose of testing basic compilation capabilities of comgr. |
And in developing branch a potential breakage is found between llvm/clang and rocm-device-libs: https://reviews.llvm.org/D142507 & ROCm/ROCm-Device-Libs@8dc779e |
@searlmc1 as FYI
Yes that was me, and I just adopted your patch. Thank you!
Thanks for pointing this out. I guess it means we should keep device libs at 5.4 for LLVM 16? I'm a bit uncomfortable with this response:
|
And in amd-stg-open branch they added
Yes. While there's uncertainties. If they (ROCm) may revert ROCm/ROCm-Device-Libs@8dc779e on ROCm-5.5.x releases then things will be fine. Or we have to revert it ourselves for llvm-16 compatibility. |
Ok after some thought, for the meantime I think I'm going to try to maintain some branches for LLVM versions: https://github.com/Mystro256/ROCm-Device-Libs/tree/release/16.x Feel free to contribute if you are interested. I'm going to try evaluating it when Fedora bumps to llvm 16. |
Haven't cut a release yet, but for the HIP portion of ROCm I now have Bazel build files in rules_ll. This project makes it possible to write a target like: ll_binary(name="hip_example", srcs=["hello.hip"], compilation_mode="hip_amdgpu") Which can be built and run with:
With a single command invocation this all happens automaticlly 🥳 This is of course highly experimental. at the moment it only works with out-of-tree rules_ll, and not with the regular installation instructions. I'll create a release in the next few days and then create an issue that explains this in more detail. |
@littlewu2508 Is this ticket still relevant? Thanks! |
One of major issues for ROCm distribution packaging is the compatibility with upstream llvm due to the asynchronous development cycle.
For example, feature A pushed to ROCm's llvm, and then pushed to upstream llvm. If unfortunate, A needs to wait about half a year to enter next llvm stable release; while for ROCm the next release will come sooner, shipping A and components depending on A.
What's more, A is implemented upon llvm version X, but when upstreamed it enters the X+1 release. However the ROCm new release Y ships A with llvm X (ROCm forked). Distribution: cannot ship llvm-X & ROCm-Y becuase ROCm-Y depend on A. llvm-X+1 is also incompatible with ROCm-Y because it is based on llvm-X.
Situation nowadays are much better, only few patches can do the hack. But it still makes maintaining ROCm distribution packages difficult. It would be better if ROCm latest source releases can be compatible with the upstream llvm release.
@emollier @cgmb @Mystro256 @Madouura @tpkessler Distribution packagers can also reference this issue when reporting and fixing such incompatibility, and sharing information to reduce workload.
The text was updated successfully, but these errors were encountered: