
[AMDGPU][LTO] Disable adding AMDGPULowerModuleLDSPass to LTO #243

Merged

Conversation

@ergawy commented Jan 14, 2025

This is a workaround to resolve https://ontrack-internal.amd.com/browse/SWDEV-502923

The AMDGPULowerModuleLDS pass runs twice, both times on the combined/linked module for the entire input during LTO.

The first run happens when lto::backend executes the optimization pipeline (by calling opt) (https://github.com/llvm/llvm-project/blob/main/llvm/lib/LTO/LTOBackend.cpp#L551), and the second run happens when lto::backend later calls codegen (https://github.com/llvm/llvm-project/blob/main/llvm/lib/LTO/LTOBackend.cpp#L558).

A crash happens in the second run because, between the first and second runs, a new GV has been added to the combined module:

```
@llvm.amdgcn.kernel.__omp_offloading_fc00_600030__QQmain_l5.lds = internal addrspace(3) global %llvm.amdgcn.kernel.__omp_offloading_fc00_600030__QQmain_l5.lds.t poison, align 16, !absolute_symbol !0
```

This is the only absolute variable and it is not there yet during the first run of AMDGPULowerModuleLDS.

In addition to the above absolute variable, we have other GVs that remain in the combined module (during the second run of the pass):

```
@_ZZ35__kmpc_nvptx_teams_reduce_nowait_v2E14ChunkTeamCount = internal unnamed_addr addrspace(3) global i32 undef, align 4
@_ZZ35__kmpc_nvptx_teams_reduce_nowait_v2E5Bound = internal unnamed_addr addrspace(3) global i32 undef, align 4
```

which are not absolute, causing the issue (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUMemoryUtils.cpp#L250).

After discussing this with Jon Chesterfield, we concluded that running the pass twice is actually incorrect: it should run only once, during code-gen. The first run of the pass creates "@llvm.amdgcn.kernel.__omp_offloading_fc00_600030__QQmain_l5.lds", which makes the second run fail. Commenting out these 2 lines "fixes" the issue: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp#L863-L864. We raised an issue upstream to further discuss proper fixes: llvm#122891.
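The underlying failure mode is that the pass is not idempotent: its first run introduces an absolute-symbol LDS variable, and a second run then sees a mix of absolute and non-absolute LDS globals and aborts. A minimal self-contained sketch of that shape (hypothetical names and toy data structures, not LLVM's actual API):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a module with LDS globals; `absolute` stands in for the
// presence of !absolute_symbol metadata on the variable.
struct GlobalVar { std::string name; bool absolute; };
struct Module { std::vector<GlobalVar> globals; };

// Toy stand-in for AMDGPULowerModuleLDS. Returns false (modeling the
// fatal error reached via AMDGPUMemoryUtils.cpp) when absolute and
// non-absolute LDS globals coexist; otherwise it "lowers" the module by
// introducing the absolute kernel-struct variable, so a second run will
// observe the mixed state and fail.
inline bool lowerModuleLDS(Module &M) {
  bool sawAbsolute = false, sawNonAbsolute = false;
  for (const GlobalVar &G : M.globals)
    (G.absolute ? sawAbsolute : sawNonAbsolute) = true;
  if (sawAbsolute && sawNonAbsolute)
    return false; // the second run trips here
  // First run: create the absolute kernel-struct variable.
  M.globals.push_back({"llvm.amdgcn.kernel.k.lds", /*absolute=*/true});
  return true;
}
```

Running this model once on a module with only non-absolute LDS globals succeeds; running it a second time fails, mirroring the crash described above.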


ergawy commented Jan 14, 2025

Running check-mlir/flang/clang/llvm seems fine except for the test that was added specifically to exercise the now-disabled code path; I XFAILed that test.

@JonChesterfield

This should break thin lto while unbreaking everything else.


ergawy commented Jan 14, 2025

> This should break thin lto while unbreaking everything else.

@dpalermo do we use thin LTO at all at this point? Is the change good to go from your perspective?


bcahoon commented Jan 14, 2025

I think this patch should be added to https://github.com/llvm/llvm-project? Pierre is the best person to ask about this, as I believe the intent was to run the pass only once. But the reason for Pierre's change was to support LTO partitions, which is needed to reduce the link time seen when using relocatable device code.


ergawy commented Jan 14, 2025

> I think this patch should be added to https://github.com/llvm/llvm-project?

I did not do that because I am sure Pierre had a good reason to add the pass there, even if there is a bug that needs a proper fix. I opened this here so it can be taken into our team's internal nightly builds, to see whether it helps with customer codes or not.

If it turns out we simply need to remove the pass, I can open it upstream as well.


ergawy commented Jan 14, 2025

@dpalermo is fine with merging the PR to see how it affects our internal nightlies and the customer apps under test (it might also be a useful experiment for the upstream discussions).

@ergawy ergawy merged commit a84ed63 into ROCm:amd-trunk-dev Jan 14, 2025
3 of 5 checks passed