[AMDGPU] Use correct operand order for shifts #68299
Conversation
In a special case in frame index elimination (when the offset is 0), we generate either an S_LSHR_B32 or a V_LSHRREV_B32 using the same code. However, they don't expect their operands in the same order: S_LSHR_B32 takes the value to be shifted first and then the shift amount, whereas V_LSHRREV_B32 has the operands reversed (hence the REV in its name). Update the code & tests to take this into account. Also remove an outdated comment (this code is definitely reachable now that non-entry functions no longer have a fixed emergency scavenge slot).
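For illustration, here is a minimal sketch of the two builder call sequences (simplified from the patch below; MBB, MI, DL, TII, ST, ResultReg and FrameReg are assumed to be as in SIRegisterInfo::eliminateFrameIndex, so this is a fragment, not a standalone program):

// SALU form: S_LSHR_B32 dst, src0 (value to shift), src1 (shift amount).
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHR_B32), ResultReg)
    .addReg(FrameReg)                    // value to shift
    .addImm(ST.getWavefrontSizeLog2());  // shift amount

// VALU form: V_LSHRREV_B32 dst, src0 (shift amount), src1 (value to shift).
// The REV in the name means the sources are reversed relative to S_LSHR_B32.
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64), ResultReg)
    .addImm(ST.getWavefrontSizeLog2())   // shift amount
    .addReg(FrameReg);                   // value to shift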
@llvm/pr-subscribers-backend-amdgpu

Full diff: https://github.com/llvm/llvm-project/pull/68299.diff (2 Files Affected)
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index e065b104db4eb33..d0a81673d6528c2 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -2432,10 +2432,13 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
       if (Offset == 0) {
         unsigned OpCode = IsSALU && !LiveSCC ? AMDGPU::S_LSHR_B32
                                              : AMDGPU::V_LSHRREV_B32_e64;
-        // XXX - This never happens because of emergency scavenging slot at 0?
-        auto Shift = BuildMI(*MBB, MI, DL, TII->get(OpCode), ResultReg)
-                         .addImm(ST.getWavefrontSizeLog2())
-                         .addReg(FrameReg);
+        auto Shift = BuildMI(*MBB, MI, DL, TII->get(OpCode), ResultReg);
+        if (OpCode == AMDGPU::V_LSHRREV_B32_e64)
+          // For V_LSHRREV, the operands are reversed (the shift count goes
+          // first).
+          Shift.addImm(ST.getWavefrontSizeLog2()).addReg(FrameReg);
+        else
+          Shift.addReg(FrameReg).addImm(ST.getWavefrontSizeLog2());
         if (IsSALU && !LiveSCC)
           Shift.getInstr()->getOperand(3).setIsDead(); // Mark SCC as dead.
         if (IsSALU && LiveSCC) {
diff --git a/llvm/test/CodeGen/AMDGPU/frame-index.mir b/llvm/test/CodeGen/AMDGPU/frame-index.mir
index 31bc2b04dde8c13..d8736c5276a26d2 100644
--- a/llvm/test/CodeGen/AMDGPU/frame-index.mir
+++ b/llvm/test/CodeGen/AMDGPU/frame-index.mir
@@ -53,7 +53,7 @@ body:             |
     ; GCN-LABEL: name: func_add_constant_to_fi_uniform_i32
    ; GCN: liveins: $sgpr30_sgpr31
    ; GCN-NEXT: {{  $}}
-    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 6, $sgpr32, implicit-def dead $scc
+    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 $sgpr32, 6, implicit-def dead $scc
    ; GCN-NEXT: renamable $sgpr4 = nuw S_ADD_I32 killed $sgpr0, 4, implicit-def dead $scc
    ; GCN-NEXT: renamable $vgpr0 = COPY killed renamable $sgpr4, implicit $exec
    ; GCN-NEXT: $m0 = S_MOV_B32 -1
@@ -89,7 +89,7 @@ body:             |
     ; GCN-LABEL: name: func_add_constant_to_fi_uniform_SCC_clobber_i32
    ; GCN: liveins: $sgpr30_sgpr31
    ; GCN-NEXT: {{  $}}
-    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 6, $sgpr32, implicit-def dead $scc
+    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 $sgpr32, 6, implicit-def dead $scc
    ; GCN-NEXT: renamable $sgpr4 = nuw S_ADD_U32 killed $sgpr0, 4, implicit-def $scc
    ; GCN-NEXT: renamable $sgpr5 = S_ADDC_U32 $sgpr4, 1234567, implicit-def $scc, implicit $scc
    ; GCN-NEXT: $sgpr0 = S_LSHR_B32 $sgpr32, 6, implicit-def $scc
@@ -2432,10 +2432,13 @@ bool SIRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
       if (Offset == 0) {
         unsigned OpCode = IsSALU && !LiveSCC ? AMDGPU::S_LSHR_B32
                                              : AMDGPU::V_LSHRREV_B32_e64;
-        // XXX - This never happens because of emergency scavenging slot at 0?
Is this reachable now?
Yep, I hit this in an OpenMP test after a change to FixSGPRCopies.
A follow-up commit relanding #66882 references this fix:

…#66882)"

Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or INSERT_SUBREG where the result is an SGPR, but some of the inputs are constants materialized into VGPRs. This may happen in cases where, for instance, several instructions use an immediate zero and SelectionDAG chooses to put it in a VGPR to satisfy all of them. This, however, causes si-fix-sgpr-copies to try to switch the whole chain to VGPR, which may lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into an SGPR fixes the issue.

This was originally reverted because it triggered an unrelated bug in PEI on one of the OpenMP buildbots. That bug has been fixed in #68299, so it should be ok to try again.
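For context, a rough sketch of the rematerialization idea described above (a hypothetical helper, not the actual si-fix-sgpr-copies code; it assumes the standard MachineRegisterInfo/BuildMI APIs):

// Hypothetical sketch: if the VGPR input to an SGPR-producing REG_SEQUENCE,
// PHI or INSERT_SUBREG is just a materialized immediate, rematerialize it
// into an SGPR instead of switching the whole chain to VGPR.
static bool rematImmIntoSGPR(MachineInstr &UseMI, unsigned OpIdx,
                             MachineRegisterInfo &MRI,
                             const SIInstrInfo *TII) {
  Register SrcReg = UseMI.getOperand(OpIdx).getReg();
  MachineInstr *Def = MRI.getVRegDef(SrcReg);
  // Only handle a plain "V_MOV_B32 vreg, imm" definition.
  if (!Def || Def->getOpcode() != AMDGPU::V_MOV_B32_e32 ||
      !Def->getOperand(1).isImm())
    return false;

  // Materialize the same constant into a fresh SGPR right before the
  // original VGPR def, then point the use at it.
  Register NewSGPR = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
  BuildMI(*Def->getParent(), *Def, Def->getDebugLoc(),
          TII->get(AMDGPU::S_MOV_B32), NewSGPR)
      .addImm(Def->getOperand(1).getImm());
  UseMI.getOperand(OpIdx).setReg(NewSGPR);
  return true;
}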