-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] When allocating VGPRs, VGPR spills are not part of the prologue #109439
Conversation
PRs llvm#69924 and llvm#72140 modified SIInstrInfo::isBasicBlockPrologue to skip over EXEC modifications and spills when allocating VGPRs. But treating VGPR spills as part of the prologue can confuse the register allocator as in llvm#109294, so restrict it to SGPR spills, which were inserted during SGPR allocation which is done in an earlier pass. Fixes: llvm#109294 Fixes: SWDEV-485841
@llvm/pr-subscribers-backend-amdgpu Author: Jay Foad (jayfoad) ChangesPRs #69924 and #72140 modified SIInstrInfo::isBasicBlockPrologue to skip Fixes: #109294 Full diff: https://github.com/llvm/llvm-project/pull/109439.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 97e8b08270d615..509c5c56e15f57 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -8884,8 +8884,9 @@ bool SIInstrInfo::isBasicBlockPrologue(const MachineInstr &MI,
// FIXME: Copies inserted in the block prolog for live-range split should also
// be included.
return IsNullOrVectorRegister &&
- (isSpill(Opcode) || (!MI.isTerminator() && Opcode != AMDGPU::COPY &&
- MI.modifiesRegister(AMDGPU::EXEC, &RI)));
+ (isSGPRSpill(Opcode) ||
+ (!MI.isTerminator() && Opcode != AMDGPU::COPY &&
+ MI.modifiesRegister(AMDGPU::EXEC, &RI)));
}
MachineInstrBuilder
|
I don't know how to write a small test for this. The test case from #109294 is about 600K. |
You can try llvm-reducing mir test using a strict -stress-regalloc value, it sometimes works |
@alex-t might have a smaller test case for this. He had the same observation and proposed the same patch. |
Found #109514 while reducing this. Also ran into another edge case failure in the coalescer |
@@ -8884,8 +8884,9 @@ bool SIInstrInfo::isBasicBlockPrologue(const MachineInstr &MI, | |||
// FIXME: Copies inserted in the block prolog for live-range split should also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This FIXME isn't relevant anymore. We are including only the SGPR spills to the prolog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
reproducer.zip I tried to reduce it further. But couldn't. Tried even -stress-regalloc. But nothing worked out to reduce it further. |
There's a simpler reproducer in #109678 I'm working on reducing |
As I mentioned in the other PR, I think we also need to include wwm register reload. |
The WWM reloads will happen for all lanes with the manipulated exec mask. Why do you think they should be included as well? |
I have tested exactly the same change myself as an alternative to the #108596 |
I think it could be possible that the sgpr_input of the
My point is the wwm_vgpr_reload is part of the block prologue, right? |
Yes. In such cases, the wwm-spill-restore should precede the We could conditionally add the wwm-spill-restore to the block begin when there is already an instruction in the bb-prolog that uses this restored register. The |
The blender hang on my Navi21 was due to the old/incompatible runtime. So, we can go with this fix. |
The isSGPRSpill is still too large a hummer. SGPR spills have nothing to do with the prologue. We only need SGPR reloads that define registers used by other prologue instructions. I tried a more selective algorithm but it caused a segmentation fault in the blender. |
What do you think about committing this patch as a small step in the right direction? We can refine it more later. |
The isSGPRSpill is still too large a hummer. SGPR spills have nothing to do with the prologue. We only need SGPR reloads that define registers used by other prologue instructions. I tried a more selective algorithm but it caused a segmentation fault in the blender.
I agree with that. I would have enough time to sort out what is wrong with the more selective approach provided this is committed and no more app crashes happen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this prolog concept is a bit broken. This is also really tough to get a test out of, but I'm still trying (I'm hoping #110229 helps reduce it)
rever: hangs ocl tests at -O0 735a5f6 [AMDGPU] When allocating VGPRs, VGPR spills are not part of the prologue (llvm#109439) Change-Id: If6452f3c5943af849b606cbe6f1262597c5e0f2f
…gue (llvm#109439) PRs llvm#69924 and llvm#72140 modified SIInstrInfo::isBasicBlockPrologue to skip over EXEC modifications and spills when allocating VGPRs. But treating VGPR spills as part of the prologue can confuse the register allocator as in llvm#109294, so restrict it to SGPR spills, which were inserted during SGPR allocation which is done in an earlier pass. Fixes: llvm#109294 Fixes: SWDEV-485841
…not part of the prologue (llvm#109439)" This is supposed to be fixed by llvm@6636f32. More context: https://ontrack-internal.amd.com/browse/SWDEV-489621 Change-Id: I49ec81989eec8fda897dffb1bd7b4dbb76a98c46
#111496 addresses the issue caused by the scenario when WWM reload must be inserted before the block prologue because they reload operands for the prologue instructions. This breaks the prologue and consequent VGPR reloads are inserted before the EXEC restoring. |
Cherry-pick llvm#109439 Change-Id: I555ef6c0a2ffe806aa40c0e0c639fdb12c45c27c
…gue (llvm#109439) PRs llvm#69924 and llvm#72140 modified SIInstrInfo::isBasicBlockPrologue to skip over EXEC modifications and spills when allocating VGPRs. But treating VGPR spills as part of the prologue can confuse the register allocator as in llvm#109294, so restrict it to SGPR spills, which were inserted during SGPR allocation which is done in an earlier pass. Fixes: llvm#109294 Fixes: SWDEV-485841 Change-Id: Ice1ab75074aa380c13e07c452a2854f78ff37ce7
PRs #69924 and #72140 modified SIInstrInfo::isBasicBlockPrologue to skip
over EXEC modifications and spills when allocating VGPRs. But treating
VGPR spills as part of the prologue can confuse the register allocator
as in #109294, so restrict it to SGPR spills, which were inserted during
SGPR allocation which is done in an earlier pass.
Fixes: #109294
Fixes: SWDEV-485841