-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISCV] MachineScheduler.cpp:1368: updatePressureDiffs assertion failure #68730
Comments
@llvm/issue-subscribers-backend-risc-v Author: Philip Reames (preames)
Reduce test case for a crash observed in an LTO build of spec2017 namd. This is a newish failure, but not new. I waited a bit before bothering to reduce this under the hopes that the triggering change would be reverted.
@fpetrogalli I'm guessing this is related to your large rewrite, but haven't yet confirmed that.
|
Note that this problem does not appear to be in 17.x. The following commit does not crash: |
git bisect led to the following result:
@luke957 FYI. This was a surprise to me. I don't see anything immediately obvious in this change, but am going to take a closer look. This may simple be exposing another issue. |
I reverted 3a6cc52 to get the tree back into a green state through the dev conference. Note that I don't actually believe there's anything wrong with that change at the moment. I think it's just exposing an underlying issue. |
No not really. I notice that at the point of failure |
One extra data point for the investigation: removing the following line from the reduced test case prevents the crash:
You need of course to rename %2 and %3 to something else, for example %x and %y |
For the record, here is the non-crashing test (uncomment the %1, and it will crash):
|
Spent a bunch of time digging into this yesterday, and today. I have a pretty good feel for what's going on, but not really what the "right" fix is. Here is the machine IR, and live intervals for this example immediately before machine-scheduler.
There's something surprising in those live intervals. I'm saying surprising here carefully - I don't know if it's correct or not. There's a segment for %12 from 160B to 272r for sub-reg1. If you look at the definition given on MachineOperand::readsReg, the undef should imply there's no read of the register for the sub-register write. The alternate segment would terminate on 240r. This doesn't occur with -riscv-enable-subreg-liveness=0. This removes the undef from 272, and thus makes the live intervals non-surprising. Looking at From there, we end up with a mismatch between RegisterPressure's expectations and the actual LiveInterval. RegisterPressure determines that %12 is live out of block bb.1. This is very odd as there's no user outside this block. What appears to be happening is that isLiveThroughAt is given the slot index of the current instruction being processed, not the end of the region. When processing 272, we correctly recognize the undef sub-register write as killing the register. However, when processing 240B, we see a "new" use and decide that %12 must be live out. This happens because we ask not if %12 is live at 336d, but at 272B. This decision is the root of our later troubles. Later on, MachineScheduler has the following line:
updatePressureDiffs then looks up the live interval for current position of the BotRPTracker - NOT the same as the current position of the RPTracker used in the backwards walk above - which is the end of the block. LIS correctly says that our register isn't live at 336B and we fail consistency asserts. The question is how to fix this. I see a couple options. Option 1 - Adjust isLiveThroughAt to use the BottomIdx (i.e. the bottom of the region). This results in %12 not being consider lived at the end of the block. However, this code is quite old, so I'm a bit distrustful that the seemingly "obviously" wrong check isn't important for something I'm missing. Option 2 - Bail out in updatePressureDiffs We could simply skip the entry which isn't live at the end of the region. This is the most localized fix, but pretty questionable. Option 3 - Declare the "surprising" live interval result wrong. Argument for this is that the surprise lead to a crash here, there may be other code which is surprised by this result as well. I haven't found the bit of code which formed this range. p.s. When reading this code, ignore all of the ShouldTrackLaneMasks codepaths. As far as I can tell, that configuration is completely unreachable in tree? |
I think you might be misreading the dump here. This is the main range:
The subrange does indeed have a segment that ends at 240r. |
Reduce test case for a crash observed in an LTO build of spec2017 namd. This is a newish failure, but not new. I waited a bit before bothering to reduce this under the hopes that the triggering change would be reverted.
@fpetrogalli I'm guessing this is related to your large rewrite, but haven't yet confirmed that.
The text was updated successfully, but these errors were encountered: