JIT: Remove Compiler::fgConnectFallThrough and BBJ_COND fix-ups in loop phases #97191
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Issue Details: Next step for #93020. This removes the implicit fallthrough invariant for `bbFalseTarget`.
```diff
@@ -4988,7 +4987,7 @@ bool FlowGraphNaturalLoop::AnalyzeIteration(NaturalLoopIterInfo* info)
         return false;
     }

-    assert(ContainsBlock(cond->GetTrueTarget()) != ContainsBlock(cond->GetFalseTarget()));
+    // assert(ContainsBlock(cond->GetTrueTarget()) != ContainsBlock(cond->GetFalseTarget()));
```
This is a correctness issue, but I have a local change that should make things work out even with your change here. Hopefully I'll get that in early next week.
Thanks! No rush on that
cc @dotnet/jit-contrib. I'm going to hold off on requesting reviews until I merge in the fix Jakob mentioned above. Diffs are pretty noisy -- lots of size improvements and regressions. I suspect we are encountering more false branches that don't fall through by the time we get to block reordering, creating more opportunities for behavioral diffs.

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr outerloop, Fuzzlyn

Azure Pipelines successfully started running 4 pipeline(s).

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr outerloop, Fuzzlyn

Azure Pipelines successfully started running 4 pipeline(s).

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr outerloop, Fuzzlyn

Azure Pipelines successfully started running 4 pipeline(s).
Diff results for #97191

Assembly diffs

Assembly diffs ran on windows/x64:

- linux/arm64: based on 2,501,660 contexts (1,003,806 MinOpts, 1,497,854 FullOpts). MISSED contexts: base 3,546 (0.14%), diff 3,557 (0.14%). Overall: -126,148 bytes; FullOpts: -126,148 bytes
- linux/x64: based on 2,595,038 contexts (1,052,329 MinOpts, 1,542,709 FullOpts). MISSED contexts: base 3,596 (0.14%), diff 3,597 (0.14%). Overall: -208,870 bytes; FullOpts: -208,870 bytes
- osx/arm64: based on 2,263,021 contexts (930,876 MinOpts, 1,332,145 FullOpts). MISSED contexts: base 2,925 (0.13%), diff 2,944 (0.13%). Overall: -97,340 bytes; FullOpts: -97,340 bytes
- windows/arm64: based on 2,318,299 contexts (931,543 MinOpts, 1,386,756 FullOpts). MISSED contexts: base 2,587 (0.11%), diff 2,595 (0.11%). Overall: -117,288 bytes; FullOpts: -117,288 bytes
- windows/x64: based on 2,492,951 contexts (983,689 MinOpts, 1,509,262 FullOpts). MISSED contexts: base 3,859 (0.15%), diff 3,860 (0.15%). Overall: -150,214 bytes; FullOpts: -150,214 bytes

Details here

Assembly diffs ran on windows/x86:

- linux/arm: based on 2,238,225 contexts (827,812 MinOpts, 1,410,413 FullOpts). MISSED contexts: base 74,052 (3.20%), diff 74,053 (3.20%). Overall: -72,938 bytes; FullOpts: -72,938 bytes
- windows/x86: based on 2,297,926 contexts (841,817 MinOpts, 1,456,109 FullOpts). MISSED contexts: base 2,090 (0.09%), diff 3,444 (0.15%). Overall: -179,094 bytes; FullOpts: -179,094 bytes

Details here

Throughput diffs

Throughput diffs ran on linux/x64:

- linux/arm64: Overall -0.39% to -0.12%; FullOpts -0.52% to -0.16%
- linux/x64: Overall -0.42% to -0.14%; FullOpts -0.54% to -0.19%

Details here

Throughput diffs ran on windows/x86:

- linux/arm: Overall -0.49% to -0.07%; MinOpts +0.02% to +0.06%; FullOpts -0.49% to -0.07%
- windows/x86: Overall -0.31% to -0.09%; MinOpts +0.03% to +0.10%; FullOpts -0.33% to -0.09%

Details here

Throughput diffs ran on windows/x64:

- linux/arm64: Overall -0.35% to -0.11%; MinOpts -0.01% to -0.00%; FullOpts -0.48% to -0.13%
- linux/x64: Overall -0.37% to -0.13%; MinOpts -0.01% to +0.00%; FullOpts -0.48% to -0.15%
- osx/arm64: Overall -0.36% to -0.11%; MinOpts -0.01% to +0.00%; FullOpts -0.47% to -0.13%
- windows/arm64: Overall -0.38% to -0.12%; MinOpts -0.01% to +0.00%; FullOpts -0.49% to -0.13%
- windows/x64: Overall -0.45% to -0.14%; MinOpts -0.01% to -0.00%; FullOpts -0.51% to -0.15%

Details here
Diff results for #97191

Throughput diffs ran on linux/x64:

- linux/arm64: Overall -0.34% to -0.10%; FullOpts -0.45% to -0.12%
- linux/x64: Overall -0.35% to -0.12%; FullOpts -0.46% to -0.14%

Details here
```cpp
// Block layout should not change at this point.
// Do one last pass to reverse any conditions that might yield more fall-through branches.
if (opts.OptimizationEnabled())
{
    for (BasicBlock* const block : Blocks())
    {
        if (block->KindIs(BBJ_COND) && block->CanRemoveJumpToTarget(block->GetTrueTarget(), this))
        {
            // Reverse the jump condition
            GenTree* test = block->lastNode();
            assert(test->OperIsConditionalJump());
            GenTree* cond = gtReverseCond(test);
            assert(cond == test); // Ensure `gtReverseCond` did not create a new node.

            BasicBlock* newFalseTarget = block->GetTrueTarget();
            BasicBlock* newTrueTarget  = block->GetFalseTarget();
            block->SetTrueTarget(newTrueTarget);
            block->SetFalseTarget(newFalseTarget);
            assert(block->CanRemoveJumpToTarget(newFalseTarget, this));
        }
    }
}
```
This should be a proper phase ensuring post-phase checks run, and ensuring the changed IR nodes are dumped.
I was planning on removing this change from this PR, and possibly introducing it if/when we remove the condition reversals done in earlier phases. I'll add it as a new phase then.
Sounds good... sorry about the draft review.
No worries! I appreciate the feedback -- this should be ready soon.
> this should be ready soon.
Famous last words... closing this in favor of #97488.
Part of #93020. This change adds back most of #97191 and #96609, excluding any significant changes to the flowgraph optimization passes, to reduce churn. With this change, the false target of a BBJ_COND can diverge from the next block until Compiler::optOptimizeLayout, where we re-establish implicit fall-through with fgConnectFallThrough to preserve the existing block reordering behavior. Note that deferring these fall-through fixups causes diffs in the edge weights, which can alter the behavior of fgReorderBlocks, hence some of the size regressions.
Next step for #93020. This removes the implicit fallthrough invariant for `bbFalseTarget` (as well as the corresponding logic for maintaining that invariant) from the loop representation. At this point, I noticed we can get rid of `Compiler::fgConnectFallThrough`, so I thought I'd include that in this change. There's still some cruft left in earlier phases for fixing up fallthrough into the false target with `BBJ_ALWAYS` blocks, but this PR is big enough -- I'll save the remaining cleanup for a follow-up PR.