-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LoopRotate] Set loop back edge weight to not less than exit weight #86496
Conversation
Branch weight from sample-based PGO may be not inaccurate due to sampling. This test tracks such case where updateBranchWeights wraps unsigned.
Branch weight from sample-based PGO may be not inaccurate due to sampling. If the loop body must be executed, then origin loop back edge weight must be not less than exit weight.
@llvm/pr-subscribers-llvm-transforms Author: Haohai Wen (HaohaiWen) ChangesBranch weight from sample-based PGO may be not inaccurate due to Full diff: https://github.com/llvm/llvm-project/pull/86496.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index bc671171137199..dd31cbd8376c4b 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -347,9 +347,19 @@ static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
// probabilities as if there are only 0-trip and 1-trip cases.
ExitWeight0 = OrigLoopExitWeight - OrigLoopBackedgeWeight;
}
+ } else {
+ if (OrigLoopExitWeight > OrigLoopBackedgeWeight) {
+ LLVM_DEBUG(
+ dbgs() << "WARNING: Bad loop back edge weight. Adjust it from "
+ << OrigLoopBackedgeWeight << " to " << OrigLoopExitWeight
+ << "\n");
+ OrigLoopBackedgeWeight = OrigLoopExitWeight;
+ }
}
+ assert(OrigLoopExitWeight >= ExitWeight0 && "Bad branch weight");
ExitWeight1 = OrigLoopExitWeight - ExitWeight0;
EnterWeight = ExitWeight1;
+ assert(OrigLoopBackedgeWeight >= EnterWeight && "Bad branch weight");
LoopBackWeight = OrigLoopBackedgeWeight - EnterWeight;
} else if (OrigLoopExitWeight == 0) {
if (OrigLoopBackedgeWeight == 0) {
diff --git a/llvm/test/Transforms/LoopRotate/update-branch-weights.ll b/llvm/test/Transforms/LoopRotate/update-branch-weights.ll
index 5d742b64e0adbf..9a1f36ec5ff2be 100644
--- a/llvm/test/Transforms/LoopRotate/update-branch-weights.ll
+++ b/llvm/test/Transforms/LoopRotate/update-branch-weights.ll
@@ -232,6 +232,46 @@ loop_exit:
ret void
}
+; BFI_BEFORE-LABEL: block-frequency-info: func6_inaccurate_branch_weight
+; BFI_BEFORE: - entry: {{.*}} count = 1024
+; BFI_BEFORE: - loop_header: {{.*}} count = 2047
+; BFI_BEFORE: - loop_body: {{.*}} count = 1023
+; BFI_BEFORE: - loop_exit: {{.*}} count = 1024
+
+; BFI_AFTER-LABEL: block-frequency-info: func6_inaccurate_branch_weight
+; BFI_AFTER: - entry: {{.*}} count = 1024
+; BFI_AFTER: - loop_body: {{.*}} count = 1024
+; BFI_AFTER: - loop_exit: {{.*}} count = 1024
+
+; IR-LABEL: define void @func6_inaccurate_branch_weight(
+; IR: entry:
+; IR: br label %loop_body
+; IR: loop_body:
+; IR: br i1 %cmp, label %loop_body, label %loop_exit, !prof [[PROF_FUNC6_0:![0-9]+]]
+; IR: loop_exit:
+; IR: ret void
+
+; Branch weight from sample-based PGO may be inaccurate due to sampling.
+; Count for loop_body in following case should be not less than loop_exit.
+; However this may not hold for Sample-based PGO.
+define void @func6_inaccurate_branch_weight() !prof !3 {
+entry:
+ br label %loop_header
+
+loop_header:
+ %i = phi i32 [0, %entry], [%i_inc, %loop_body]
+ %cmp = icmp slt i32 %i, 2
+ br i1 %cmp, label %loop_body, label %loop_exit, !prof !9
+
+loop_body:
+ store volatile i32 %i, ptr @g, align 4
+ %i_inc = add i32 %i, 1
+ br label %loop_header
+
+loop_exit:
+ ret void
+}
+
!0 = !{!"function_entry_count", i64 1}
!1 = !{!"branch_weights", i32 1000, i32 1}
!2 = !{!"branch_weights", i32 3000, i32 1000}
@@ -241,6 +281,7 @@ loop_exit:
!6 = !{!"branch_weights", i32 0, i32 1}
!7 = !{!"branch_weights", i32 1, i32 0}
!8 = !{!"branch_weights", i32 0, i32 0}
+!9 = !{!"branch_weights", i32 1023, i32 1024}
; IR: [[PROF_FUNC0_0]] = !{!"branch_weights", i32 2000, i32 1000}
; IR: [[PROF_FUNC0_1]] = !{!"branch_weights", i32 999, i32 1}
@@ -251,3 +292,4 @@ loop_exit:
; IR: [[PROF_FUNC3_0]] = !{!"branch_weights", i32 0, i32 1}
; IR: [[PROF_FUNC4_0]] = !{!"branch_weights", i32 1, i32 0}
; IR: [[PROF_FUNC5_0]] = !{!"branch_weights", i32 0, i32 0}
+; IR: [[PROF_FUNC6_0]] = !{!"branch_weights", i32 0, i32 1024}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
✅ With the latest revision this PR passed the Python code formatter. |
Any more comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. I suggest you wait for @MatzeB to take a look.
LLVM_DEBUG( | ||
dbgs() << "WARNING: Bad loop back edge weight. Adjust it from " | ||
<< OrigLoopBackedgeWeight << " to " << OrigLoopExitWeight | ||
<< "\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need debug print warning for this. There is a lot of guessing and fixing already in the heuristic above, and it'd be inconsistent to just print debug warning for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The heuristic wasn't based on guessing. I think those three expressions were bullet proofed (@MatzeB , please correct me if I am wrong). It just have a pre request the profile data is accurate. What I'm trying is to fix inaccurate profile data.
// - x == x0 + x1 # counts to "exit" must stay the same.
// - y0 == x - x0 == x1 # how often loop was entered at all.
// - y1 == y - y0 # How often loop was repeated (after first iter.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The heuristic wasn't based on guessing. I think those three expressions were bullet proofed
I was referring to the existing heuristic. With loop rotate, you cannot recover ground truth counts. And you have to make assumptions like "as if 0-trip count nearly never happens"/ "as if there are only 0-trip and 1-trip cases".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm a bit confused looking at the test case..
-
You can have zero-trip loop, in which case
loop_exit
can be reached without going intoloop_body
, why loop back edge needs to be not less than exit weight? -
In principle, profile inference for sample PGO should make input counts consistent. If there's input count inconsistency, we need track down the source of inconsistency. Fixing the input count inconsistency in a transformation isn't the right approach IMO. (The existing heuristic are making guesses for loop rotation which is different and is fine)
I don't follow why this would be a situation that needs fixup. While it is less common, it is perfectly fine for a loop header to exit more often than entering the body and using the loop backedge (when 0-trip counts happen often enough for a given loop to push the average trip count below 1). |
Oh I see, this is for the case where the condition can be constant folded. I guess this makes sense in that case and you want to avoid an underflow? (may be good to mention in a comment/commit message). |
I suspect some amount of noise in the sample / mapping samples to actual blocks etc. is unavoidable and I am not sure this situation (that only becomes "bad" when you consider the fact that the particular loop condition cannot be false in the first iteration) will be considered by smoothing algos. So seems fine to me, but leaving final call to @WenleiHe about whether some inaccuracies like this are considered normal (and hence must be worked around). |
Entering line 351 means HasConditionalPreHeader is false. That means this loop body must be executed at least once. Each execution of loop body must have a back edge to latch. Then it either go back to loop body or exit.
Ideally, It should be fixed when loading profile in SampleProfileLoader. For this case, unless SampleProfilerLoader detecting the loop body must be executed at least (I think it's not easy for SampleProfileLoader), it have no idea whether this profile data is good. |
Ok, thanks for clarification. Please add a comment in the code explaining this case. Also clean up the debug prints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest revision LGTM, thanks.
We see regressions of up to 50% (depending on microarchitecture) in llvm-test-suite microbenchmarks, when compiled with PGO+thinlto:
I don't think we consider these important, but I thought you might be interested in this observation. |
Branch weight from sample-based PGO may be not inaccurate due to
sampling. If the loop body must be executed, then origin loop back
edge weight must be not less than exit weight.