-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Morph layout_raw
query into layout_of
.
#88308
Conversation
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit edb4b2d with merge ef5a2f51e873d28ef516ea93637b800575586660... |
☀️ Try build successful - checks-actions |
Queued ef5a2f51e873d28ef516ea93637b800575586660 with parent b5fe3bc, future comparison URL. |
Finished benchmarking try commit (ef5a2f51e873d28ef516ea93637b800575586660): comparison url. Summary: This change led to very large relevant improvements 🎉 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. @bors rollup=never |
fn layout_of(&self, ty: Ty<'tcx>) -> Self::TyAndLayout { | ||
let param_env = self.param_env.with_reveal_all_normalized(self.tcx); | ||
let ty = self.tcx.normalize_erasing_regions(param_env, ty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reasoning behind having this outside the query is that it increases the amount of cache hits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but you're doing 2 cache hits (normalize + layout) instead of 1 (just layout).
This PR only penalizes the initial cache miss (layout gets cached for both normalized and unnormalized versions of the type), but benefits all further hits (since it reduces them to a single lookup).
@bors r+ |
📌 Commit edb4b2d has been approved by |
☀️ Test successful - checks-actions |
While the instruction counts look good this has regressed the wall times of await-call-tree, deeply-nested-closures and deeply-nested-async by up to 30% The timeline graphs show that this isn't noise. They're synthetic benchmarks so this might be acceptable. |
Both instructions and cycles are significantly down, which would mean that the perf machines were running at a low frequency for the test or that there are significantly more stalls or we're getting de-scheduled more often (due to e.g. holding a lock). Sadly we do not track context switches in perf :( |
(EDIT: @nagisa posted a more helpful comment after I had written mine, below, but before posting it, keep that in mind)
Wait, but a lot of these are... Looking at the query data for Looking at the diff again, the only cascade effect I could think of would be the size of the |
I believe the wall-time regression was caused by the CPU governor on the perf machine, as mentioned here. For more details see #83698 (comment), a (big) comment on the PR that was merged (and perf-tested) immediately before this one, and thus shared in the blame for this regression. |
Before this PR,
LayoutCx::layout_of
wrapped thelayout_raw
query, to:record_layout_for_printing
, for-Zprint-type-sizes
Moving those two responsibilities into the query may reduce overhead (due to cached calls skipping those steps), but I want to do a perf run to know.
One of the changes I had to make was changing the return type of the query, to be able to both get out the type produced by normalizing inside the query and to match the signature of the old
TyCtxt::layout_of
. This change may be worse, perf-wise, so that's another reason I want to check.r? @nagisa cc @oli-obk