mir_transform: implement `#[rustc_force_inline]` #134082

davidtwco · 2024-12-09T15:33:40Z

Adds #[rustc_force_inline] which is similar to always inlining but reports an error if the inlining was not possible.

#[rustc_force_inline] can only be applied to free functions to guarantee that the MIR inliner will be able to resolve calls.
rustc_mir_transform::inline::Inline is refactored into two passes (Inline and ForceInline), sharing the vast majority of the implementation.
- rustc_mir_transform::inline::ForceInline can't be disabled so annotated items are always inlined.
- rustc_mir_transform::inline::ForceInline runs regardless of optimisation level.
#[rustc_force_inline] won't inline unless target features match, as with normal inlining.
MIR validation will ICE if a #[rustc_force_inline] isn't inlined, to guarantee that it will never be codegened independently. As a further guarantee, monomorphisation collection will always decide that #[rustc_force_inline] functions cannot be codegened locally.
Like other intrinsics, #[rustc_force_inline] annotated functions cannot be cast to function pointers.
As with other rustc attrs, this cannot be used by users, just within the compiler and standard library.
This is only implemented within rustc, so should avoid any limitations of LLVM's inlining.

It is intended that this attribute be used with intrinsics that must be inlined for security reasons. For example, pointer authentication intrinsics would allow Rust users to make use of pointer authentication instructions, but if these intrinsic functions were in the binary then they could be used as gadgets with ROP attacks, defeating the point of introducing them. We don't have any intrinsics like this today, but I expect to upstream some once a force inlining mechanism such as this is available.

cc #131687 rust-lang/rfcs#3711 - this approach should resolve the concerns from these previous attempts

r? @saethlin

rustbot · 2024-12-09T15:33:51Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

saethlin · 2024-12-09T23:27:12Z

This should have zero perf impact because the attribute isn't used, but let's just double check.
@bors try @rust-timer queue

bors · 2024-12-09T23:28:24Z

⌛ Trying commit ef59df9 with merge 4ce6fc9...

mir_transform: implement `#[rustc_force_inline]` Adds `#[rustc_force_inline]` which is similar to always inlining but reports an error if the inlining was not possible. - `#[rustc_force_inline]` can only be applied to free functions to guarantee that the MIR inliner will be able to resolve calls. - `rustc_mir_transform::inline::Inline` is refactored into two passes (`Inline` and `ForceInline`), sharing the vast majority of the implementation. - `rustc_mir_transform::inline::ForceInline` can't be disabled so annotated items are always inlined. - `rustc_mir_transform::inline::ForceInline` runs regardless of optimisation level. - `#[rustc_force_inline]` won't inline unless target features match, as with normal inlining. - MIR validation will ICE if a `#[rustc_force_inline]` isn't inlined, to guarantee that it will never be codegened independently. As a further guarantee, monomorphisation collection will always decide that `#[rustc_force_inline]` functions cannot be codegened locally. - Like other intrinsics, `#[rustc_force_inline]` annotated functions cannot be cast to function pointers. - As with other rustc attrs, this cannot be used by users, just within the compiler and standard library. - This is only implemented within rustc, so should avoid any limitations of LLVM's inlining. It is intended that this attribute be used with intrinsics that must be inlined for security reasons. For example, pointer authentication intrinsics would allow Rust users to make use of pointer authentication instructions, but if these intrinsic functions were in the binary then they could be used as gadgets with ROP attacks, defeating the point of introducing them. We don't have any intrinsics like this today, but I expect to upstream some once a force inlining mechanism such as this is available. cc rust-lang#131687 rust-lang/rfcs#3711 - this approach should resolve the concerns from these previous attempts r? `@saethlin`

bors · 2024-12-10T01:13:00Z

☀️ Try build successful - checks-actions
Build commit: 4ce6fc9 (4ce6fc94291fc8b5e5d6a55c12504bb4b99a139b)

rust-timer · 2024-12-10T03:09:14Z

Finished benchmarking commit (4ce6fc9): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.2%, 4.0%]	49
Regressions ❌ (secondary)	1.0%	[0.2%, 2.6%]	30
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.4%	[-0.4%, -0.3%]	2
All ❌✅ (primary)	0.7%	[0.2%, 4.0%]	49

Max RSS (memory usage)

Results (primary 0.5%, secondary -0.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.6%	[1.2%, 2.0%]	5
Regressions ❌ (secondary)	4.9%	[4.9%, 4.9%]	1
Improvements ✅ (primary)	-2.4%	[-2.5%, -2.3%]	2
Improvements ✅ (secondary)	-3.6%	[-5.5%, -1.7%]	2
All ❌✅ (primary)	0.5%	[-2.5%, 2.0%]	7

Cycles

Results (primary 1.4%, secondary 2.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.9%	[0.8%, 3.3%]	7
Regressions ❌ (secondary)	2.6%	[2.1%, 2.8%]	5
Improvements ✅ (primary)	-2.6%	[-2.6%, -2.6%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.4%	[-2.6%, 3.3%]	8

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 767.262s -> 765.996s (-0.17%)
Artifact size: 330.86 MiB -> 330.90 MiB (0.01%)

davidtwco · 2024-12-10T08:59:36Z

As far as I can tell, perf impact is just from the overhead from the force inline pass running to check if there are any annotated callees.

compiler/rustc_codegen_ssa/src/codegen_attrs.rs

compiler/rustc_mir_transform/src/inline.rs

compiler/rustc_mir_transform/src/validate.rs

compiler/rustc_mir_transform/src/lib.rs

tmiasko · 2024-12-11T11:27:52Z

According to pull request description the motivating use case are intrinsics that should be generated inline. The most obvious implementation strategy would be to do exactly that, in the contrast to the usual practice of creating a function wrapper that contains a call to an intrinsic. Can you elaborate on the advantages of the proposed implementation strategy?

Any public function marked with rustc_force_inline implicitly defines conditions under which it can be called without generating an error. Those conditions depend on implementation details of MIR inliner, the actual definition of functions marked with rustc_force_inline, the definitions of other functions called transitively, the optimization level used when building crate that defines those functions (consider consequences of optimizing function marked with rustc_force_inline). All those aspect inevitably become part of a public contract.

davidtwco · 2024-12-11T15:46:54Z

Can you elaborate on the advantages of the proposed implementation strategy?

In this particular case, the "intrinsics" I have in mind for this aren't just wrapper functions.

They use a bunch of declarative macros to generate inline asm (using inline asm to avoid identical calls being optimised away, values being spilled, and to have more codegen guarantees, necessary to maintain the security properties).

I looked into implementing these as MIR intrinsics prior to revisiting forced inlining, but it looked like it would end up being quite complex. These are also very architecture-specific and it felt strange to be implementing something like that into the compiler rather than being able to implement it normally.

I also considered using a macro rather than a function, but having a proper function signature with const generics was just a lot better - it also allow things like assert_instr be used for testing, etc.

Those conditions depend on implementation details of MIR inliner, the actual definition of functions marked with rustc_force_inline, the definitions of other functions called transitively, the optimization level used when building crate that defines those functions (consider consequences of optimizing function marked with rustc_force_inline). All those aspect inevitably become part of a public contract.

These are all limitations - I think a lot of these can be mitigated by being very careful in what we apply this attribute to.

saethlin · 2024-12-12T05:22:47Z

I think a 4% compile time hit on real code is much too high for the implementation of a compiler feature that's not even being used.

So I'd like to see that number brought down quite a lot. I think there are a number of ways to improve the compile time hit, at a glance most of the overhead comes from the resolve_instance_raw query, which I think we can just not do. Since we should be rejecting any attempt to put #[rustc_force_inline] on a function with a generic parameter, it should suffice to just check codegen_fn_attrs... right?

davidtwco · 2024-12-12T12:49:12Z

@bors try @rust-timer queue

bors · 2024-12-12T12:50:23Z

⌛ Trying commit 08f365c with merge 3487c46...

mir_transform: implement `#[rustc_force_inline]` Adds `#[rustc_force_inline]` which is similar to always inlining but reports an error if the inlining was not possible. - `#[rustc_force_inline]` can only be applied to free functions to guarantee that the MIR inliner will be able to resolve calls. - `rustc_mir_transform::inline::Inline` is refactored into two passes (`Inline` and `ForceInline`), sharing the vast majority of the implementation. - `rustc_mir_transform::inline::ForceInline` can't be disabled so annotated items are always inlined. - `rustc_mir_transform::inline::ForceInline` runs regardless of optimisation level. - `#[rustc_force_inline]` won't inline unless target features match, as with normal inlining. - MIR validation will ICE if a `#[rustc_force_inline]` isn't inlined, to guarantee that it will never be codegened independently. As a further guarantee, monomorphisation collection will always decide that `#[rustc_force_inline]` functions cannot be codegened locally. - Like other intrinsics, `#[rustc_force_inline]` annotated functions cannot be cast to function pointers. - As with other rustc attrs, this cannot be used by users, just within the compiler and standard library. - This is only implemented within rustc, so should avoid any limitations of LLVM's inlining. It is intended that this attribute be used with intrinsics that must be inlined for security reasons. For example, pointer authentication intrinsics would allow Rust users to make use of pointer authentication instructions, but if these intrinsic functions were in the binary then they could be used as gadgets with ROP attacks, defeating the point of introducing them. We don't have any intrinsics like this today, but I expect to upstream some once a force inlining mechanism such as this is available. cc rust-lang#131687 rust-lang/rfcs#3711 - this approach should resolve the concerns from these previous attempts r? `@saethlin`

bors · 2024-12-12T14:34:57Z

☀️ Try build successful - checks-actions
Build commit: 3487c46 (3487c46568df1d65fba8d2d59ad6c9318908ed04)

Adds `#[rustc_force_inline]` which is similar to always inlining but reports an error if the inlining was not possible, and which always attempts to inline annotated items, regardless of optimisation levels. It can only be applied to free functions to guarantee that the MIR inliner will be able to resolve calls.

davidtwco · 2025-01-10T18:41:48Z

If you misspell EMIT_MIR as EMIT_MIT then it doesn't work as it turns out.

@bors r=saethlin

bors · 2025-01-10T18:41:51Z

📌 Commit cc9a9ec has been approved by saethlin

It is now in the queue for this repository.

bors · 2025-01-10T19:04:29Z

⌛ Testing commit cc9a9ec with merge b1a7dfb...

bors · 2025-01-10T22:07:44Z

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing b1a7dfb to master...

rust-timer · 2025-01-10T23:24:10Z

Finished benchmarking commit (b1a7dfb): comparison URL.

Overall result: ❌ regressions - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

If the regression was expected or you think it can be justified,
please write a comment with sufficient written justification, and add
@rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
If you think that you know of a way to resolve the regression, try to create
a new PR with a fix for the regression.
If you do not understand the regression or you think that it is just noise,
you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.3%]	10
Regressions ❌ (secondary)	0.3%	[0.2%, 0.3%]	15
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.1%]	1
Improvements ✅ (secondary)	-0.3%	[-0.4%, -0.3%]	2
All ❌✅ (primary)	0.2%	[-0.1%, 0.3%]	11

Max RSS (memory usage)

Results (primary 1.9%, secondary 3.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.9%	[0.9%, 3.2%]	5
Regressions ❌ (secondary)	3.9%	[2.3%, 5.5%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.9%	[0.9%, 3.2%]	5

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 763.192s -> 764.797s (0.21%)
Artifact size: 325.76 MiB -> 325.90 MiB (0.04%)

rylev · 2025-01-14T09:35:13Z

@davidtwco looks like some regressions snuck back in since the last run - I'm unsure if something happened in a rebase to reintroduce them. The actual regressions are much smaller than the original perf run, and I don't think they're big enough to demand an investigation so I'll mark this as triaged.

@rustbot label: +perf-regression-triaged.

rustbot assigned saethlin Dec 9, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 9, 2024

This comment was marked as resolved.

Sign in to view

davidtwco force-pushed the forced-inlining branch from 3ddee57 to ef59df9 Compare December 9, 2024 16:20

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 9, 2024

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 10, 2024

tmiasko reviewed Dec 10, 2024

View reviewed changes

davidtwco force-pushed the forced-inlining branch from ef59df9 to 7a5ff95 Compare December 10, 2024 13:52

davidtwco force-pushed the forced-inlining branch from 7a5ff95 to 08f365c Compare December 12, 2024 12:48

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 12, 2024

This comment has been minimized.

Sign in to view

davidtwco added 10 commits January 10, 2025 18:37

codegen_attrs: force inlining takes precedence

02d423c

inline: force inlining shims

4507939

inline: re-introduce some callee body checks

e4bae91

inline: remove unnecessary promoted check

90066c0

validator: move force inline check

5f316f5

inline: move should inline check

dbec6be

don't collect #[rustc_force_inline] in eager mode

3169a44

clarify target_feature + forced inlining

ce602ac

mir_build: check annotated functions w/out callers

cc9a9ec

davidtwco force-pushed the forced-inlining branch from 6bef877 to cc9a9ec Compare January 10, 2025 18:41

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 10, 2025

bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 10, 2025

bors merged commit b1a7dfb into rust-lang:master Jan 10, 2025
7 checks passed

rustbot added this to the 1.86.0 milestone Jan 10, 2025

davidtwco deleted the forced-inlining branch January 10, 2025 22:12

This was referenced Jan 10, 2025

Stabilize target_feature_11 #134090

Open

Treat safe target_feature functions as unsafe by default [less invasive variant] #134353

Merged

Rename BitSet to DenseBitSet #135205

Merged

rustbot added the perf-regression Performance regression. label Jan 10, 2025

rustbot added the perf-regression-triaged The performance regression has been triaged. label Jan 14, 2025

antoyo mentioned this pull request Jan 15, 2025

Fix #[inline(always)] for recursive functions rust-lang/rustc_codegen_gcc#599

Open

clubby789 mentioned this pull request Jan 22, 2025

Add #[optimize(none)] #128657

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mir_transform: implement `#[rustc_force_inline]` #134082

mir_transform: implement `#[rustc_force_inline]` #134082

davidtwco commented Dec 9, 2024 •

edited

Loading

rustbot commented Dec 9, 2024

This comment was marked as resolved.

saethlin commented Dec 9, 2024

This comment has been minimized.

bors commented Dec 9, 2024

bors commented Dec 10, 2024

This comment has been minimized.

rust-timer commented Dec 10, 2024

davidtwco commented Dec 10, 2024

tmiasko commented Dec 11, 2024

davidtwco commented Dec 11, 2024

saethlin commented Dec 12, 2024

davidtwco commented Dec 12, 2024

This comment has been minimized.

bors commented Dec 12, 2024

bors commented Dec 12, 2024

This comment has been minimized.

davidtwco commented Jan 10, 2025

bors commented Jan 10, 2025

bors commented Jan 10, 2025

bors commented Jan 10, 2025

rust-timer commented Jan 10, 2025

rylev commented Jan 14, 2025

mir_transform: implement #[rustc_force_inline] #134082

mir_transform: implement #[rustc_force_inline] #134082

Conversation

davidtwco commented Dec 9, 2024 • edited Loading

rustbot commented Dec 9, 2024

This comment was marked as resolved.

saethlin commented Dec 9, 2024

This comment has been minimized.

bors commented Dec 9, 2024

bors commented Dec 10, 2024

This comment has been minimized.

rust-timer commented Dec 10, 2024

Overall result: ❌ regressions - please read the text below

davidtwco commented Dec 10, 2024

tmiasko commented Dec 11, 2024

davidtwco commented Dec 11, 2024

saethlin commented Dec 12, 2024

davidtwco commented Dec 12, 2024

This comment has been minimized.

bors commented Dec 12, 2024

bors commented Dec 12, 2024

This comment has been minimized.

davidtwco commented Jan 10, 2025

bors commented Jan 10, 2025

bors commented Jan 10, 2025

bors commented Jan 10, 2025

rust-timer commented Jan 10, 2025

Overall result: ❌ regressions - please read the text below

rylev commented Jan 14, 2025

mir_transform: implement `#[rustc_force_inline]` #134082

mir_transform: implement `#[rustc_force_inline]` #134082

davidtwco commented Dec 9, 2024 •

edited

Loading