Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mir_transform: implement #[rustc_force_inline] #134082

Merged
merged 10 commits into from
Jan 10, 2025

Conversation

davidtwco
Copy link
Member

@davidtwco davidtwco commented Dec 9, 2024

Adds #[rustc_force_inline] which is similar to always inlining but reports an error if the inlining was not possible.

  • #[rustc_force_inline] can only be applied to free functions to guarantee that the MIR inliner will be able to resolve calls.
  • rustc_mir_transform::inline::Inline is refactored into two passes (Inline and ForceInline), sharing the vast majority of the implementation.
    • rustc_mir_transform::inline::ForceInline can't be disabled so annotated items are always inlined.
    • rustc_mir_transform::inline::ForceInline runs regardless of optimisation level.
  • #[rustc_force_inline] won't inline unless target features match, as with normal inlining.
  • MIR validation will ICE if a #[rustc_force_inline] isn't inlined, to guarantee that it will never be codegened independently. As a further guarantee, monomorphisation collection will always decide that #[rustc_force_inline] functions cannot be codegened locally.
  • Like other intrinsics, #[rustc_force_inline] annotated functions cannot be cast to function pointers.
  • As with other rustc attrs, this cannot be used by users, just within the compiler and standard library.
  • This is only implemented within rustc, so should avoid any limitations of LLVM's inlining.

It is intended that this attribute be used with intrinsics that must be inlined for security reasons. For example, pointer authentication intrinsics would allow Rust users to make use of pointer authentication instructions, but if these intrinsic functions were in the binary then they could be used as gadgets with ROP attacks, defeating the point of introducing them. We don't have any intrinsics like this today, but I expect to upstream some once a force inlining mechanism such as this is available.

cc #131687 rust-lang/rfcs#3711 - this approach should resolve the concerns from these previous attempts

r? @saethlin

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 9, 2024
@rustbot
Copy link
Collaborator

rustbot commented Dec 9, 2024

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@rust-log-analyzer

This comment was marked as resolved.

@saethlin
Copy link
Member

saethlin commented Dec 9, 2024

This should have zero perf impact because the attribute isn't used, but let's just double check.
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 9, 2024
@bors
Copy link
Contributor

bors commented Dec 9, 2024

⌛ Trying commit ef59df9 with merge 4ce6fc9...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 9, 2024
mir_transform: implement `#[rustc_force_inline]`

Adds `#[rustc_force_inline]` which is similar to always inlining but reports an error if the inlining was not possible.

- `#[rustc_force_inline]` can only be applied to free functions to guarantee that the MIR inliner will be able to resolve calls.
- `rustc_mir_transform::inline::Inline` is refactored into two passes (`Inline` and `ForceInline`), sharing the vast majority of the implementation.
  - `rustc_mir_transform::inline::ForceInline` can't be disabled so annotated items are always inlined.
  - `rustc_mir_transform::inline::ForceInline` runs regardless of optimisation level.
- `#[rustc_force_inline]` won't inline unless target features match, as with normal inlining.
- MIR validation will ICE if a `#[rustc_force_inline]` isn't inlined, to guarantee that it will never be codegened independently. As a further guarantee, monomorphisation collection will always decide that `#[rustc_force_inline]` functions cannot be codegened locally.
- Like other intrinsics, `#[rustc_force_inline]` annotated functions cannot be cast to function pointers.
- As with other rustc attrs, this cannot be used by users, just within the compiler and standard library.
- This is only implemented within rustc, so should avoid any limitations of LLVM's inlining.

It is intended that this attribute be used with intrinsics that must be inlined for security reasons. For example, pointer authentication intrinsics would allow Rust users to make use of pointer authentication instructions, but if these intrinsic functions were in the binary then they could be used as gadgets with ROP attacks, defeating the point of introducing them. We don't have any intrinsics like this today, but I expect to upstream some once a force inlining mechanism such as this is available.

cc rust-lang#131687 rust-lang/rfcs#3711 - this approach should resolve the concerns from these previous attempts

r? `@saethlin`
@bors
Copy link
Contributor

bors commented Dec 10, 2024

☀️ Try build successful - checks-actions
Build commit: 4ce6fc9 (4ce6fc94291fc8b5e5d6a55c12504bb4b99a139b)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4ce6fc9): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.7% [0.2%, 4.0%] 49
Regressions ❌
(secondary)
1.0% [0.2%, 2.6%] 30
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.4% [-0.4%, -0.3%] 2
All ❌✅ (primary) 0.7% [0.2%, 4.0%] 49

Max RSS (memory usage)

Results (primary 0.5%, secondary -0.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.6% [1.2%, 2.0%] 5
Regressions ❌
(secondary)
4.9% [4.9%, 4.9%] 1
Improvements ✅
(primary)
-2.4% [-2.5%, -2.3%] 2
Improvements ✅
(secondary)
-3.6% [-5.5%, -1.7%] 2
All ❌✅ (primary) 0.5% [-2.5%, 2.0%] 7

Cycles

Results (primary 1.4%, secondary 2.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.9% [0.8%, 3.3%] 7
Regressions ❌
(secondary)
2.6% [2.1%, 2.8%] 5
Improvements ✅
(primary)
-2.6% [-2.6%, -2.6%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.4% [-2.6%, 3.3%] 8

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 767.262s -> 765.996s (-0.17%)
Artifact size: 330.86 MiB -> 330.90 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 10, 2024
@davidtwco
Copy link
Member Author

As far as I can tell, perf impact is just from the overhead from the force inline pass running to check if there are any annotated callees.

compiler/rustc_codegen_ssa/src/codegen_attrs.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/inline.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/inline.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/validate.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/validate.rs Outdated Show resolved Hide resolved
compiler/rustc_mir_transform/src/lib.rs Show resolved Hide resolved
@tmiasko
Copy link
Contributor

tmiasko commented Dec 11, 2024

According to pull request description the motivating use case are intrinsics that should be generated inline. The most obvious implementation strategy would be to do exactly that, in the contrast to the usual practice of creating a function wrapper that contains a call to an intrinsic. Can you elaborate on the advantages of the proposed implementation strategy?

Any public function marked with rustc_force_inline implicitly defines conditions under which it can be called without generating an error. Those conditions depend on implementation details of MIR inliner, the actual definition of functions marked with rustc_force_inline, the definitions of other functions called transitively, the optimization level used when building crate that defines those functions (consider consequences of optimizing function marked with rustc_force_inline). All those aspect inevitably become part of a public contract.

@davidtwco
Copy link
Member Author

Can you elaborate on the advantages of the proposed implementation strategy?

In this particular case, the "intrinsics" I have in mind for this aren't just wrapper functions.

They use a bunch of declarative macros to generate inline asm (using inline asm to avoid identical calls being optimised away, values being spilled, and to have more codegen guarantees, necessary to maintain the security properties).

I looked into implementing these as MIR intrinsics prior to revisiting forced inlining, but it looked like it would end up being quite complex. These are also very architecture-specific and it felt strange to be implementing something like that into the compiler rather than being able to implement it normally.

I also considered using a macro rather than a function, but having a proper function signature with const generics was just a lot better - it also allow things like assert_instr be used for testing, etc.

Those conditions depend on implementation details of MIR inliner, the actual definition of functions marked with rustc_force_inline, the definitions of other functions called transitively, the optimization level used when building crate that defines those functions (consider consequences of optimizing function marked with rustc_force_inline). All those aspect inevitably become part of a public contract.

These are all limitations - I think a lot of these can be mitigated by being very careful in what we apply this attribute to.

@saethlin
Copy link
Member

I think a 4% compile time hit on real code is much too high for the implementation of a compiler feature that's not even being used.

So I'd like to see that number brought down quite a lot. I think there are a number of ways to improve the compile time hit, at a glance most of the overhead comes from the resolve_instance_raw query, which I think we can just not do. Since we should be rejecting any attempt to put #[rustc_force_inline] on a function with a generic parameter, it should suffice to just check codegen_fn_attrs... right?

@davidtwco
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 12, 2024
@bors
Copy link
Contributor

bors commented Dec 12, 2024

⌛ Trying commit 08f365c with merge 3487c46...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 12, 2024
mir_transform: implement `#[rustc_force_inline]`

Adds `#[rustc_force_inline]` which is similar to always inlining but reports an error if the inlining was not possible.

- `#[rustc_force_inline]` can only be applied to free functions to guarantee that the MIR inliner will be able to resolve calls.
- `rustc_mir_transform::inline::Inline` is refactored into two passes (`Inline` and `ForceInline`), sharing the vast majority of the implementation.
  - `rustc_mir_transform::inline::ForceInline` can't be disabled so annotated items are always inlined.
  - `rustc_mir_transform::inline::ForceInline` runs regardless of optimisation level.
- `#[rustc_force_inline]` won't inline unless target features match, as with normal inlining.
- MIR validation will ICE if a `#[rustc_force_inline]` isn't inlined, to guarantee that it will never be codegened independently. As a further guarantee, monomorphisation collection will always decide that `#[rustc_force_inline]` functions cannot be codegened locally.
- Like other intrinsics, `#[rustc_force_inline]` annotated functions cannot be cast to function pointers.
- As with other rustc attrs, this cannot be used by users, just within the compiler and standard library.
- This is only implemented within rustc, so should avoid any limitations of LLVM's inlining.

It is intended that this attribute be used with intrinsics that must be inlined for security reasons. For example, pointer authentication intrinsics would allow Rust users to make use of pointer authentication instructions, but if these intrinsic functions were in the binary then they could be used as gadgets with ROP attacks, defeating the point of introducing them. We don't have any intrinsics like this today, but I expect to upstream some once a force inlining mechanism such as this is available.

cc rust-lang#131687 rust-lang/rfcs#3711 - this approach should resolve the concerns from these previous attempts

r? `@saethlin`
@bors
Copy link
Contributor

bors commented Dec 12, 2024

☀️ Try build successful - checks-actions
Build commit: 3487c46 (3487c46568df1d65fba8d2d59ad6c9318908ed04)

@rust-timer

This comment has been minimized.

Adds `#[rustc_force_inline]` which is similar to always inlining but
reports an error if the inlining was not possible, and which always
attempts to inline annotated items, regardless of optimisation levels.
It can only be applied to free functions to guarantee that the MIR
inliner will be able to resolve calls.
@davidtwco
Copy link
Member Author

If you misspell EMIT_MIR as EMIT_MIT then it doesn't work as it turns out.

@bors r=saethlin

@bors
Copy link
Contributor

bors commented Jan 10, 2025

📌 Commit cc9a9ec has been approved by saethlin

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 10, 2025
@bors
Copy link
Contributor

bors commented Jan 10, 2025

⌛ Testing commit cc9a9ec with merge b1a7dfb...

@bors
Copy link
Contributor

bors commented Jan 10, 2025

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing b1a7dfb to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 10, 2025
@bors bors merged commit b1a7dfb into rust-lang:master Jan 10, 2025
7 checks passed
@rustbot rustbot added this to the 1.86.0 milestone Jan 10, 2025
@davidtwco davidtwco deleted the forced-inlining branch January 10, 2025 22:12
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (b1a7dfb): comparison URL.

Overall result: ❌ regressions - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.3% [0.2%, 0.3%] 10
Regressions ❌
(secondary)
0.3% [0.2%, 0.3%] 15
Improvements ✅
(primary)
-0.1% [-0.1%, -0.1%] 1
Improvements ✅
(secondary)
-0.3% [-0.4%, -0.3%] 2
All ❌✅ (primary) 0.2% [-0.1%, 0.3%] 11

Max RSS (memory usage)

Results (primary 1.9%, secondary 3.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.9% [0.9%, 3.2%] 5
Regressions ❌
(secondary)
3.9% [2.3%, 5.5%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.9% [0.9%, 3.2%] 5

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 763.192s -> 764.797s (0.21%)
Artifact size: 325.76 MiB -> 325.90 MiB (0.04%)

@rustbot rustbot added the perf-regression Performance regression. label Jan 10, 2025
@rylev
Copy link
Member

rylev commented Jan 14, 2025

@davidtwco looks like some regressions snuck back in since the last run - I'm unsure if something happened in a rebase to reintroduce them. The actual regressions are much smaller than the original perf run, and I don't think they're big enough to demand an investigation so I'll mark this as triaged.

@rustbot label: +perf-regression-triaged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants