Add inline(usually) #130679

saethlin · 2024-09-21T22:20:40Z

I'm looking into what kind of things could recover the perf improvement detected in #121417 (comment). I think it's worth spending quite a bit of effort to figure out how to capture a 45% incr-patched improvement.

As far as I can tell, the root cause of the problem is that we have taken very deliberate steps in the compiler to ensure that #[inline(always)] causes inlining where possible, even when all optimizations are disabled. Some of the reasons for doing that are now outdated or were misguided. I think the only remaining use case is where the inlined body, even without optimizations, is cheaper to codegen and run than a call: for example, SIMD intrinsics may require a lot of code to put their arguments on the stack, which is slow to compile and run.
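For illustration, here is the kind of tiny wrapper that tends to carry #[inline(always)]. The `F32x4` type and `add` function are hypothetical stand-ins for a SIMD intrinsic wrapper. When optimizing, LLVM would very likely inline a body this small anyway, so the attribute mostly matters at -Copt-level=0, where it still forces the body into every caller.

```rust
/// Hypothetical four-lane vector, standing in for a SIMD register type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

/// A trivial lane-wise add of the kind that often carries `#[inline(always)]`.
/// At -Copt-level=0 the attribute forces this body into each caller instead of
/// emitting a call (and the argument shuffling a call may need).
#[inline(always)]
pub fn add(a: F32x4, b: F32x4) -> F32x4 {
    F32x4(std::array::from_fn(|i| a.0[i] + b.0[i]))
}
```

Whether forcing this inline in a debug build is a win depends on whether the call sequence costs more to codegen and run than the body itself.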

I'm quite sure that the majority of users applied this attribute believing it does not cause inlining in unoptimized builds, or didn't appreciate the build time regressions that implies and would prefer it didn't if they knew. (if that's you, put a heart on this or say something elsewhere, don't reply on this PR)

I am going to try to use the existing benchmark suite to evaluate a number of different approaches and take notes here, and hopefully I can collect enough data to shape any conversation about what we can do to help users.

The core of this PR is InlineAttr::Usually (the name doesn't matter), which ensures that the function is inlined when optimizations are enabled (the usual exceptions, like recursion, apply). I think most users believe this is what #[inline(always)] does.
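A rough model of the intended semantics may help. This sketch is hypothetical: rustc's own `InlineAttr` enum has `None`, `Hint`, `Always` and `Never` variants and `Usually` is the one added here, while lowering `Usually` to LLVM's `function-inline-cost` string attribute when optimizing is an assumption inferred from the notes comparing it with `alwaysinline`, not taken from the PR's code.

```rust
/// Hypothetical, simplified model of the inline attributes discussed in this PR.
/// `Usually` is the experimental variant (its name is a placeholder).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InlineAttr {
    None,
    Hint,
    Always,
    Usually,
    Never,
}

/// LLVM function attribute this model attaches for a given Rust attribute.
/// `optimize` is false for -Copt-level=0 and true otherwise.
pub fn llvm_inline_attr(attr: InlineAttr, optimize: bool) -> Option<&'static str> {
    match attr {
        InlineAttr::None => None,
        InlineAttr::Hint => Some("inlinehint"),
        // Current behavior: forced inlining even when optimizations are disabled.
        InlineAttr::Always => Some("alwaysinline"),
        // Assumption: when optimizing, `Usually` becomes a zero inline cost
        // rather than `alwaysinline`.
        InlineAttr::Usually if optimize => Some("function-inline-cost=0"),
        // Assumption: with optimizations disabled, `Usually` requests nothing.
        InlineAttr::Usually => None,
        InlineAttr::Never => Some("noinline"),
    }
}
```

The key difference from `Always` is the `optimize == false` row: a debug build gets no forced inlining at all.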

#130685 (comment) Replaced #[inline(always)] with #[inline(usually)] in the standard library, and did not recover the same 45% incr-patched improvement in regex. It's a tidy net positive though, and I suspect that perf improvement would normally be big enough to motivate merging a change. I think that means the standard library's use of #[inline(always)] is imposing marginal compile time overhead on the ecosystem, but the bigger opportunities are probably in third-party crates.

#130679 (comment) Treats #[inline(always)] as #[inline(usually)] literally everywhere; this gets the desired incr-patched improvement but suffers quite a few check and doc regressions. I think that means that alwaysinline is more powerful than function-inline-cost=0 in LLVM.

#130679 (comment) Treats #[inline(always)] as #[inline(usually)] when -Copt-level=0, which looks basically the same as #121417 (comment) (omit alwaysinline when doing -Copt-level=0 codegen).
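The two rewrites tried in these perf runs can be modeled as a step that changes the effective attribute before codegen. This is a hypothetical sketch; `Inline`, `Experiment` and `effective` are illustrative names, not compiler APIs.

```rust
/// Hypothetical subset of inline attributes relevant to the rewrite.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Inline {
    Hint,
    Always,
    Usually,
}

/// The configurations compared in the perf runs (illustrative names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Experiment {
    /// Attributes are taken as written.
    Baseline,
    /// `#[inline(always)]` is treated as `#[inline(usually)]` in every build.
    UsuallyEverywhere,
    /// The same rewrite, applied only when -Copt-level=0.
    UsuallyAtOptLevel0,
}

/// Attribute the backend would see after the rewrite.
pub fn effective(attr: Inline, exp: Experiment, opt_level: u8) -> Inline {
    match (attr, exp) {
        (Inline::Always, Experiment::UsuallyEverywhere) => Inline::Usually,
        (Inline::Always, Experiment::UsuallyAtOptLevel0) if opt_level == 0 => Inline::Usually,
        (other, _) => other,
    }
}
```

In this model the opt-level-0 variant never touches optimized builds (such as the compiler and standard library themselves), so it reduces to omitting `alwaysinline` at -Copt-level=0, which is consistent with it looking like #121417.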

#130679 (comment) replaces alwaysinline with a very negative inline cost, and it still has check and doc regressions. More investigation is required to work out which inlining decisions differ.

#130679 (comment) is a likely explanation of this, with some interesting implications: adding inline(always) to a function that was going to be inlined anyway can change optimizations (usually it seems to improve things?).

#130679 (comment) makes #[inline(usually)] also defy instantiation mode selection and always be LocalCopy the way #[inline(always)] does, but still has regressions in stm32f4. I think that proves that alwaysinline can actually improve debug build times.
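To make the instantiation-mode point concrete: a `LocalCopy` item is codegened in every CGU that uses it, while a `GloballyShared` item gets one copy that other CGUs call. The sketch below is a simplified, hypothetical model; rustc's real decision is made in `MonoItem::instantiation_mode`, takes more inputs, and its `GloballyShared` variant carries extra data that is dropped here.

```rust
/// Hypothetical subset of inline attributes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Inline {
    None,
    Hint,
    Always,
    Usually,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InstantiationMode {
    /// One shared copy; callers in other CGUs reference it.
    GloballyShared,
    /// A private copy in every CGU that uses the item.
    LocalCopy,
}

/// `usually_is_local` toggles the experiment described in this comment.
pub fn instantiation_mode(attr: Inline, optimize: bool, usually_is_local: bool) -> InstantiationMode {
    match attr {
        // `#[inline(always)]` bypasses the usual selection: always a local copy.
        Inline::Always => InstantiationMode::LocalCopy,
        // The experiment: give `Usually` the same treatment.
        Inline::Usually if usually_is_local => InstantiationMode::LocalCopy,
        // Otherwise, inlinable items only get local copies when optimizing.
        Inline::Hint | Inline::Usually if optimize => InstantiationMode::LocalCopy,
        _ => InstantiationMode::GloballyShared,
    }
}
```

With `usually_is_local` set, a debug build still sees one copy per using CGU, which is why this run isolates the effect of `alwaysinline` itself.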

#130679 (comment) infers alwaysinline for extremely trivial functions, but still has regressions for stm32f4. But of course it does: I left inline(always) treated as inline(usually), which slows down the compiler 🤦. Inconclusive perf run.

#130679 (comment) doesn't have any stm32f4 regressions 🥳 I think this means that there is some threshold where alwaysinline produces faster debug builds.

So still two questions:

1. Why does alwaysinline sometimes make debug builds faster?
2. Is there any obvious threshold at which adding alwaysinline causes more work for debug builds?
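One way to probe the second question is to make the inference rule explicit and sweep its threshold across perf runs. This sketch is hypothetical: `BodySummary`, `infer_alwaysinline` and the leaf-function rule are illustrative assumptions, not the heuristic this PR actually uses.

```rust
/// Hypothetical summary of a function body.
#[derive(Clone, Copy, Debug)]
pub struct BodySummary {
    /// Number of MIR statements in the body.
    pub statements: usize,
    /// Number of calls to other functions.
    pub calls: usize,
}

/// Infer `alwaysinline` only for leaf functions (no calls) whose body has at
/// most `max_statements` statements, such as simple getters or register accessors.
pub fn infer_alwaysinline(body: BodySummary, max_statements: usize) -> bool {
    body.calls == 0 && body.statements <= max_statements
}
```

Running the benchmark suite with a few values of `max_statements` would show where, if anywhere, the stm32f4-style wins turn into extra work for debug builds.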

saethlin · 2024-09-21T22:28:59Z

@bors try @rust-timer queue


bors · 2024-09-21T22:30:10Z

⌛ Trying commit e85e8d8 with merge 16ff86b...

bors · 2024-09-22T00:16:48Z

☀️ Try build successful - checks-actions
Build commit: 16ff86b (16ff86beadb3172d63fad4369aa349ac2c4aa9b6)

rust-timer · 2024-09-22T01:38:56Z

Finished benchmarking commit (16ff86b): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.0%	[-1.5%, -0.8%]	4
Improvements ✅ (secondary)	-0.4%	[-1.0%, -0.3%]	7
All ❌✅ (primary)	-1.0%	[-1.5%, -0.8%]	4

Max RSS (memory usage)

Results (primary -0.2%, secondary -1.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.8%	[0.5%, 3.0%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.2%	[-2.8%, -1.5%]	2
Improvements ✅ (secondary)	-1.8%	[-1.8%, -1.8%]	1
All ❌✅ (primary)	-0.2%	[-2.8%, 3.0%]	4

Cycles

Results (secondary -2.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.9%	[-2.9%, -2.9%]	1
All ❌✅ (primary)	-	-	0

Binary size

Results (primary 0.1%, secondary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.0%, 0.2%]	45
Regressions ❌ (secondary)	0.1%	[0.0%, 0.2%]	12
Improvements ✅ (primary)	-0.2%	[-0.4%, -0.2%]	8
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-0.4%, 0.2%]	53

Bootstrap: 769.527s -> 768.003s (-0.20%)
Artifact size: 341.45 MiB -> 341.42 MiB (-0.01%)


saethlin · 2024-09-22T17:28:39Z

@bors try @rust-timer queue


bors · 2024-09-22T17:29:50Z

⌛ Trying commit e4ff49d with merge 31d3b5c...

RalfJung · 2024-09-22T17:49:08Z

Is there a place for general discussion of this proposal?

My concern would be -- the data you have shows that debug builds build a lot faster this way, but the resulting binaries may also be a lot slower. If this makes debug binaries 10x slower then I don't think we should do this. I doubt it'll be 10x, but we should know how much it is before changing all those small functions to not be inlined any more.

saethlin · 2024-09-22T18:10:40Z

> Is there a place for general discussion of this proposal?

Once again, this is S-experimental because I am trying to gather data, not make a proposal. I don't have a proposal to make, because I have not gathered enough data. You're trying to raise a data-free speculative objection to a proposal that doesn't exist because I'm trying to gather data.

RalfJung · 2024-09-22T18:20:05Z

I was suggesting what kind of data would be good to collect (and which AFAIK our perf infra does not provide). I feel like it'd still make sense to have some place to register questions like that to ensure they get discussed if/when this moves forward. After all there is no way to be sure that I would even notice when that happens. But okay, whatever.

bors · 2024-09-22T19:28:11Z

☀️ Try build successful - checks-actions
Build commit: 31d3b5c (31d3b5c04e0372e59ed15dd7f9b874594d9cea46)

rust-timer · 2024-09-22T20:55:51Z

Finished benchmarking commit (31d3b5c): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.9%	[0.2%, 2.7%]	223
Regressions ❌ (secondary)	1.8%	[0.2%, 6.7%]	201
Improvements ✅ (primary)	-11.8%	[-45.8%, -0.5%]	34
Improvements ✅ (secondary)	-1.8%	[-8.1%, -0.2%]	16
All ❌✅ (primary)	-0.7%	[-45.8%, 2.7%]	257

Max RSS (memory usage)

Results (primary -8.1%, secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.5%	[0.4%, 2.8%]	4
Regressions ❌ (secondary)	2.4%	[0.9%, 4.8%]	3
Improvements ✅ (primary)	-9.6%	[-23.9%, -1.0%]	26
Improvements ✅ (secondary)	-2.4%	[-3.4%, -1.0%]	4
All ❌✅ (primary)	-8.1%	[-23.9%, 2.8%]	30

Cycles

Results (primary -5.1%, secondary 3.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.3%	[0.4%, 2.6%]	33
Regressions ❌ (secondary)	3.5%	[1.4%, 7.9%]	27
Improvements ✅ (primary)	-11.7%	[-43.1%, -1.3%]	32
Improvements ✅ (secondary)	-7.4%	[-7.4%, -7.4%]	1
All ❌✅ (primary)	-5.1%	[-43.1%, 2.6%]	65

Binary size

Results (primary -2.6%, secondary 1.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.1%, 3.8%]	34
Regressions ❌ (secondary)	1.1%	[0.0%, 4.4%]	49
Improvements ✅ (primary)	-4.2%	[-14.5%, -0.1%]	68
Improvements ✅ (secondary)	-1.3%	[-3.9%, -0.3%]	4
All ❌✅ (primary)	-2.6%	[-14.5%, 3.8%]	102

Bootstrap: 768.93s -> 758.71s (-1.33%)
Artifact size: 341.51 MiB -> 336.00 MiB (-1.62%)


bors · 2024-09-28T23:40:03Z

☀️ Try build successful - checks-actions
Build commit: 97e9a6f (97e9a6f3b123eeefffc3ce8290e06cbe6a399558)

rust-timer · 2024-09-29T01:21:26Z

Finished benchmarking commit (97e9a6f): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	11.2%	[0.2%, 43.0%]	24
Regressions ❌ (secondary)	1.8%	[0.3%, 5.4%]	6
Improvements ✅ (primary)	-9.3%	[-42.2%, -0.3%]	12
Improvements ✅ (secondary)	-1.3%	[-1.6%, -0.7%]	5
All ❌✅ (primary)	4.4%	[-42.2%, 43.0%]	36

Max RSS (memory usage)

Results (primary -0.4%, secondary -0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.7%	[2.1%, 5.5%]	15
Regressions ❌ (secondary)	2.7%	[2.0%, 3.4%]	2
Improvements ✅ (primary)	-10.5%	[-21.3%, -0.6%]	6
Improvements ✅ (secondary)	-3.4%	[-4.5%, -2.3%]	2
All ❌✅ (primary)	-0.4%	[-21.3%, 5.5%]	21

Cycles

Results (primary 6.9%, secondary 1.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	15.3%	[2.1%, 45.0%]	18
Regressions ❌ (secondary)	3.6%	[1.9%, 5.2%]	2
Improvements ✅ (primary)	-12.0%	[-38.8%, -0.9%]	8
Improvements ✅ (secondary)	-3.2%	[-3.2%, -3.2%]	1
All ❌✅ (primary)	6.9%	[-38.8%, 45.0%]	26

Binary size

Results (primary -1.4%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 0.4%]	34
Regressions ❌ (secondary)	0.1%	[0.0%, 0.2%]	39
Improvements ✅ (primary)	-2.9%	[-10.0%, -0.1%]	36
Improvements ✅ (secondary)	-1.2%	[-3.9%, -0.2%]	5
All ❌✅ (primary)	-1.4%	[-10.0%, 0.4%]	70

Bootstrap: 769.092s -> 766.818s (-0.30%)
Artifact size: 341.40 MiB -> 341.37 MiB (-0.01%)

saethlin · 2024-09-29T01:37:49Z

@bors try @rust-timer queue

bors · 2024-09-29T01:39:01Z

⌛ Trying commit 8c95207 with merge 3a65c2b...


bors · 2024-09-29T03:22:15Z

☀️ Try build successful - checks-actions
Build commit: 3a65c2b (3a65c2bcf5382f9a846e4b942883f862b98f3fb8)

rust-timer · 2024-09-29T05:37:50Z

Finished benchmarking commit (3a65c2b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	10.2%	[0.2%, 43.0%]	25
Regressions ❌ (secondary)	1.6%	[0.3%, 5.4%]	7
Improvements ✅ (primary)	-9.5%	[-45.8%, -0.3%]	13
Improvements ✅ (secondary)	-1.1%	[-1.6%, -0.2%]	6
All ❌✅ (primary)	3.5%	[-45.8%, 43.0%]	38

Max RSS (memory usage)

Results (primary -0.0%, secondary -1.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.3%	[0.7%, 9.9%]	22
Regressions ❌ (secondary)	2.1%	[2.1%, 2.1%]	1
Improvements ✅ (primary)	-8.3%	[-21.2%, -0.5%]	9
Improvements ✅ (secondary)	-3.4%	[-4.7%, -2.1%]	2
All ❌✅ (primary)	-0.0%	[-21.2%, 9.9%]	31

Cycles

Results (primary 6.3%, secondary 2.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	13.3%	[0.8%, 45.1%]	20
Regressions ❌ (secondary)	3.0%	[1.8%, 5.8%]	11
Improvements ✅ (primary)	-17.1%	[-42.2%, -1.1%]	6
Improvements ✅ (secondary)	-3.4%	[-3.4%, -3.4%]	1
All ❌✅ (primary)	6.3%	[-42.2%, 45.1%]	26

Binary size

Results (primary -1.6%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.0%, 3.8%]	35
Regressions ❌ (secondary)	0.1%	[0.0%, 0.2%]	39
Improvements ✅ (primary)	-3.2%	[-14.2%, -0.0%]	46
Improvements ✅ (secondary)	-1.2%	[-3.9%, -0.2%]	5
All ❌✅ (primary)	-1.6%	[-14.2%, 3.8%]	81

Bootstrap: 767.6s -> 767.716s (0.02%)
Artifact size: 341.36 MiB -> 341.41 MiB (0.01%)

saethlin · 2024-09-29T15:44:45Z

@bors try @rust-timer queue

bors · 2024-09-29T15:45:58Z

⌛ Trying commit 8ca3275 with merge 70c95b2...


rust-log-analyzer · 2024-09-29T16:07:56Z

The job x86_64-gnu-llvm-18 failed! Check out the build log: (web) (plain)

Possible cause of the failure (guessed by this bot):

------
 > importing cache manifest from ghcr.io/rust-lang/rust-ci-cache:20950f2ccee3dff53a038adf5c1cf05231c0b30772617126a5f6478a66316a29cba9aead69f7bb0004886d32c1f8e6287542cf6d25130711c82d16a66201d4fe:
------
##[endgroup]
Setting extra environment values for docker:  --env ENABLE_GCC_CODEGEN=1 --env GCC_EXEC_PREFIX=/usr/lib/gcc/
[CI_JOB_NAME=x86_64-gnu-llvm-18]
---
sccache: Starting the server...
##[group]Configure the build
configure: processing command line
configure: 
configure: build.configure-args := ['--build=x86_64-unknown-linux-gnu', '--llvm-root=/usr/lib/llvm-18', '--enable-llvm-link-shared', '--set', 'rust.randomize-layout=true', '--set', 'rust.thin-lto-import-instr-limit=10', '--set', 'change-id=99999999', '--enable-verbose-configure', '--enable-sccache', '--disable-manage-submodules', '--enable-locked-deps', '--enable-cargo-native-static', '--set', 'rust.codegen-units-std=1', '--set', 'dist.compression-profile=balanced', '--dist-compression-formats=xz', '--set', 'rust.lld=false', '--disable-dist-src', '--release-channel=nightly', '--enable-debug-assertions', '--enable-overflow-checks', '--enable-llvm-assertions', '--set', 'rust.verify-llvm-ir', '--set', 'rust.codegen-backends=llvm,cranelift,gcc', '--set', 'llvm.static-libstdcpp', '--enable-new-symbol-mangling']
configure: target.x86_64-unknown-linux-gnu.llvm-config := /usr/lib/llvm-18/bin/llvm-config
configure: llvm.link-shared     := True
configure: rust.randomize-layout := True
configure: rust.thin-lto-import-instr-limit := 10
---
failures:

---- [incremental] tests/incremental/change_pub_inherent_method_body/struct_point.rs stdout ----

error in revision `cfail2`: test compilation failed although it shouldn't!
status: exit status: 1
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/incremental/change_pub_inherent_method_body/struct_point.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--target=x86_64-unknown-linux-gnu" "--cfg" "cfail2" "--check-cfg" "cfg(FALSE,cfail1,cfail2)" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/change_pub_inherent_method_body/struct_point/struct_point.inc" "-Z" "incremental-verify-ich" "-O" "--error-format" "json" "--json" "future-incompat" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/change_pub_inherent_method_body/struct_point" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/change_pub_inherent_method_body/struct_point/auxiliary" "-Z" "query-dep-graph"
--- stderr -------------------------------
--- stderr -------------------------------
error: CGU-reuse for `struct_point-fn_calls_changed_method` is `No` but should be at least `PreLto`
   |
   |
LL | #![rustc_partition_reused(module="struct_point-fn_calls_changed_method", cfg="cfail2")]

error: aborting due to 1 previous error
------------------------------------------



---- [incremental] tests/incremental/change_add_field/struct_point.rs stdout ----

error in revision `cfail2`: test compilation failed although it shouldn't!
status: exit status: 1
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/incremental/change_add_field/struct_point.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--target=x86_64-unknown-linux-gnu" "--cfg" "cfail2" "--check-cfg" "cfg(FALSE,cfail1,cfail2)" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/change_add_field/struct_point/struct_point.inc" "-Z" "incremental-verify-ich" "-O" "--error-format" "json" "--json" "future-incompat" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/change_add_field/struct_point" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/change_add_field/struct_point/auxiliary" "-Z" "query-dep-graph"
--- stderr -------------------------------
error: CGU-reuse for `struct_point-call_fn_with_type_in_body` is `No` but should be at least `PreLto`
   |
LL | #![rustc_partition_reused(module="struct_point-call_fn_with_type_in_body", cfg="cfail2")]

error: aborting due to 1 previous error
------------------------------------------



---- [incremental] tests/incremental/thinlto/cgu_invalidated_via_import.rs stdout ----

error in revision `cfail2`: test compilation failed although it shouldn't!
status: exit status: 1
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/incremental/thinlto/cgu_invalidated_via_import.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--target=x86_64-unknown-linux-gnu" "--cfg" "cfail2" "--check-cfg" "cfg(FALSE,cfail1,cfail2,cfail3)" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/thinlto/cgu_invalidated_via_import/cgu_invalidated_via_import.inc" "-Z" "incremental-verify-ich" "-O" "--error-format" "json" "--json" "future-incompat" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/thinlto/cgu_invalidated_via_import" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/thinlto/cgu_invalidated_via_import/auxiliary" "-Z" "query-dep-graph" "-O"
--- stderr -------------------------------
error: CGU-reuse for `cgu_invalidated_via_import-bar` is `No` but should be `PreLto`
   |
LL | / #![rustc_expected_cgu_reuse(module="cgu_invalidated_via_import-bar",
LL | |                             cfg="cfail2",
LL | |                             kind="pre-lto")]

error: aborting due to 1 previous error
------------------------------------------



---- [incremental] tests/incremental/thinlto/independent_cgus_dont_affect_each_other.rs stdout ----

error in revision `cfail2`: test compilation failed although it shouldn't!
status: exit status: 1
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/incremental/thinlto/independent_cgus_dont_affect_each_other.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2" "--target=x86_64-unknown-linux-gnu" "--cfg" "cfail2" "--check-cfg" "cfg(FALSE,cfail1,cfail2,cfail3)" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/thinlto/independent_cgus_dont_affect_each_other/independent_cgus_dont_affect_each_other.inc" "-Z" "incremental-verify-ich" "-O" "--error-format" "json" "--json" "future-incompat" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/thinlto/independent_cgus_dont_affect_each_other" "-A" "internal_features" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/thinlto/independent_cgus_dont_affect_each_other/auxiliary" "-Z" "query-dep-graph" "-O"
--- stderr -------------------------------
error: CGU-reuse for `independent_cgus_dont_affect_each_other-bar` is `No` but should be `PreLto`
   |
LL | / #![rustc_expected_cgu_reuse(module="independent_cgus_dont_affect_each_other-bar",
LL | |                             cfg="cfail2",
LL | |                             kind="pre-lto")]

error: aborting due to 1 previous error
------------------------------------------
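All four failures above have the same shape: after an edit in one function, a codegen unit (CGU) that the test expects to be reused (`PreLto`) is rebuilt instead (`No`). One plausible reading, offered as an assumption rather than a diagnosis, is that a callee carrying the new attribute was instantiated as a local copy inside its callers' CGUs, so editing the callee's body also changed the callers' CGUs. The commit "Don't special-case Usually in instantiation mode selection" (8c95207) touches that decision. The toy model below only illustrates the effect and is not rustc's implementation: `Instantiation`, `cgu_hash`, and `caller_cgu_reused` are hypothetical names loosely modeled on rustc's `GloballyShared`/`LocalCopy` instantiation modes, and a CGU's fingerprint is reduced to a `DefaultHasher` over source strings.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for how a function is placed into codegen units,
/// loosely modeled on rustc's `GloballyShared` / `LocalCopy` modes.
#[derive(Clone, Copy, PartialEq)]
enum Instantiation {
    /// The callee is codegened once in its own CGU; callers only reference its symbol.
    GloballyShared,
    /// A copy of the callee is codegened into every CGU that calls it.
    LocalCopy,
}

/// Toy fingerprint of the caller's CGU: its own body, plus the callee's body
/// only when the callee is copied into this CGU.
fn cgu_hash(caller_body: &str, callee_body: &str, mode: Instantiation) -> u64 {
    let mut h = DefaultHasher::new();
    caller_body.hash(&mut h);
    if mode == Instantiation::LocalCopy {
        callee_body.hash(&mut h);
    }
    h.finish()
}

/// Edits only the callee's body and reports whether the caller's CGU
/// fingerprint is unchanged, i.e. whether the caller's CGU could be reused.
fn caller_cgu_reused(mode: Instantiation) -> bool {
    let caller = "fn caller() -> u32 { callee() }";
    let before = cgu_hash(caller, "fn callee() -> u32 { 1 }", mode);
    let after = cgu_hash(caller, "fn callee() -> u32 { 2 }", mode);
    before == after
}

fn main() {
    println!("GloballyShared: reused = {}", caller_cgu_reused(Instantiation::GloballyShared));
    println!("LocalCopy:      reused = {}", caller_cgu_reused(Instantiation::LocalCopy));
}
```

With a shared instantiation the caller's fingerprint ignores the callee's body, so the caller's CGU can be reused; with a local copy it cannot. That is the basic tension between cross-CGU inlining and incremental reuse that these tests guard.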

bors · 2024-09-29T17:28:52Z

☀️ Try build successful - checks-actions
Build commit: 70c95b2 (70c95b222a084ea9a4c7ae723a5a2f27b732093c)

rust-timer · 2024-09-29T18:47:02Z

Finished benchmarking commit (70c95b2): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	11.0%	[0.2%, 30.9%]	22
Regressions ❌ (secondary)	1.3%	[0.3%, 5.4%]	5
Improvements ✅ (primary)	-8.2%	[-42.2%, -0.7%]	17
Improvements ✅ (secondary)	-2.3%	[-3.5%, -1.0%]	2
All ❌✅ (primary)	2.7%	[-42.2%, 30.9%]	39

Max RSS (memory usage)

Results (primary -1.7%, secondary -2.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.7%	[2.7%, 5.7%]	13
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-10.5%	[-22.1%, -0.5%]	8
Improvements ✅ (secondary)	-2.0%	[-2.0%, -2.0%]	1
All ❌✅ (primary)	-1.7%	[-22.1%, 5.7%]	21

Cycles

Results (primary 3.9%, secondary -2.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	14.0%	[2.0%, 30.5%]	18
Regressions ❌ (secondary)	5.0%	[5.0%, 5.0%]	1
Improvements ✅ (primary)	-9.0%	[-39.5%, -1.4%]	14
Improvements ✅ (secondary)	-4.0%	[-4.8%, -3.1%]	7
All ❌✅ (primary)	3.9%	[-39.5%, 30.5%]	32

Binary size

Results (primary -1.7%, secondary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.3%	[0.0%, 0.8%]	23
Regressions ❌ (secondary)	0.5%	[0.1%, 1.4%]	15
Improvements ✅ (primary)	-3.0%	[-10.0%, -0.1%]	38
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.7%	[-10.0%, 0.8%]	61

Bootstrap: 768.779s -> 769.017s (0.03%)
Artifact size: 341.41 MiB -> 341.37 MiB (-0.01%)
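The `All ❌✅ (primary)` rows can be cross-checked against the rows above them. Each count is the sum of the primary regression and improvement counts (22 + 17 = 39, 13 + 8 = 21, 18 + 14 = 32, 23 + 38 = 61), so, assuming the `All` mean is the plain mean over those benchmarks, it should equal the count-weighted mean of the two primary rows, up to rounding. The sketch below does that arithmetic; `Row` and `combined_mean` are illustrative names, not part of rust-timer.

```rust
/// One row of a rust-timer summary table: mean change in percent over `count` benchmarks.
struct Row {
    mean_pct: f64,
    count: usize,
}

/// Count-weighted mean of several rows, i.e. the mean over every benchmark the
/// rows cover together. Returns NaN for an empty slice (0.0 / 0.0).
fn combined_mean(rows: &[Row]) -> f64 {
    let total: usize = rows.iter().map(|r| r.count).sum();
    let weighted: f64 = rows.iter().map(|r| r.mean_pct * r.count as f64).sum();
    weighted / total as f64
}

fn main() {
    // Primary rows copied from the tables above: (regressions, improvements).
    let tables = [
        ("Instruction count", [Row { mean_pct: 11.0, count: 22 }, Row { mean_pct: -8.2, count: 17 }]),
        ("Max RSS", [Row { mean_pct: 3.7, count: 13 }, Row { mean_pct: -10.5, count: 8 }]),
        ("Cycles", [Row { mean_pct: 14.0, count: 18 }, Row { mean_pct: -9.0, count: 14 }]),
        ("Binary size", [Row { mean_pct: 0.3, count: 23 }, Row { mean_pct: -3.0, count: 38 }]),
    ];
    for (name, rows) in &tables {
        println!("{name}: {:.2}%", combined_mean(rows));
    }
}
```

This prints roughly 2.63%, -1.71%, 3.94% and -1.76%, against the reported 2.7%, -1.7%, 3.9% and -1.7%. The small gaps for instruction count and binary size are within what rounding the displayed input means to one decimal place can explain.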

saethlin added the S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. label Sep 21, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 21, 2024

saethlin removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 21, 2024

saethlin force-pushed the inline-usually branch from c8830ec to e85e8d8 September 21, 2024 22:21

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 21, 2024

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 22, 2024

workingjubilee mentioned this pull request Sep 22, 2024

try inline(usually) more #130685

Closed

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 22, 2024

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 29, 2024

Don't special-case Usually in instantiation mode selection

8c95207

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 29, 2024

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 29, 2024

Don't add alwaysinline where it wasn't added already

8ca3275

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 29, 2024

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 29, 2024

lqd mentioned this pull request Oct 14, 2024

#[inline(required)] #131687

Closed

tgross35 mentioned this pull request Oct 14, 2024

#[inline(required)] rust-lang/rfcs#3711

Closed

saethlin commented Sep 21, 2024 • edited

saethlin commented Sep 21, 2024

bors commented Sep 21, 2024

bors commented Sep 22, 2024

rust-timer commented Sep 22, 2024

Overall result: ✅ improvements - no action needed

saethlin commented Sep 22, 2024

bors commented Sep 22, 2024

RalfJung commented Sep 22, 2024

saethlin commented Sep 22, 2024 • edited

RalfJung commented Sep 22, 2024 via email

bors commented Sep 22, 2024

rust-timer commented Sep 22, 2024

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

bors commented Sep 28, 2024

rust-timer commented Sep 29, 2024

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

saethlin commented Sep 29, 2024

bors commented Sep 29, 2024

bors commented Sep 29, 2024

rust-timer commented Sep 29, 2024

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

saethlin commented Sep 29, 2024

bors commented Sep 29, 2024

rust-log-analyzer commented Sep 29, 2024

bors commented Sep 29, 2024

rust-timer commented Sep 29, 2024

Overall result: ❌✅ regressions and improvements - ACTION NEEDED
