Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for optimize_for_size standard library feature #125612

Open
5 tasks done
Kobzol opened this issue May 27, 2024 · 14 comments
Open
5 tasks done

Tracking Issue for optimize_for_size standard library feature #125612

Kobzol opened this issue May 27, 2024 · 14 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@Kobzol
Copy link
Contributor

Kobzol commented May 27, 2024

#125011 has added a optimize_for_size feature to the standard library, which is designed for special casing some algorithms in core/alloc/std to provide an implementation that tries to minimize its effect on binary size.

To use this feature, the standard library has to be compiled using -Zbuild-std:

$ cargo +nightly build -Z build-std -Z build-std-features="optimize_for_size" ...

Optimizations that leverage this flag:

@Kobzol Kobzol added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 27, 2024
@clarfonthey
Copy link
Contributor

clarfonthey commented May 29, 2024

Small aside, but do you think that code using optimize_for_size should be always optimized for size, or only in release mode? Since a lot of the submitted PRs use cfg!(feature = "optimize_for_size") to test, and I'm not sure that dead code removal will always work in these cases without optimizations enabled.

Obviously, to get the best benefits, you're going to want optimizations anyway, but I'm thinking that we probably do want to ensure that code actually isn't compiled in all cases using conditional compilation, rather than just conditions. This also might help building on memory-constrained environments since the code won't have to be kept in memory before it's removed.

I also think that longer-term, there should be an easier flag to enable this feature that also takes into account optimization options (making sure loops aren't unrolled, etc.) but at least for now, while it's just a Cargo feature, it makes sense to at least figure out what cases we should optimize for.

@Kobzol
Copy link
Contributor Author

Kobzol commented May 29, 2024

I would be extremely surprised if any cfg-ed out code survived even in debug mode without optimizations. Even something like this:

fn main() {
    #[cfg(feature = "foo")]
    println!("hellorust");
}

Doesn't contain the hellorust string in the final binary in a debug build when the feature is disabled.

@folkertdev
Copy link
Contributor

in practice we've been using the cfg!(feature = "optimize_for_size") macro a bunch, which would rely on the optimizer evaluating and simplifying boolean conditions and then removing branches or blocks. I suspect in practice this is reliable enough. Certainly on small examples in godbolt this seems to work fine.

More generally, the point of this flag is really --release mode. The size of a debug binary is just not representative at all. Especially on embedded, --release is effectively standard: you don't want to wait for your debug binary to be pushed through a wire to your device. Because optimization passes in general don't really give guarantees I think it's fine that the optimizer may miss some opportunities in a debug build.

I say that in part because using proper conditional compilation would make the code harder to maintain, to the point where we may require completely separate functions for when the flag is enabled or not. We've been trying to prevent completely separate implementations from happening so far: that won't always work but when we can I think the result is better.

@clarfonthey
Copy link
Contributor

clarfonthey commented May 29, 2024

I would be extremely surprised if any cfg-ed out code survived even in debug mode without optimizations. Even something like this:

fn main() {
    #[cfg(feature = "foo")]
    println!("hellorust");
}

Doesn't contain the hellorust string in the final binary in a debug build when the feature is disabled.

Right, note that this is different from:

fn main() {
    if cfg!(feature = "foo") {
        println!("hellorust");
    }
}

since #[cfg(...)] will prevent the code from even compiling, whereas this will compile the code but leave it inside a dead branch under if false.

I say that in part because using proper conditional compilation would make the code harder to maintain, to the point where we may require completely separate functions for when the flag is enabled or not. We've been trying to prevent completely separate implementations from happening so far: that won't always work but when we can I think the result is better.

I agree here, mostly just wanted to ask because options do exist (like the currently unstable cfg_match macro). And while you might not explicitly push debug code to a device, you might run tests on it off the device in debug mode, and it's important to ensure that these tests are accurate. I can imagine also wanting to compare generated assembly between debug and release mode to see what optimisations are done, without having the non-size-optimised code getting in the way.

Basically agree with the current implementation, but think it's important to properly clarify so folks are aware of the best way to implement things.

@Kobzol
Copy link
Contributor Author

Kobzol commented May 29, 2024

Ah, sorry, I didn't realize that with cfg! it is different. Well, I would be also extremely surprised if if false survived even in debug mode 😆 But happy to see a counterexample.

@dragonnn
Copy link

Hi! I tested this option on my embedded project using embassy on a STM32F401 and my code size did go up from 335kB to 360kB. I can do some debugging/investigation why it happens if giving some instructions what to test for.
Thanks!

@Kobzol
Copy link
Contributor Author

Kobzol commented Jun 11, 2024

Did it go up vs just using -Zbuild-std without the optimize_for_size feature, or versus a normal cargo build with a pre-built libstd?

@dragonnn
Copy link

Ah, I think I found the issue, adding -Z build-std-features="std/optimize_for_size" to cargo build overrides build-std-features from cargo config.toml and that disabled panic_immediate_abort when building my project and bumped the code size. When I add -Zbuild-std-features="std/optimize_for_size panic_immediate_abort" my code size goes down:
with "std/optimize_for_size"

   text    data     bss     dec     hex filename
 339432    2800   35452  377684   5c354 init-stm32

without:

   text    data     bss     dec     hex filename
 340176    2800   35452  378428   5c63c init-stm32

Not a lot but it is something. Thanks!
Btw. small nitpick, why the reason for the std/ prefix? Since options like panic_immediate_abort don't have it, sounds kind redundant when it is already in build-std-features

@Kobzol
Copy link
Contributor Author

Kobzol commented Jun 11, 2024

Hmm, that's weird, on CI we enable it with just optimize_for_size, and in Cargo.toml of stdlib it seems to be defined in the same way as panic_immediate_abort (https://github.com/rust-lang/rust/pull/125011/files).

I think that optimize_for_size is enough. This works for me:

$ cargo +nightly build -Zbuild-std -Zbuild-std-features="optimize_for_size,panic_immediate_abort" --target x86_64-unknown-linux-gnu

@diondokter
Copy link
Contributor

Not a lot but it is something. Thanks!

With what's currently merged, the max you can get is ~1.5kb savings on opt-level z.
There's certainly more to be done for sure :)

@dragonnn
Copy link

Hmm, that's weird, on CI we enable it with just optimize_for_size, and in Cargo.toml of stdlib it seems to be defined in the same way as panic_immediate_abort (https://github.com/rust-lang/rust/pull/125011/files).

I think that optimize_for_size is enough. This works for me:

$ cargo +nightly build -Zbuild-std -Zbuild-std-features="optimize_for_size,panic_immediate_abort" --target x86_64-unknown-linux-gnu

Oh, right it works both way, doesn't metter if it starts with std/ or not. I assumed from the first posts it needs to start with std/. Thanks!

@Kobzol
Copy link
Contributor Author

Kobzol commented Jun 11, 2024

Sorry, that was an error on my part, fixed the code example. Thanks!

@clarfonthey
Copy link
Contributor

clarfonthey commented Jun 11, 2024

This definitely feels like something that could use a proper guide mentioning these sorts of caveats-- the unstable book would probably be a good place to put it for now. That way, we could update it as more cases like these pop up, and it would also help with the bit I mentioned about being able to have a general flag for "use size-optimised libstd" that makes sure the correct optimisation settings, etc. are applied to make it work. (Having a full list of the things people have to do to make it work, means we can take that into account when making a flag that "just works" for this case.)

Perhaps in the distant future, we could even see official builds of "size-optimised" libstd for certain architectures on rustup!

And we can also link that in the original comment so that folks can see the up-to-date version.

@johnthagen
Copy link
Contributor

This has been added to min-sized-rust to increase visibility. Thanks @Kobzol

bors added a commit to rust-lang-ci/rust that referenced this issue Jun 17, 2024
…ate, r=Amanieu

make `ptr::rotate` smaller when using `optimize_for_size`

code to reproduce https://github.com/folkertdev/optimize_for_size-slice-rotate

In the example the size of `.text` goes down from 1624 to 276 bytes.

```
> cargo size --release --features "left-std"  -- -A

slice-rotate  :
section              size        addr
.vector_table        1024         0x0
.text                1624       0x400
.rodata                 0       0xa58
.data                   0  0x20000000
.gnu.sgstubs            0       0xa60
.bss                    0  0x20000000
.uninit                 0  0x20000000
.debug_loc            591         0x0
.debug_abbrev        1452         0x0
.debug_info         10634         0x0
.debug_aranges        480         0x0
.debug_ranges        1504         0x0
.debug_str          11716         0x0
.comment               72         0x0
.ARM.attributes        56         0x0
.debug_frame         1036         0x0
.debug_line          5837         0x0
Total               36026

> cargo size --release --features "left-size"  -- -A

slice-rotate  :
section             size        addr
.vector_table       1024         0x0
.text                276       0x400
.rodata                0       0x514
.data                  0  0x20000000
.gnu.sgstubs           0       0x520
.bss                   0  0x20000000
.uninit                0  0x20000000
.debug_loc           347         0x0
.debug_abbrev        965         0x0
.debug_info         4216         0x0
.debug_aranges       168         0x0
.debug_ranges        216         0x0
.debug_str          3615         0x0
.comment              72         0x0
.ARM.attributes       56         0x0
.debug_frame         232         0x0
.debug_line          723         0x0
Total              11910
```

tracking issue: rust-lang#125612
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Jun 17, 2024
…anieu

make `ptr::rotate` smaller when using `optimize_for_size`

code to reproduce https://github.com/folkertdev/optimize_for_size-slice-rotate

In the example the size of `.text` goes down from 1624 to 276 bytes.

```
> cargo size --release --features "left-std"  -- -A

slice-rotate  :
section              size        addr
.vector_table        1024         0x0
.text                1624       0x400
.rodata                 0       0xa58
.data                   0  0x20000000
.gnu.sgstubs            0       0xa60
.bss                    0  0x20000000
.uninit                 0  0x20000000
.debug_loc            591         0x0
.debug_abbrev        1452         0x0
.debug_info         10634         0x0
.debug_aranges        480         0x0
.debug_ranges        1504         0x0
.debug_str          11716         0x0
.comment               72         0x0
.ARM.attributes        56         0x0
.debug_frame         1036         0x0
.debug_line          5837         0x0
Total               36026

> cargo size --release --features "left-size"  -- -A

slice-rotate  :
section             size        addr
.vector_table       1024         0x0
.text                276       0x400
.rodata                0       0x514
.data                  0  0x20000000
.gnu.sgstubs           0       0x520
.bss                   0  0x20000000
.uninit                0  0x20000000
.debug_loc           347         0x0
.debug_abbrev        965         0x0
.debug_info         4216         0x0
.debug_aranges       168         0x0
.debug_ranges        216         0x0
.debug_str          3615         0x0
.comment              72         0x0
.ARM.attributes       56         0x0
.debug_frame         232         0x0
.debug_line          723         0x0
Total              11910
```

tracking issue: rust-lang/rust#125612
bors added a commit to rust-lang/rust-analyzer that referenced this issue Jun 20, 2024
…anieu

make `ptr::rotate` smaller when using `optimize_for_size`

code to reproduce https://github.com/folkertdev/optimize_for_size-slice-rotate

In the example the size of `.text` goes down from 1624 to 276 bytes.

```
> cargo size --release --features "left-std"  -- -A

slice-rotate  :
section              size        addr
.vector_table        1024         0x0
.text                1624       0x400
.rodata                 0       0xa58
.data                   0  0x20000000
.gnu.sgstubs            0       0xa60
.bss                    0  0x20000000
.uninit                 0  0x20000000
.debug_loc            591         0x0
.debug_abbrev        1452         0x0
.debug_info         10634         0x0
.debug_aranges        480         0x0
.debug_ranges        1504         0x0
.debug_str          11716         0x0
.comment               72         0x0
.ARM.attributes        56         0x0
.debug_frame         1036         0x0
.debug_line          5837         0x0
Total               36026

> cargo size --release --features "left-size"  -- -A

slice-rotate  :
section             size        addr
.vector_table       1024         0x0
.text                276       0x400
.rodata                0       0x514
.data                  0  0x20000000
.gnu.sgstubs           0       0x520
.bss                   0  0x20000000
.uninit                0  0x20000000
.debug_loc           347         0x0
.debug_abbrev        965         0x0
.debug_info         4216         0x0
.debug_aranges       168         0x0
.debug_ranges        216         0x0
.debug_str          3615         0x0
.comment              72         0x0
.ARM.attributes       56         0x0
.debug_frame         232         0x0
.debug_line          723         0x0
Total              11910
```

tracking issue: rust-lang/rust#125612
flip1995 pushed a commit to flip1995/rust-clippy that referenced this issue Jun 28, 2024
…anieu

make `ptr::rotate` smaller when using `optimize_for_size`

code to reproduce https://github.com/folkertdev/optimize_for_size-slice-rotate

In the example the size of `.text` goes down from 1624 to 276 bytes.

```
> cargo size --release --features "left-std"  -- -A

slice-rotate  :
section              size        addr
.vector_table        1024         0x0
.text                1624       0x400
.rodata                 0       0xa58
.data                   0  0x20000000
.gnu.sgstubs            0       0xa60
.bss                    0  0x20000000
.uninit                 0  0x20000000
.debug_loc            591         0x0
.debug_abbrev        1452         0x0
.debug_info         10634         0x0
.debug_aranges        480         0x0
.debug_ranges        1504         0x0
.debug_str          11716         0x0
.comment               72         0x0
.ARM.attributes        56         0x0
.debug_frame         1036         0x0
.debug_line          5837         0x0
Total               36026

> cargo size --release --features "left-size"  -- -A

slice-rotate  :
section             size        addr
.vector_table       1024         0x0
.text                276       0x400
.rodata                0       0x514
.data                  0  0x20000000
.gnu.sgstubs           0       0x520
.bss                   0  0x20000000
.uninit                0  0x20000000
.debug_loc           347         0x0
.debug_abbrev        965         0x0
.debug_info         4216         0x0
.debug_aranges       168         0x0
.debug_ranges        216         0x0
.debug_str          3615         0x0
.comment              72         0x0
.ARM.attributes       56         0x0
.debug_frame         232         0x0
.debug_line          723         0x0
Total              11910
```

tracking issue: rust-lang/rust#125612
tgross35 added a commit to tgross35/rust that referenced this issue Jul 16, 2024
… r=tgross35

Skip fast path for dec2flt when optimize_for_size

Tracking issue: rust-lang#125612

Skip the fast algorithm when optimizing for size.
When compiling for https://github.com/quartiq/stabilizer I get these numbers:

Before
```
   text    data     bss     dec     hex filename
 192192       8   49424  241624   3afd8 dual-iir
```

After
```
   text    data     bss     dec     hex filename
 191632       8   49424  241064   3ada8 dual-iir
```

This saves 560 bytes.
tgross35 added a commit to tgross35/rust that referenced this issue Jul 17, 2024
… r=tgross35

Skip fast path for dec2flt when optimize_for_size

Tracking issue: rust-lang#125612

Skip the fast algorithm when optimizing for size.
When compiling for https://github.com/quartiq/stabilizer I get these numbers:

Before
```
   text    data     bss     dec     hex filename
 192192       8   49424  241624   3afd8 dual-iir
```

After
```
   text    data     bss     dec     hex filename
 191632       8   49424  241064   3ada8 dual-iir
```

This saves 560 bytes.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jul 17, 2024
Rollup merge of rust-lang#126271 - diondokter:dec2flt-skip-fast-path, r=tgross35

Skip fast path for dec2flt when optimize_for_size

Tracking issue: rust-lang#125612

Skip the fast algorithm when optimizing for size.
When compiling for https://github.com/quartiq/stabilizer I get these numbers:

Before
```
   text    data     bss     dec     hex filename
 192192       8   49424  241624   3afd8 dual-iir
```

After
```
   text    data     bss     dec     hex filename
 191632       8   49424  241064   3ada8 dual-iir
```

This saves 560 bytes.
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 24, 2024
…sort-impls, r=cuviper

Add `optimize_for_size` variants for stable and unstable sort as well as select_nth_unstable

- Stable sort uses a simple merge-sort that re-uses the existing - rather gnarly - merge function.
- Unstable sort jumps directly to the branchless heapsort fallback.
- select_nth_unstable jumps directly to the median_of_medians fallback, which is augmented with a custom tiny smallsort and partition impl.

Some code is duplicated but de-duplication would bring it's own problems. For example `swap_if_less` is critical for performance, if the sorting networks don't inline it perf drops drastically, however `#[inline(always)]` is also a poor fit, if the provided comparison function is huge, it gives the compiler an out to only instantiate `swap_if_less` once and call it. Another aspect that would suffer when making `swap_if_less` pub, is having to cfg out dozens of functions in in smallsort module.

Part of rust-lang#125612

r​? `@Kobzol`
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Sep 25, 2024
…s, r=cuviper

Add `optimize_for_size` variants for stable and unstable sort as well as select_nth_unstable

- Stable sort uses a simple merge-sort that re-uses the existing - rather gnarly - merge function.
- Unstable sort jumps directly to the branchless heapsort fallback.
- select_nth_unstable jumps directly to the median_of_medians fallback, which is augmented with a custom tiny smallsort and partition impl.

Some code is duplicated but de-duplication would bring it's own problems. For example `swap_if_less` is critical for performance, if the sorting networks don't inline it perf drops drastically, however `#[inline(always)]` is also a poor fit, if the provided comparison function is huge, it gives the compiler an out to only instantiate `swap_if_less` once and call it. Another aspect that would suffer when making `swap_if_less` pub, is having to cfg out dozens of functions in in smallsort module.

Part of rust-lang/rust#125612

r​? `@Kobzol`
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Sep 25, 2024
…s, r=cuviper

Add `optimize_for_size` variants for stable and unstable sort as well as select_nth_unstable

- Stable sort uses a simple merge-sort that re-uses the existing - rather gnarly - merge function.
- Unstable sort jumps directly to the branchless heapsort fallback.
- select_nth_unstable jumps directly to the median_of_medians fallback, which is augmented with a custom tiny smallsort and partition impl.

Some code is duplicated but de-duplication would bring it's own problems. For example `swap_if_less` is critical for performance, if the sorting networks don't inline it perf drops drastically, however `#[inline(always)]` is also a poor fit, if the provided comparison function is huge, it gives the compiler an out to only instantiate `swap_if_less` once and call it. Another aspect that would suffer when making `swap_if_less` pub, is having to cfg out dozens of functions in in smallsort module.

Part of rust-lang/rust#125612

r​? `@Kobzol`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants