Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression on Windows due to removal of jemalloc #36328

Closed
iopq opened this issue Sep 7, 2016 · 7 comments
Closed

Performance regression on Windows due to removal of jemalloc #36328

iopq opened this issue Sep 7, 2016 · 7 comments
Labels
A-allocators Area: Custom and system allocators I-slow Issue: Problems and improvements with respect to performance of generated code. O-windows Operating system: Windows P-low Low priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@iopq
Copy link

iopq commented Sep 7, 2016

https://bitbucket.org/iopq/fizzbuzz-in-rust

Steps to reproduce:

  1. Compile this with
    rustup install nightly-2016-01-20
    cargo clean
    rustup run nightly-2016-01-20 cargo bench

observe that you get nice performance:

running 2 tests
test bench_cowbuzz ... bench: 891 ns/iter (+/- 72)
test bench_fizzbuzz ... bench: 1,125 ns/iter (+/- 40)

  1. Compile this with

rustup install nightly-2016-01-21
cargo clean
rustup run nightly-2016-01-21 cargo bench

observe that you get a performance regression:

test bench_cowbuzz ... bench: 1,071 ns/iter (+/- 42)
test bench_fizzbuzz ... bench: 1,542 ns/iter (+/- 101)

rustc -vV gives

commit-hash: 7dce32e
commit-date: 2016-01-20
host: i686-pc-windows-gnu
release: 1.7.0-nightly

  1. The performance is worse in every version including the current nightly, 2016-06-12 gives

test bench_cowbuzz ... bench: 996 ns/iter (+/- 31)
test bench_fizzbuzz ... bench: 1,420 ns/iter (+/- 50)

@durka
Copy link
Contributor

durka commented Sep 7, 2016

Seems to be within the error bars on OSX.

@brson brson added I-slow Issue: Problems and improvements with respect to performance of generated code. O-windows Operating system: Windows regression-from-stable-to-stable Performance or correctness regression from one stable version to another. labels Sep 7, 2016
@iopq
Copy link
Author

iopq commented Sep 7, 2016

I found that it's my dependency that has the performance regression:

https://github.com/iopq/monoid

running the bench on 2016-01-20 gives

test bench_cowstring ... bench: 71 ns/iter (+/- 4)
test bench_cowstring_ms2ger ... bench: 66 ns/iter (+/- 5)

running the same thing on 2016-01-21 gives

test bench_cowstring ... bench: 130 ns/iter (+/- 5)
test bench_cowstring_ms2ger ... bench: 122 ns/iter (+/- 4)

the performance hasn't recovered since

all these tests are doing is string concatenation

#[bench]
fn bench_cowstring(b: &mut test::Bencher) {
    b.iter(|| {
        let fizz = Into::<Cow<_>>::into("Fizz").into_owned();
        let buzz = Into::<Cow<_>>::into("Buzz");
        Into::<Cow<_>>::into(fizz + &*buzz)
    });
}

#[bench]
fn bench_cowstring_ms2ger(b: &mut test::Bencher) {
    b.iter( || {  
            let mut fizz = Into::<Cow<_>>::into("Fizz").into_owned();
            let buzz = Into::<Cow<_>>::into("Buzz");
            fizz.push_str(&*buzz);
            Into::<Cow<_>>::into(fizz);
        }
    );
}

@TimNN
Copy link
Contributor

TimNN commented Sep 8, 2016

The reduced test case does also not reproduce the problem on OS X.

Looking at the changes, this regression may have been caused by disabling jemalloc on windows-gnu in #30985 (edit: although it should have been re-enabled as far as I can tell, so if this was the cause, things should be fine on current nightlies).

@ollie27
Copy link
Member

ollie27 commented Sep 9, 2016

This does look like it's because jemalloc was disabled. On the 2016-09-08 nightly for x86_64-pc-windows-gnu I get:

test bench_cowstring        ... bench:         157 ns/iter (+/- 8)
test bench_cowstring_ms2ger ... bench:         155 ns/iter (+/- 11)

Adding #![feature(alloc_jemalloc)] and extern crate alloc_jemalloc; gives:

test bench_cowstring        ... bench:          85 ns/iter (+/- 3)
test bench_cowstring_ms2ger ... bench:          84 ns/iter (+/- 2)

Compared with the 2016-01-20 nightly:

test bench_cowstring        ... bench:          90 ns/iter (+/- 3)
test bench_cowstring_ms2ger ... bench:          91 ns/iter (+/- 6)

And the 2016-01-21 nightly:

test bench_cowstring        ... bench:         158 ns/iter (+/- 4)
test bench_cowstring_ms2ger ... bench:         156 ns/iter (+/- 2)

@brson brson added A-allocators Area: Custom and system allocators P-low Low priority I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 13, 2016
@brson
Copy link
Contributor

brson commented Sep 13, 2016

Libs team needs to explain the path forward for re-gaining this perf loss. We're unlikely to add jemalloc back by default here.

@brson brson removed the I-nominated label Oct 4, 2016
@brson
Copy link
Contributor

brson commented Oct 4, 2016

Right now the intended path forward to regain this perf is by stabilizing the global allocator API, and publishing the jemalloc crate to crates.io.

@brson brson changed the title Performance regression on Windows Performance regression on Windows due to removal of jemalloc Oct 6, 2016
@alexcrichton
Copy link
Member

We're unlikely to really do anything to fix this, so I'm going to close this in favor of #27389. It's highly likely that all programs will start to link to jemalloc by default once we stabilize that feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-allocators Area: Custom and system allocators I-slow Issue: Problems and improvements with respect to performance of generated code. O-windows Operating system: Windows P-low Low priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants