3x~10x Performance regression between 7.2.0 and >7.3.0 on large folder #980

Open
peter50216 opened this issue Mar 8, 2022 · 5 comments

Comments

@peter50216

I noticed that some fd commands run much slower (up to 10x) after upgrading my local fd from 6.2.0 to the newest release, 8.3.2, so I did a quick version bisect.

It looks like the regression was introduced between 7.2.0 and 7.3.0. Every version I tested after 7.3.0 (7.4.0, 7.5.0, 8.0.0, 8.1.1, 8.3.2) runs at about the same speed as 7.3.0.

Reproduction script:

set -e

wget -q https://github.com/sharkdp/fd/releases/download/v7.2.0/fd-v7.2.0-x86_64-unknown-linux-musl.tar.gz
tar -xf fd-v7.2.0-x86_64-unknown-linux-musl.tar.gz

wget -q https://github.com/sharkdp/fd/releases/download/v7.3.0/fd-v7.3.0-x86_64-unknown-linux-musl.tar.gz
tar -xf fd-v7.3.0-x86_64-unknown-linux-musl.tar.gz

hyperfine --version
hyperfine \
  --warmup 5 \
  './fd-v7.2.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src' \
  './fd-v7.3.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src'
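The bisect covered several releases, so the two-version script above generalizes naturally into a loop. This is a sketch, assuming every release uses the same asset naming (fd-v<version>-<target>.tar.gz) and extracts to fd-v<version>-<target>/fd, as 7.2.0 and 7.3.0 do; `fd_url` and `bench_versions` are hypothetical helper names, not part of fd or hyperfine.

```shell
#!/usr/bin/env bash
# Sketch: benchmark several fd releases in one hyperfine run.
# Assumption: every release follows the asset naming used above.
set -eu

# Hypothetical helper: print the release-asset URL for a version and target.
fd_url() {  # $1 = version, $2 = target triple
  printf 'https://github.com/sharkdp/fd/releases/download/v%s/fd-v%s-%s.tar.gz\n' "$1" "$1" "$2"
}

# Hypothetical helper: download and extract each version, then compare them all.
bench_versions() {  # $1 = target triple, $2 = search dir, rest = versions
  local target=$1 dir=$2
  shift 2
  local v
  local cmds=()
  for v in "$@"; do
    wget -q "$(fd_url "$v" "$target")"   # aborts (set -e) if an asset is missing
    tar -xf "fd-v$v-$target.tar.gz"
    cmds+=("./fd-v$v-$target/fd '.*camera_hal.*' $dir")
  done
  hyperfine --warmup 5 "${cmds[@]}"
}

# Usage (downloads every listed release, then runs a single comparison):
# bench_versions x86_64-unknown-linux-musl ~/chromiumos/src 7.2.0 7.3.0 7.4.0 8.3.2
```

Passing all versions to one hyperfine invocation makes its summary rank every release against the fastest one.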

(I'm using the Chrome OS source tree as an example here, but I can reproduce a similar regression on other large source trees, for example the Linux source tree.)

Result:

  • On a VPS without SSD, with 24 cores/96 hyperthreads:
hyperfine 1.11.0
Benchmark #1: ./fd-v7.2.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src
  Time (mean ± σ):      2.468 s ±  0.058 s    [User: 90.829 s, System: 115.032 s]
  Range (min … max):    2.402 s …  2.555 s    10 runs
 
Benchmark #2: ./fd-v7.3.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src
  Time (mean ± σ):     25.529 s ±  0.328 s    [User: 222.856 s, System: 1924.844 s]
  Range (min … max):   24.980 s … 26.091 s    10 runs
 
Summary
  './fd-v7.2.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src' ran
   10.34 ± 0.28 times faster than './fd-v7.3.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src'

  • On my local laptop with an SSD, with 4 cores/8 hyperthreads:
hyperfine 1.13.0
Benchmark 1: ./fd-v7.2.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src
  Time (mean ± σ):      2.348 s ±  0.101 s    [User: 10.347 s, System: 6.298 s]
  Range (min … max):    2.237 s …  2.527 s    10 runs
 
Benchmark 2: ./fd-v7.3.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src
  Time (mean ± σ):      6.882 s ±  0.090 s    [User: 44.010 s, System: 6.813 s]
  Range (min … max):    6.783 s …  7.065 s    10 runs

Summary
  './fd-v7.2.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src' ran
    2.93 ± 0.13 times faster than './fd-v7.3.0-x86_64-unknown-linux-musl/fd ".*camera_hal.*" ~/chromiumos/src'
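The "N ± σ times faster" summaries can be recomputed from the means and standard deviations above. This sketch assumes hyperfine uses standard first-order error propagation for a ratio of two means; that is an assumption, though it reproduces every summary in this thread (including the later 1.21 ± 0.05 and 4.10 ± 0.51). `speedup` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# speedup MEAN_FAST SD_FAST MEAN_SLOW SD_SLOW
# Ratio of the two mean times plus its first-order (Gaussian) uncertainty:
#   r       = mean_slow / mean_fast
#   sigma_r = r * sqrt((sd_fast/mean_fast)^2 + (sd_slow/mean_slow)^2)
speedup() {
  awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" 'BEGIN {
    r = m2 / m1
    e = r * sqrt((s1 / m1) ^ 2 + (s2 / m2) ^ 2)
    printf "%.2f ± %.2f\n", r, e
  }'
}

speedup 2.468 0.058 25.529 0.328  # VPS, musl: prints 10.34 ± 0.28
speedup 2.348 0.101 6.882 0.090   # laptop, musl: prints 2.93 ± 0.13
```

The relative errors add in quadrature, so the uncertainty of the ratio is dominated by whichever run is noisier relative to its own mean (here the 7.2.0 runs).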

I also tried adding --color=never and the results were similar. From the changelog, the only other suspect is the new --exec-batch option?

Happy to provide additional testing / debug info if needed.

@tavianator
Collaborator

I can reproduce that here, but with -j1 the performance is the same. I think this is #710, and the cause is just the musl version being upgraded as a result of Rust being updated. Or maybe this is around when Rust stopped using jemalloc by default.
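The -j1 observation can be rechecked by pinning both binaries to the same thread count with fd's -j option. This is a sketch that assumes the two release directories from the reproduction script already exist; `threaded_cmds` is a hypothetical helper. If the gap closes at -j1, that presumably points at cost that grows with thread count (such as allocator contention) rather than at single-threaded code.

```shell
#!/usr/bin/env bash
# Sketch: build hyperfine commands that run each fd binary with a fixed -j value.
set -eu

# Hypothetical helper: print one command line per binary.
threaded_cmds() {  # $1 = thread count, $2 = search dir, rest = fd binaries
  local j=$1 dir=$2
  shift 2
  local bin
  for bin in "$@"; do
    printf '%s -j%s ".*camera_hal.*" %s\n' "$bin" "$j" "$dir"
  done
}

# Usage: compare 7.2.0 and 7.3.0 single-threaded.
# mapfile -t cmds < <(threaded_cmds 1 ~/chromiumos/src \
#   ./fd-v7.2.0-x86_64-unknown-linux-musl/fd ./fd-v7.3.0-x86_64-unknown-linux-musl/fd)
# hyperfine --warmup 5 "${cmds[@]}"
```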

See also

@peter50216
Author

I tested the gnu builds instead of musl and verified that the large slowdown is specific to musl.

  • On a VPS without SSD, with 24 cores/96 hyperthreads:
hyperfine 1.11.0
Benchmark #1: ./fd-v7.2.0-x86_64-unknown-linux-gnu/fd ".*camera_hal.*" ~/chromiumos/src
  Time (mean ± σ):      2.439 s ±  0.096 s    [User: 99.311 s, System: 109.548 s]
  Range (min … max):    2.347 s …  2.679 s    10 runs
 
Benchmark #2: ./fd-v7.3.0-x86_64-unknown-linux-gnu/fd ".*camera_hal.*" ~/chromiumos/src
  Time (mean ± σ):      2.947 s ±  0.065 s    [User: 138.492 s, System: 49.916 s]
  Range (min … max):    2.851 s …  3.046 s    10 runs
 
Summary
  './fd-v7.2.0-x86_64-unknown-linux-gnu/fd ".*camera_hal.*" ~/chromiumos/src' ran
    1.21 ± 0.05 times faster than './fd-v7.3.0-x86_64-unknown-linux-gnu/fd ".*camera_hal.*" ~/chromiumos/src'

There's still a slowdown of ~1.2x, which is probably caused by Rust no longer using jemalloc by default, as you said, with jemalloc being faster than glibc malloc for this workload?

I think this is covered by #710 anyway, so feel free to close this as duplicate.

@sharkdp
Owner

sharkdp commented Mar 9, 2022

Thank you for reporting this anyway!

See also: https://dev.to/sharkdp/an-unexpected-performance-regression-11ai

Back then, the performance regression was between 7.0 and 7.1, so that doesn't quite fit with your results. You can easily check if a particular fd executable uses jemalloc by doing something like

strings <fd-executable> | grep jemalloc
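The check above can be looped over every downloaded release to produce a list like the one in the next comment. This sketch swaps `strings | grep` for `grep -a`, which scans the binary as text and finds the same string without needing binutils; the presence of "jemalloc" is only a heuristic. `uses_jemalloc` and `report` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Report which fd binaries appear to embed jemalloc.
# Heuristic only: a binary linked against jemalloc contains the literal
# string "jemalloc" (symbol names, messages). `grep -a` scans the binary as
# text, which finds the same string as `strings <exe> | grep jemalloc`.
set -eu

# Hypothetical helper: exit status 0 if the file mentions jemalloc.
uses_jemalloc() {
  grep -aq jemalloc "$1"
}

# Hypothetical helper: print one line per binary with the verdict.
report() {
  local exe
  for exe in "$@"; do
    if uses_jemalloc "$exe"; then
      echo "jemalloc:    $exe"
    else
      echo "no jemalloc: $exe"
    fi
  done
}

# Usage, after extracting several release archives side by side:
# report ./fd-v*/fd
```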

@peter50216
Author

peter50216 commented Mar 9, 2022

I did a quick grep over the binaries downloaded from https://github.com/sharkdp/fd/releases:

Using jemalloc:

  • fd-v7.2.0-x86_64-unknown-linux-musl
  • fd-v7.2.0-x86_64-unknown-linux-gnu
  • fd-v7.4.0-x86_64-unknown-linux-gnu
  • fd-v8.3.2-x86_64-unknown-linux-gnu

Not using jemalloc:

  • fd-v7.3.0-x86_64-unknown-linux-musl
  • fd-v7.4.0-x86_64-unknown-linux-musl
  • fd-v8.0.0-x86_64-unknown-linux-musl
  • fd-v8.2.1-x86_64-unknown-linux-musl
  • fd-v8.3.2-x86_64-unknown-linux-musl
  • fd-v7.3.0-x86_64-unknown-linux-gnu

It looks like the 7.4.0 patch that switched back to jemalloc was not applied to the musl build (which the 7.4.0 release notes also state).

@peter50216
Author

peter50216 commented Mar 9, 2022

I also tried building musl + jemalloc from the master branch (c577b08) with cross build --target=x86_64-unknown-linux-musl (see gnzlbg/jemallocator#124 (comment)), and it performs much better than the non-jemalloc build:

Benchmark #1: ~/temp/fd-musl-no-jemalloc ".*camera_hal.*" ~/chromiumos/src
  Time (mean ± σ):     18.901 s ±  0.281 s    [User: 166.882 s, System: 1532.500 s]
  Range (min … max):   18.467 s … 19.252 s    10 runs
 
Benchmark #2: ~/temp/fd-musl-jemalloc ".*camera_hal.*" ~/chromiumos/src
  Time (mean ± σ):      4.614 s ±  0.570 s    [User: 26.295 s, System: 361.069 s]
  Range (min … max):    3.435 s …  5.445 s    10 runs
 
Summary
  '~/temp/fd-musl-jemalloc ".*camera_hal.*" ~/chromiumos/src' ran
    4.10 ± 0.51 times faster than '~/temp/fd-musl-no-jemalloc ".*camera_hal.*" ~/chromiumos/src'

So it might be worthwhile to enable jemalloc for the musl build too. (From a quick glance at the GitHub Actions workflow, the musl build already uses cross, so there shouldn't be any build issues.)

It's still slower than 7.2.0 but that's likely #599.

tavianator added a commit to tavianator/fd that referenced this issue Jul 12, 2022
tavianator added a commit to tavianator/fd that referenced this issue Jul 13, 2022
tavianator added a commit to tavianator/fd that referenced this issue Sep 16, 2022
sharkdp pushed a commit that referenced this issue Sep 19, 2022