Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect and aggregate timing statistics of CI builders #48829

Open
alexcrichton opened this issue Mar 7, 2018 · 12 comments
Open

Collect and aggregate timing statistics of CI builders #48829

alexcrichton opened this issue Mar 7, 2018 · 12 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Copy link
Member

Currently we only have a little bit of insight into how long all our CI builds are taking with a breakdown of where all that time is going. It's hoped that if we have a historical record of the duration of CI builds which is relatively consistent and granular we can more easily see regressions that happen over time and diagnose otherwise obvious-in-retrospect problems with the build.

The current thinking for how this might be done is to instrument rustbuild to, by default on CI, emit timing information about each step in a structured fashion. The thinking is we could then scrape the logs of all our CI builds to aggregate this structure data. The data needs to include enough information so that it can indeed be easily aggregated. For example we shouldn't have to look at the Travis environment to guess whether it's an OSX/Linux build or which architecture is being assembled.

Once we've got this data we can hopefully set up graphs over time like https://rust-lang-nursery.github.io/rustc-pr-tracking/ to see what time is going where. CI timings are very noisy but the hope is that general trends will still be relatively clear.

@alexcrichton alexcrichton added A-build T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Mar 7, 2018
@alexcrichton
Copy link
Member Author

cc @rust-lang/infra

@matthiaskrgr
Copy link
Member

my two cents:
there is http://scribu.net/travis-stats/#rust-lang/rust/auto which can uncover recent commits that blew up general runtime of a travis build.
Another similar approach is http://buildtimetrend.github.io/ , however there has not been a lot of activity in their repos recently and I am not sure how up-to-date it is (haven't tried getting it to work yet).

@alexcrichton
Copy link
Member Author

Indeed! Unfortunately though those timings are very coarse in that they generally only cover the entire build itself. The variance of time taken from build to build is huge so I'd particularly be interested in a step-by-step breakdown which we can hopefully track much more reliably over time. I'm also curious in things like 10 minute regressions over 1 hour regression which will be much more difficult to diagnose in looking at graphs of overall build times.

@kennytm
Copy link
Member

kennytm commented Mar 8, 2018

We need to know the time taken to compile each crate (have each bootstrap/bin/rustc invocation write out its start-to-finish duration). Otherwise just knowing "compiling stage1-rustc takes 30 minutes" is not really useful. Judging from the stamp output only will always lead us to wrongly blame syntax_ext.

@matthiaskrgr
Copy link
Member

cc rust-lang/cargo#5046

@kennytm
Copy link
Member

kennytm commented Mar 8, 2018

Unfortunately this needs to apply to the bootstrap cargo, meaning we can't benefit from it until the next 6-week cycle. Modifying bootstrap/bin/rustc.rs can be done immediately, however.

@scottmcm
Copy link
Member

scottmcm commented Mar 10, 2018

Are you thinking something like this? #48867 (Direct travis log link)

I made some stats tables for that run as an experiment:

Crates we compile without anything else compiling at the same time (largely rustc, unsurprisingly)
Crate Serial Time (s) Blocking
stage1-rustc 660.93 stage1-ieee stage1-num_cpus stage1-ppc stage0-linkchecker stage1-cc stage1-rustc_apfloat stage1-rustc_typeck stage1-rustc_incremental stage1-rustc_resolve stage1-tempdir stage1-rustc_metadata
stage0-rustc 559.42 stage0-rustc_resolve stage0-num_cpus stage0-tempdir stage0-rustc_mir stage0-rustc_metadata stage0-rustc_allocator stage0-cc
stage1-syntax 222.75 stage1-syntax_ext stage2-error_index_generator stage1-proc_macro stage1-rustc_back stage1-rustc_const_math
stage1-rustc_mir 167.09 stage1-rustc_borrowck stage1-rustc_lint stage1-rustc_passes stage1-rustc_trans_utils
stage1-std 142.18 stage1-term stage1-getopts stage1-std_unicode_tests
stage2-rustdoc 139.29 stage2-rustdoc_tool_binary
stage0-syntax 135.36 stage0-rustc_const_math stage0-rustc_back stage0-proc_macro
stage1-rustc_driver 134.24 stage1-rustc stage1-rustc_errors
stage1-core 108.36 stage1-compiler_builtins stage1-std_unicode stage1-libc
stage0-rustc_driver 81.31 stage0-rustc
stage0-rustc_trans 77.67 stage1-core
stage1-rustc_trans 77.35
stage1-coretests 76.48 stage1-std
stage0-rustc_mir 55.66 stage0-rustc_trans_utils stage0-rustc_borrowck stage0-rustc_passes stage0-rustc_lint
stage0-mdbook 51.92 stage0-rustbook
stage1-collectionstests 44.72 stage1-coretests
stage1-serialize 31.29 stage1-rustc_data_structures stage1-syntax
(others) 423.52
Overall, the stage 1 compiler was 1.48 ×⁄÷ 1.62 times slower than stage0 (geometric mean and stdev)
Crate Time Ratio
rustc_back 2.500
rustc_incremental 1.076
syntax_ext 1.045
std 6.229
rustc_resolve 0.864
rustc_save_analysis 1.029
rustc_metadata 0.970
rustc_typeck 1.060
rustc_mir 1.743
test 1.820
rustc 1.170
rustc_errors 1.121
alloc 3.814
core 3.628
syntax 1.646
rustc_driver 1.651
rustc_trans 0.996
serialize 9.686
(others) (that took less than 10 seconds each)

@alexcrichton
Copy link
Member Author

Whoa that's awesome @scottmcm!

Out of curiosity, how is the "time ratio" calculated? The serialize crate seems to have a very high one! (I'm also quite surprised that libserialize takes 30s to compile...)

@scottmcm
Copy link
Member

Very hacky code used to produce the numbers: https://gist.github.com/scottmcm/241223e4964279a71a41d9846384cb87

Hmm, it does look like it's picking up more things than intended for the ratio. For example, the big reason that std shows as slower is because it's also picking up the time for its test build (https://travis-ci.org/rust-lang/rust/builds/351202994#L3260-L3261), and the same for serialize (https://travis-ci.org/rust-lang/rust/builds/351202994#L3884-L3885). I guess the logging and/or processing needs to account for that.

The overall ratio is a time-weighted geometric mean of the crate ratios.

The other thing I did is split time intervals between all the processes running in the slice (https://gist.github.com/scottmcm/241223e4964279a71a41d9846384cb87#file-timing-rs-L39) in an attempt to not penalize things that are running in parallel with others.

@alexcrichton
Copy link
Member Author

I've made a PR for this at #49094

I'm hoping with that we can adapt @scottmcm's script and perhaps even get a daily build up and running displaying this information!

alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 19, 2018
This commit updates CI configuration to inform rustbuild that it should print
out how long each step takes on CI. This'll hopefully allow us to track the
duration of steps over time and follow regressions a bit more closesly (as well
as have closer analysis of differences between two builds).

cc rust-lang#48829
alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 20, 2018
This commit updates CI configuration to inform rustbuild that it should print
out how long each step takes on CI. This'll hopefully allow us to track the
duration of steps over time and follow regressions a bit more closesly (as well
as have closer analysis of differences between two builds).

cc rust-lang#48829
bors added a commit that referenced this issue Mar 22, 2018
ci: Print out how long each step takes on CI

This commit updates CI configuration to inform rustbuild that it should print
out how long each step takes on CI. This'll hopefully allow us to track the
duration of steps over time and follow regressions a bit more closesly (as well
as have closer analysis of differences between two builds).

cc #48829
@XAMPPRocky XAMPPRocky added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label May 14, 2018
@jonas-schievink jonas-schievink added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) and removed T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) A-build labels Apr 21, 2019
@jyn514
Copy link
Member

jyn514 commented Feb 3, 2023

What's the current status of this issue? I see #49094 got merged - is that enough? Was the plan to also show that in a dashboard somewhere?

cc @pietroalbini , I think you've done a lot of work on CI metrics.

@ehuss
Copy link
Contributor

ehuss commented Feb 3, 2023

There was a dashboard at https://alexcrichton.github.io/rust-ci-timing-tracker/, but it broke many years ago. I think it was helpful for tracking unexpected increases in CI time.

Mark has a project at https://github.com/Mark-Simulacrum/rustc-ci-timing for graphing CI time, but it has to be run manually. It would be nice to make that more accessible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants