
Easy run time benchmarking in GitHub Actions #2

Open
itamarst opened this issue Jul 19, 2024 · 3 comments

Comments


itamarst commented Jul 19, 2024

As the maintainer of a project on GitHub Actions, I would like benchmarks to run automatically on PRs, telling me when a PR slows things down, or perhaps speeds things up.

(A broader use case is tracking performance over time, but supporting that rules out some of the solutions that work for the narrower use case, and it's not quite as important.)

This is harder than it sounds, because cloud CI runners use whatever random cloud VM you get assigned, which means inconsistent hardware. Inconsistent hardware means inconsistent results for normal benchmarking, so results are hard to compare. See e.g. https://bheisler.github.io/post/benchmarking-in-the-cloud/ for experiments demonstrating this noise. Traditionally people get around this by having a fixed-hardware machine to run the benchmarks, which you can then hook up to CI as a runner machine (GitHub Actions supports this), but this is brittle and doesn't scale well across teams.
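The noise is easy to observe even on a single machine: a minimal sketch (the workload below is a hypothetical stand-in for a real benchmark target) that times the same code repeatedly and reports the run-to-run spread. On shared cloud VMs this spread tends to be much larger than on dedicated hardware, which is exactly what makes cross-run comparisons untrustworthy.

```python
import statistics
import timeit

def workload():
    # Hypothetical stand-in for a real benchmark target.
    return sum(i * i for i in range(10_000))

def relative_spread(samples):
    # Coefficient of variation: stdev as a fraction of the mean.
    return statistics.stdev(samples) / statistics.mean(samples)

# Time the same workload several times on the same machine.
samples = timeit.repeat(workload, repeat=10, number=100)
print(f"run-to-run spread: {relative_spread(samples):.1%}")
```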


itamarst commented Jul 19, 2024

Proposed solution: codspeed.io

This is an online service that does benchmarking, with free accounts for open source projects.

It uses cachegrind/callgrind or something similar (see here for a write-up of the idea). Basically, it counts the CPU instructions executed, using a CPU simulator.
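The appeal of instruction counting can be illustrated at the Python level: counting executed bytecode instructions (via `sys.settrace` opcode tracing) gives a metric that, like cachegrind's simulated instruction count, depends only on the code path taken, not on how fast the hardware happens to be. A rough sketch, not how codspeed actually implements it:

```python
import sys

def count_opcodes(func):
    """Count bytecode instructions executed while running func().

    An interpreter-level analogue of what cachegrind does at the CPU
    level: the count is determined by the code path, not the hardware.
    """
    count = 0
    def tracer(frame, event, arg):
        nonlocal count
        frame.f_trace_opcodes = True  # enable per-opcode events
        if event == "opcode":
            count += 1
        return tracer
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return count

def bench():
    return sum(range(100))

# Unlike wall-clock time, the count is identical on every run
# and on every machine.
print(count_opcodes(bench))
```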

Benefits:

  • Pretty consistent results: for Python code, noise is less than 2%; computational code might do even better, since there's less randomization from the dictionary hash seed.
  • Easy to set up: you just use a normal CI runner as provided by, e.g., GitHub Actions.

Downsides:

  • While it works well for Python code, for compiled code it can be very misleading. For example, improved instruction-level parallelism (ILP) can make code run 2× as fast; worse ILP can make code run half as fast. In both cases the instruction count won't necessarily change, and might even move in the opposite direction of performance. codspeed can't handle this scenario correctly, since its performance metric is the instruction count.

I feel like not noticing a 50% slowdown or a 2× speedup is sufficient to rule out codspeed for anything involving compiled code.

@itamarst

Proposed solution: Double benchmark run, main vs PR

This is used by, e.g., the pyca/cryptography project.

Benchmarks run on a normal CI runner in the cloud, with inconsistent hardware.

On every PR, within a single CI job:

  • Run the benchmark on main branch.
  • Run the benchmark on the PR code.
  • Report both results, optionally calculating the difference.

The idea here is that by also running the main branch, you get a baseline on the same hardware as the PR benchmarks, so the comparison is meaningful.
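The reporting step can be sketched in a few lines (the function name and the 5% threshold are illustrative, not taken from any particular project):

```python
def compare(main_secs: float, pr_secs: float, threshold: float = 0.05):
    """Compare PR timing against the main-branch baseline measured
    in the same CI job, so both numbers come from the same hardware.

    Returns the relative change (positive = slower) and whether it
    exceeds the regression threshold (5% by default, an arbitrary
    choice for illustration).
    """
    change = (pr_secs - main_secs) / main_secs
    return change, change > threshold

# Example: main took 1.00s, the PR takes 1.25s on the same runner.
change, regressed = compare(1.00, 1.25)
print(f"{change:+.0%} {'REGRESSION' if regressed else 'ok'}")
# prints "+25% REGRESSION"
```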

This would be a lot better packaged as a pre-written GitHub Action, so that integrating it with a project is straightforward.

Benefits:

  • Runs on normal CI runner.

Downsides:

@itamarst itamarst changed the title Easy benchmarking in GitHub Actions Easy run time benchmarking in GitHub Actions Jul 19, 2024
@thomasjpfan
Member

Quansight Labs has a blog post about their benchmarking experience on GitHub's CI with scikit-image: https://labs.quansight.org/blog/github-actions-benchmarks The blog starts by examining how consistent CI hardware is over multiple days. In the "Run it on demand!" section, it goes into running benchmarks on PRs and comparing them to main.

Their GitHub Action for triggering benchmarks is defined here: https://github.com/scikit-image/scikit-image/blob/main/.github/workflows/benchmarks.yml
