Connect ASV benchmarking to CI #496

anissa111 · 2023-10-27T17:26:54Z

PR Summary

Closes #472

To see this in action, see my fork.

Benchmarks run in this action, producing this commit on NCAR/geocat-asv
Then, I manually re-run the benchmarks in this action. It does not re-run the previously run benchmarks in the previous actions, and fails here on purpose.

In this PR, I made a couple changes from the proof of concept runs above:

I've increased the number of commits that gets benchmarked. For our actual CI, I've set it to benchmark all commits from the v2022.10.0 tag forward. This will take a while to run the first time, but should only produce long CI runs when a new benchmark is added in the future.
I've altered the last step, the push to NCAR/geocat-comp-asv, to only run if the CI is coming from this repository from the main branch. This addition is to make sure that any manual runs on branches other than main do not get included in our archived benchmark results.

Here are a couple of design decisions that might warrant further discussion:

the action will fail if there are no new benchmarks to be committed. I thought about putting some checks to make sure the action passed anyway, but decided against it. If this CI, which runs automatically on pushes to main and tagged versions, can't find a new push to main or tagged version to benchmark (or fails to produce benchmarks), I assume we might actually want that failure to notify us if something has gone weird. @kafitzgerald, what do you think here?
How long do we actually want to produce a commit history for? Is v2022.10.0 too long or too short?
With this change, we'll really want to make sure to have clean commit histories before pushing to main, should we add any additional guard rails up for that?
Just noting again that it feels really un-ideal to have my personal info added to authenticate the pushes, but I'm unsure how to get around that without fine-grained tokens and/or a bot

PR Checklist

General

Make an issue if one doesn't already exist
Link the issue this PR resolves by adding closes #XXX to the PR description where XXX is the number of the issue.
Add a brief summary of changes to docs/release-notes.rst in a relevant section for the next unreleased release. Possible sections include: Documentation, New Features, Bug Fixes, Internal Changes, Breaking Changes/Deprecated
Add appropriate labels to this PR
Make your changes in a forked repository rather than directly in this repo
Passes precommit. To set up on your local, run pre-commit install from the top level of the repository. To manually run pre-commits, use pre-commit run --all-files and re-add any changed files before committing again and pushing.
If needed, squash and merge PR commits into a single commit to clean up commit history

.github/workflows/asv-benchmarking.yml

philipc2 · 2023-10-27T20:33:53Z

With this change, we'll really want to make sure to have clean commit histories before pushing to main, should we add any additional guard rails up for that?

By this, do you mean squashing any excessive commits? Using myself as a bad example, I made the mistake of not squashing a series of small commits before merging a PR (see here). Would ASV run a benchmark on each of these commits?

anissa111 · 2023-10-27T20:55:08Z

With this change, we'll really want to make sure to have clean commit histories before pushing to main, should we add any additional guard rails up for that?

By this, do you mean squashing any excessive commits? Using myself as a bad example, I made the mistake of not squashing a series of small commits before merging a PR (see here). Would ASV run a benchmark on each of these commits?

With the way we're currently set up in this PR, yes. The asv-run documentation says that they passes the commit list generated from git rev-list. From the commit list I got on uxarray, I can see it's grabbing all the un-squashed commits, which means that asv would benchmach all of them with this set up.

(base) ➜  uxarray git:(main) git rev-list v2023.10.1..main
97fc1a2f443d9ffc997460530f6ff3ed7ae296c3
505209bd783920a6fcc6b4ab0fefd1a4f3df8873
e7c85ac19e9b256b1b63713ead2bff1db362e220
fddb407037f953fc1b8d3d6921eb0f40b3fb6b29
2bd81d47ef6827ec11f7f6d4104678a783b6c7d3
01a34ffa2ff4e9c027871fee72aa65d608ae63ff
9e1c9605845449ecc09aaa3e0a164ac564ed1502
feb0985ba01498c39411ac92fe08408ac7b15c9d
5bf042b3f4fef84f7641a47194cc6fe6057135ea
c72b6e9d4958a9ba159955941abfa5fa039979ff
487ce1f957521d90e3de0636f24fe42a8b7c8056
84bc776b8892ddc3b2d58b84ea3e9dc9d1cd325a
28fcf1dfdf93c861f5fd185d4c9748d9e1973607
6cf5cde795936b2ebd6ff876998fd1593f533218
802749dcebe69b06084a54ee7bf487895cce1ea4
8394dd5e1a2240f8867eea82a52c78de04d9faa6
01e13ca4d921140c1df7aafcb8d3fe710ce7360b
f4c57fe0928fa2927fd2a12b04b12927f53a2cb1
b43157dc39969530058a9e8dfd0d9fa1418f7bda
f046cea7b9bb48e5d8fe645999058cfdd89ec076
d28f3b30e7606063dfb076adea86b66d701e0329
69308cdfe539b967b7602664de86f597c0304abe
da03e5a440671b33f2764acc1b4772eda266f836
5133a62e637b6f5feada702c5ef825f58b93ca30
9b50bdf30eed249b9af2bd931863f4330f78f515
943c9d8d3238ad1e27ff51f13cd04f1e8303eed1
7d444f7f509cd8cb2caa09ed51cf14b682876743
535fbb7e6a054d994cdd88985ce3029dbfd91fde
59d6fd1627439f6d0fcd7d5bb7a77771f0e61c47
f38e4c9fe1acd943c87bd14849c9922097cd8fb0
8baf53dd53360daadd21669ef27d185906605f76
147e2314f131aa53ab426f0cbe4bf6a63658935a
3ed9a94df5873228b966539e71326b484ebcfe45
a00aa1062adb9dbf667299c26cb58bdba8c3d522
b23d33120cff9aedb5992bcbda5fc5fe46060682
63e7d5f06d406d69b55f5c6bda222064d7c3a64b
828f29654927feeefdef2d809fabe1ce60b4cc22
dbbf2362f5c78dbaaab29bbf7bcdbce927d0a003
6bb7416728b4c5d6fcb5d18a8e33a4dff729c0f3
e6d2a33be122553a9fb7026f6522ea6e7fefc608
f00e182f0aaffbe9631d4ec070e5a5ed19d1bea1
7a1bccfd33bc0bf2a1da95e19b4cac3f90a3fb1c

.github/workflows/asv-benchmarking.yml

philipc2 · 2023-10-27T21:48:39Z

With this change, we'll really want to make sure to have clean commit histories before pushing to main, should we add any additional guard rails up for that?

By this, do you mean squashing any excessive commits? Using myself as a bad example, I made the mistake of not squashing a series of small commits before merging a PR (see here). Would ASV run a benchmark on each of these commits?

With the way we're currently set up in this PR, yes. The asv-run documentation says that they passes the commit list generated from git rev-list. From the commit list I got on uxarray, I can see it's grabbing all the un-squashed commits, which means that asv would benchmach all of them with this set up.

(base) ➜  uxarray git:(main) git rev-list v2023.10.1..main
97fc1a2f443d9ffc997460530f6ff3ed7ae296c3
505209bd783920a6fcc6b4ab0fefd1a4f3df8873
e7c85ac19e9b256b1b63713ead2bff1db362e220
fddb407037f953fc1b8d3d6921eb0f40b3fb6b29
2bd81d47ef6827ec11f7f6d4104678a783b6c7d3
01a34ffa2ff4e9c027871fee72aa65d608ae63ff
9e1c9605845449ecc09aaa3e0a164ac564ed1502
feb0985ba01498c39411ac92fe08408ac7b15c9d
5bf042b3f4fef84f7641a47194cc6fe6057135ea
c72b6e9d4958a9ba159955941abfa5fa039979ff
487ce1f957521d90e3de0636f24fe42a8b7c8056
84bc776b8892ddc3b2d58b84ea3e9dc9d1cd325a
28fcf1dfdf93c861f5fd185d4c9748d9e1973607
6cf5cde795936b2ebd6ff876998fd1593f533218
802749dcebe69b06084a54ee7bf487895cce1ea4
8394dd5e1a2240f8867eea82a52c78de04d9faa6
01e13ca4d921140c1df7aafcb8d3fe710ce7360b
f4c57fe0928fa2927fd2a12b04b12927f53a2cb1
b43157dc39969530058a9e8dfd0d9fa1418f7bda
f046cea7b9bb48e5d8fe645999058cfdd89ec076
d28f3b30e7606063dfb076adea86b66d701e0329
69308cdfe539b967b7602664de86f597c0304abe
da03e5a440671b33f2764acc1b4772eda266f836
5133a62e637b6f5feada702c5ef825f58b93ca30
9b50bdf30eed249b9af2bd931863f4330f78f515
943c9d8d3238ad1e27ff51f13cd04f1e8303eed1
7d444f7f509cd8cb2caa09ed51cf14b682876743
535fbb7e6a054d994cdd88985ce3029dbfd91fde
59d6fd1627439f6d0fcd7d5bb7a77771f0e61c47
f38e4c9fe1acd943c87bd14849c9922097cd8fb0
8baf53dd53360daadd21669ef27d185906605f76
147e2314f131aa53ab426f0cbe4bf6a63658935a
3ed9a94df5873228b966539e71326b484ebcfe45
a00aa1062adb9dbf667299c26cb58bdba8c3d522
b23d33120cff9aedb5992bcbda5fc5fe46060682
63e7d5f06d406d69b55f5c6bda222064d7c3a64b
828f29654927feeefdef2d809fabe1ce60b4cc22
dbbf2362f5c78dbaaab29bbf7bcdbce927d0a003
6bb7416728b4c5d6fcb5d18a8e33a4dff729c0f3
e6d2a33be122553a9fb7026f6522ea6e7fefc608
f00e182f0aaffbe9631d4ec070e5a5ed19d1bea1
7a1bccfd33bc0bf2a1da95e19b4cac3f90a3fb1c

Got it. What about only running benchmarks on commits that originated after merging a pull request? Could that possibly be one solution?

anissa111 · 2023-10-30T18:49:47Z

Got it. What about only running benchmarks on commits that originated after merging a pull request? Could that possibly be one solution?

~~@philipc2 Ah! Yeah there is an option for this, it's git rev-list --first-parent.~~

~~To continue the example from Friday, this is what that change produces on uxarray:~~

(base) ➜  uxarray git:(main) git rev-list main --first-parent v2023.10.1..main
9158ae47c46071d8c1a5e5e5d3f07c66279dfe22
97fc1a2f443d9ffc997460530f6ff3ed7ae296c3
505209bd783920a6fcc6b4ab0fefd1a4f3df8873
e7c85ac19e9b256b1b63713ead2bff1db362e220
fddb407037f953fc1b8d3d6921eb0f40b3fb6b29
8baf53dd53360daadd21669ef27d185906605f76
147e2314f131aa53ab426f0cbe4bf6a63658935a
3ed9a94df5873228b966539e71326b484ebcfe45
63e7d5f06d406d69b55f5c6bda222064d7c3a64b

~~I'll try to see if I can pass flags through asv-run~~

EDIT: So, I did manage to get it to run with the --first-parent flag, but when checking it out in verbose to confirm that less commits were found with the flag, I discovered that by default asv run adds the --first-parent flag 🤦‍♀️. Good news though we should already be set.

kafitzgerald

I'm also a bit worried about the "failure" state being confusing, but ok with merging for now and seeing how that plays out in practice

Not so much about this specific PR, but the set-up here in general:

This will likely run on a lot of commits. We'll need to take that into account when writing benchmarks and perhaps create another class of benchmarks to run elsewhere.
Given that we may be running on ~~different~~ shared hardware (and not the most controlled environment), it'll be interesting to see how much variability we have in run time for various benchmarks.

I think these are just things we figure out as we go though.

I guess one more question before I approve this... If we did end up refactoring how this runs and wanted to rerun (say from a different release) and start over on the archive repo, how much of a problem would that be?

philipc2 · 2023-10-30T20:09:35Z

Got it. What about only running benchmarks on commits that originated after merging a pull request? Could that possibly be one solution?

~~@philipc2 Ah! Yeah there is an option for this, it's git rev-list --first-parent.~~

~~To continue the example from Friday, this is what that change produces on uxarray:~~
(base) ➜  uxarray git:(main) git rev-list main --first-parent v2023.10.1..main
9158ae47c46071d8c1a5e5e5d3f07c66279dfe22
97fc1a2f443d9ffc997460530f6ff3ed7ae296c3
505209bd783920a6fcc6b4ab0fefd1a4f3df8873
e7c85ac19e9b256b1b63713ead2bff1db362e220
fddb407037f953fc1b8d3d6921eb0f40b3fb6b29
8baf53dd53360daadd21669ef27d185906605f76
147e2314f131aa53ab426f0cbe4bf6a63658935a
3ed9a94df5873228b966539e71326b484ebcfe45
63e7d5f06d406d69b55f5c6bda222064d7c3a64b
~~I'll try to see if I can pass flags through asv-run~~

EDIT: So, I did manage to get it to run with the --first-parent flag, but when checking it out in verbose to confirm that less commits were found with the flag, I discovered that by default asv run adds the --first-parent flag 🤦‍♀️. Good news though we should already be set.

Thats good to hear then!

philipc2 · 2023-10-30T20:15:13Z

Given that we may be running on different hardware (and not the most controlled environment), it'll be interesting to see how much variability we have in run time for various benchmarks.

Adding to what @kafitzgerald mentioned, I'm worried on what type of results the default actions instance would produce, since I don't belive they have substantial hardware.

For an "automatic" action that we run, would it not be worth considering a fixed hardware (i.e. known CPU, Memory) instance on something like AWS (or even casper, if that's possible) that runs the bencharmk, instead of the default action instance?

However, ASV does support keeping track of the machine that ran the bencharmk (see numpy), so we should be able to distinguish between these automatic ones and manually comitted benchmarks.

EDIT: This could also cause issues if, when running on Github actions, we get an instance that has different hardware than the previous run, which would lead to adding another "machine" to our list.

This is a good read

anissa111 · 2023-10-30T22:01:20Z

@kafitzgerald

I'm also a bit worried about the "failure" state being confusing, but ok with merging for now and seeing how that plays out in practice

Fair enough, I think this won't actually happen that often, if ever, in the regular use of this action.

Not so much about this specific PR, but the set-up here in general:

This will likely run on a lot of commits. We'll need to take that into account when writing benchmarks and perhaps create another class of benchmarks to run elsewhere.

Yeah, totally. Any weighty benchmarks should probably be filtered out of this regular CI

I guess one more question before I approve this... If we did end up refactoring how this runs and wanted to rerun (say from a different release) and start over on the archive repo, how much of a problem would that be?

We can reconfigure this action without needing to delete all our archives (though we could also do that too). The results are only grouped by machine, which we're manually setting. If we wanted to switch to start from an earlier release, we could add to the existing archive by altering the asv run line.

philipc2

I'm on the same page now, this looks great!

kafitzgerald

Great!

anissa111 added 5 commits October 27, 2023 11:02

add asv-benchmarking.yml

ec61f60

only allow to push from NCAR/geocat-comp main

89c6f34

set start to 2022.10.0

7bb99bd

add to release notes

aea638e

fix pre-commits

b8a37e9

anissa111 added developer feature For development standardization / best practices / enhancement testing Issue related to testing scalability and performance Related to scalability, performance, or benchmarking labels Oct 27, 2023

anissa111 requested review from kafitzgerald, jukent and philipc2 October 27, 2023 17:32

philipc2 reviewed Oct 27, 2023

View reviewed changes

.github/workflows/asv-benchmarking.yml Show resolved Hide resolved

Merge branch 'main' into asv-ci

3d59852

kafitzgerald reviewed Oct 27, 2023

View reviewed changes

.github/workflows/asv-benchmarking.yml Show resolved Hide resolved

anissa111 requested review from philipc2 and kafitzgerald October 30, 2023 19:08

kafitzgerald reviewed Oct 30, 2023

View reviewed changes

philipc2 approved these changes Oct 30, 2023

View reviewed changes

kafitzgerald approved these changes Oct 30, 2023

View reviewed changes

anissa111 merged commit a5bdf49 into NCAR:main Oct 30, 2023
8 checks passed

anissa111 deleted the asv-ci branch October 30, 2023 22:23

philipc2 mentioned this pull request Feb 8, 2024

Setup ASV Benchmarking UXARRAY/uxarray#694

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Connect ASV benchmarking to CI #496

Connect ASV benchmarking to CI #496

anissa111 commented Oct 27, 2023 •

edited

Loading

philipc2 commented Oct 27, 2023

anissa111 commented Oct 27, 2023 •

edited

Loading

philipc2 commented Oct 27, 2023 •

edited

Loading

anissa111 commented Oct 30, 2023 •

edited

Loading

kafitzgerald left a comment •

edited

Loading

philipc2 commented Oct 30, 2023

philipc2 commented Oct 30, 2023 •

edited

Loading

anissa111 commented Oct 30, 2023

philipc2 left a comment

kafitzgerald left a comment

Connect ASV benchmarking to CI #496

Connect ASV benchmarking to CI #496

Conversation

anissa111 commented Oct 27, 2023 • edited Loading

PR Summary

PR Checklist

philipc2 commented Oct 27, 2023

anissa111 commented Oct 27, 2023 • edited Loading

philipc2 commented Oct 27, 2023 • edited Loading

anissa111 commented Oct 30, 2023 • edited Loading

kafitzgerald left a comment • edited Loading

Choose a reason for hiding this comment

philipc2 commented Oct 30, 2023

philipc2 commented Oct 30, 2023 • edited Loading

anissa111 commented Oct 30, 2023

philipc2 left a comment

Choose a reason for hiding this comment

kafitzgerald left a comment

Choose a reason for hiding this comment

anissa111 commented Oct 27, 2023 •

edited

Loading

anissa111 commented Oct 27, 2023 •

edited

Loading

philipc2 commented Oct 27, 2023 •

edited

Loading

anissa111 commented Oct 30, 2023 •

edited

Loading

kafitzgerald left a comment •

edited

Loading

philipc2 commented Oct 30, 2023 •

edited

Loading