Automate shrinking benchmark more #4214
Merged
Brings more of the shrinking benchmark into pytest for automation. Also replaces our plotting code with a Vega specification; generating the plot from it requires a dependency on vl-convert. I know about Altair, but to my understanding it only aims to represent Vega-Lite specs, and we are moderately abusing Vega (not Lite) to have two axes.
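For readers unfamiliar with why full Vega is needed here, below is a minimal sketch of a two-axis Vega spec built as a plain Python dict. It is not the spec from this PR: the field names (`benchmark`, `abs_diff`, `rel_diff`), the helper `dual_axis_spec`, and the sample rows are all hypothetical, and rendering through vl-convert is left out.

```python
import json

# Hypothetical rows; the values are illustrative only, not benchmark results.
rows = [
    {"benchmark": "a", "abs_diff": 1.5, "rel_diff": 0.02},
    {"benchmark": "b", "abs_diff": -0.5, "rel_diff": -0.01},
]


def dual_axis_spec(rows):
    """Build a full Vega (not Vega-Lite) spec with two independent y scales."""

    def linear_scale(name, field):
        # Each y scale gets its own domain, so the two series can differ in units.
        return {
            "name": name,
            "type": "linear",
            "range": "height",
            "nice": True,
            "zero": True,
            "domain": {"data": "table", "field": field},
        }

    def points(field, scale, color):
        # One symbol mark per series, each bound to its own y scale.
        return {
            "type": "symbol",
            "from": {"data": "table"},
            "encode": {
                "enter": {
                    "x": {"scale": "x", "field": "benchmark", "band": 0.5},
                    "y": {"scale": scale, "field": field},
                    "fill": {"value": color},
                }
            },
        }

    return {
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "width": 400,
        "height": 200,
        "data": [{"name": "table", "values": rows}],
        "scales": [
            {
                "name": "x",
                "type": "band",
                "range": "width",
                "domain": {"data": "table", "field": "benchmark"},
            },
            linear_scale("y_abs", "abs_diff"),
            linear_scale("y_rel", "rel_diff"),
        ],
        "axes": [
            {"orient": "bottom", "scale": "x"},
            {"orient": "left", "scale": "y_abs", "title": "absolute difference"},
            {"orient": "right", "scale": "y_rel", "title": "relative difference"},
        ],
        "marks": [
            points("abs_diff", "y_abs", "blue"),
            points("rel_diff", "y_rel", "red"),
        ],
    }


spec_json = json.dumps(dual_axis_spec(rows), indent=2)
```

The resulting JSON string is what a Vega renderer would consume; the left and right axes are each tied to a separate named scale, which is the part this sketch is meant to illustrate.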
Here's a benchmark result where old == new, which gives an idea of the variability in current shrinking: 10 trials (5 old, 5 new), 95% CI. Blue is the absolute difference, red is the relative difference. (@DRMacIver you might find this interesting!)
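As a rough illustration of what "absolute and relative difference with a 95% CI" can mean for two small groups of trials, here is a sketch using a percentile bootstrap. The description above does not say how the interval is computed, so the bootstrap, the function name `bootstrap_diff_ci`, and the sample numbers are all assumptions.

```python
import random
from statistics import mean


def bootstrap_diff_ci(old, new, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap CIs for the absolute and relative difference of means.

    Assumption: this is one common way to get such an interval, not
    necessarily the method used by the benchmark in this PR.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    abs_diffs, rel_diffs = [], []
    for _ in range(n_resamples):
        # Resample each group with replacement and compare the means.
        o = mean(rng.choices(old, k=len(old)))
        n = mean(rng.choices(new, k=len(new)))
        abs_diffs.append(n - o)
        rel_diffs.append((n - o) / o)

    def interval(values):
        values = sorted(values)
        lo = values[int((1 - level) / 2 * len(values))]
        hi = values[int((1 + level) / 2 * len(values)) - 1]
        return lo, hi

    return interval(abs_diffs), interval(rel_diffs)


# Hypothetical trial scores: "old" and "new" are the same code, so any
# difference between them would be noise.
old_trials = [100.0, 104.0, 98.0, 101.0, 97.0]
new_trials = [100.0, 104.0, 98.0, 101.0, 97.0]
abs_ci, rel_ci = bootstrap_diff_ci(old_trials, new_trials)
```

When old and new are the same code, both intervals should straddle zero; their width then serves as a baseline for how much of a measured difference is just noise.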