@alvicsam is it worth regenerating the weights in the same PR? |
Sure, I ran the new job so now I'm waiting for the new weights |
How would that pipeline be triggered? Manually? |
Very soon, I hope, you'll be able to run them through the command bot too. |
That is fine! Just wanted to make sure it is not run on every commit or something 😄 |
I pushed the weight results. Is it expected that the weights get worse by that much? |
Yes, the job is manual and can be run on every PR and commit.
It can be; afaiu consistency is what matters. We can run the benchmarks one more time and compare the results of the runs on the new runners. However, the results on polkadot and cumulus were consistent. @mordamax can we run only some of the benchmarks with the bot in substrate? Where will the bot run the benchmark (I mean, on which runner)?
@oleg-plakida can tell more about this benchmark. AFAIR the results weren't consistent, and a different glibc version could change the output |
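The suggestion above, running the benchmarks again and comparing the runs, could be sketched roughly as follows. This is only an illustration: the benchmark names, the weight values, and the 5% tolerance are hypothetical and not taken from this PR.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        raise ValueError("old weight must be non-zero")
    return (new - old) / old * 100.0


def inconsistent_weights(run_a: dict, run_b: dict, tolerance_pct: float = 5.0) -> dict:
    """Map each benchmark present in both runs to its percentage change,
    keeping only those whose absolute change exceeds the tolerance."""
    changes = {}
    for name in sorted(run_a.keys() & run_b.keys()):
        pct = relative_change(run_a[name], run_b[name])
        if abs(pct) > tolerance_pct:
            changes[name] = pct
    return changes


# Hypothetical weights from two runs on the same runner type.
run_1 = {"balances_transfer": 40_000_000, "system_remark": 2_000_000}
run_2 = {"balances_transfer": 40_800_000, "system_remark": 2_400_000}

# Only benchmarks that moved by more than the tolerance are reported.
flagged = inconsistent_weights(run_1, run_2)
```

With these made-up numbers, only `system_remark` is flagged, because its weight moved by far more than the tolerance between two runs of unchanged code.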
So how do we continue here? This doesn't sound very confident.
What good can come from this? If it is the same numbers: Why does a supposedly faster machine perform 20% worse in many benchmarks? If it is completely different numbers: The numbers are not consistent.
That doesn't help us here in substrate. I suggest reverting the bot to the old runners until this is figured out. This is blocking all of my PRs. |
Results are consistent in polkadot and cumulus. I see no reason why they won't be consistent here. If you have doubts I suggested options for confirming the stability of the results.
The machines are generated dynamically, so several people can run benchmarks at the same time in different PRs. It's also possible to parallelise benchmarks in the future.
Because benchmarks run on only one core, and the core frequency on the new runners is lower than on the old ones. |
Okay, let's re-run then. I tried to trigger it. I hope I did it the right way:
Okay this might be a misunderstanding. I didn't mean from the whole endeavour. I understand that bare metal machines are annoying for you to manage. I meant from re-running the benchmarks. |
I've run tests with
|
@oleg-plakida So I guess we should just remove this then? |
The command or our concern? |
This micro benchmark which is not reflective of the actual performance we are interested in. |
I would say it would be nice to have a benchmark like this, and I suppose that was the idea at the beginning, but we shouldn't trust it much right now, at least until we make it consistent. And I assume that's a hard challenge. But your question is really interesting: does anyone actually use this benchmark for node testing? |
Once the benchmarks are finished I will make a new PR into this one with them. This way we can use the weight UI to make a comparison. Please don't commit them to this PR. |
But I suppose consistency is the only thing that matters for us: the setup used for measuring, not the performance of that setup. As long as we compare results produced on the reference setup, we can measure code performance. |
Committed the new weights here: #13336
|
The single-threaded CPU speed is expected to be slower than on the old reference hardware because the new runners are cloud VMs, which have server CPUs. Our current goal is indeed to have consistent results, so let's re-run them a few times. |
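One possible way to quantify "consistent" across a few re-runs is the coefficient of variation of each benchmark's weight. The sketch below assumes the weights from the repeated runs have already been collected into a list; the sample values are hypothetical and not measured in this PR.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples: list) -> float:
    """Population standard deviation divided by the mean, as a percentage."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to measure spread")
    avg = mean(samples)
    if avg == 0:
        raise ValueError("mean weight must be non-zero")
    return pstdev(samples) / avg * 100.0


# Hypothetical weights of one benchmark across four re-runs on the same runner type.
runs = [100, 102, 98, 100]
spread_pct = coefficient_of_variation(runs)
```

A small spread across re-runs would suggest the runners are usable as a reference setup even if their absolute numbers are slower than the old hardware.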
@ggwpez Have you checked the link above? We ran |
Ah, thought you compared against master 🤦♂️ |
I re-ran the worst offender |
Sorry I am confused now. There was no version update between the two runs we did in this PR. You think we should merge this PR as-is? |
I tested it on master, so there was an update since then which now pollutes these results. But that won't explain the inconsistencies here, yeah. |
An update to rustc? Wasmi is sometimes really sensitive to those. It relies on some things being a tail call, and Rust gives no guarantee for that :(. @Robbepop Is that still the case? I faintly remember that you found a workaround for this.
|
Yes. The good thing is that the Rust devs take performance regressions very seriously. The bad thing is that so far I have always had to dig them out myself when they popped up. The rustc issues are still open, and nobody is working on them despite their high priority. |
bot help |
Here's a link to docs |
The PR adds a job that runs benchmarks and creates artifacts with the git diff. The job runs on the new GCP runners; the goal is to deprecate the bm* machines. Weights generation is currently in progress.
https://github.com/paritytech/ci_cd/issues/697
https://github.com/paritytech/ci_cd/issues/733
@mordamax @athei