Fluctuating CPU MHz makes tsc profiling lie
#23
Comments
Yes. When benchmarking or profiling code, it's not uncommon to switch to the performance governor to remove the "random" clock changes that normal governors make. That's less about TSC and more about the fact that your code actually takes a variable amount of time. And honestly, +/- 30% for something taking < 1 ns isn't surprising. At that point, the loop counter used for the repeated iteration within the timing code becomes a meaningful part of the measurement. Running within a VM is another contributing factor for measurement error.
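To check which governor each CPU is using before a benchmark run, something like the following might help. It is only a sketch: it assumes a Linux system that exposes the standard cpufreq sysfs files under `/sys/devices/system/cpu`, which many VMs do not. The helper names (`is_cpu_dir`, `read_governors`, `non_performance`) are made up for illustration and are not part of divan.

```rust
use std::fs;
use std::path::Path;

/// True for directory names such as "cpu0" or "cpu17",
/// false for siblings such as "cpufreq" or "cpuidle".
fn is_cpu_dir(name: &str) -> bool {
    match name.strip_prefix("cpu") {
        Some(rest) => !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit()),
        None => false,
    }
}

/// Collects (cpu name, governor) pairs found under `root`.
/// CPUs without a readable cpufreq/scaling_governor file are skipped,
/// so an empty result can simply mean that cpufreq is not exposed.
fn read_governors(root: &Path) -> Vec<(String, String)> {
    let mut found = Vec::new();
    let Ok(entries) = fs::read_dir(root) else {
        return found;
    };
    for entry in entries.flatten() {
        let name = entry.file_name().to_string_lossy().into_owned();
        if !is_cpu_dir(&name) {
            continue;
        }
        let path = entry.path().join("cpufreq/scaling_governor");
        if let Ok(governor) = fs::read_to_string(&path) {
            found.push((name, governor.trim().to_string()));
        }
    }
    // Plain string sort, so "cpu10" comes before "cpu2".
    found.sort();
    found
}

/// Names of the CPUs whose governor is anything other than "performance".
fn non_performance(governors: &[(String, String)]) -> Vec<&str> {
    governors
        .iter()
        .filter(|(_, governor)| governor.as_str() != "performance")
        .map(|(cpu, _)| cpu.as_str())
        .collect()
}
```

Calling `read_governors(Path::new("/sys/devices/system/cpu"))` and passing the result to `non_performance` lists the CPUs that may still change clocks during a run. An empty list from `read_governors` does not prove the clocks are fixed; it may only mean the guest has no cpufreq interface.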
Ultimately I spent quite a while testing different configurations for this. What I need is benchmarks for something in the nanosecond range that reliably tell whether a new version is faster or slower than the old one, where "reliably" means wrong at most 1 time in 20, and preferably 1 in 100. I tried a lot of options with divan, also on a quiet desktop PC, but I was unable to achieve such reliable benchmarks. In the end, criterion ended up being the better choice for me, even though the benchmarks are really slow. I don't know if there's any sense in keeping this "bug" open, so I'll close it.
So is this issue not just with the TSC then?
I don't think so. For what it's worth, I have observed result instability as well. I suspect criterion has similar behavior, and that the main differentiating factor is the statistical analysis done within that crate rather than the fact that it iterates so much.
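To illustrate why the analysis step matters, here is a rough sketch of a noise-aware comparison between two sets of timings. It is not criterion's method and not anything divan provides: it is a simple median and median-absolute-deviation (MAD) check, and the names `median`, `mad`, `compare`, and `Verdict`, as well as the threshold factor `k`, are assumptions made for this example.

```rust
/// Median of the samples, or None if the slice is empty.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Median absolute deviation: a spread estimate that tolerates outliers.
fn mad(samples: &[f64]) -> Option<f64> {
    let center = median(samples)?;
    let deviations: Vec<f64> = samples.iter().map(|x| (x - center).abs()).collect();
    median(&deviations)
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Faster,
    Slower,
    Inconclusive,
}

/// Reports the new version as faster or slower only when the medians differ
/// by more than `k` times the larger of the two MADs. Smaller differences
/// are treated as noise. Returns None if either slice is empty.
fn compare(old: &[f64], new: &[f64], k: f64) -> Option<Verdict> {
    let (old_median, new_median) = (median(old)?, median(new)?);
    let noise = mad(old)?.max(mad(new)?);
    let diff = new_median - old_median;
    Some(if diff.abs() <= k * noise {
        Verdict::Inconclusive
    } else if diff < 0.0 {
        Verdict::Faster
    } else {
        Verdict::Slower
    })
}
```

With per-sample timings in nanoseconds, `compare(&old, &new, 3.0)` refuses to pick a winner when the difference is within the observed spread. The factor 3.0 is arbitrary, and this is not a proper hypothesis test, so it gives no guarantee about a 1-in-20 or 1-in-100 error rate.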
Yes, it also happens without TSC. But I need to spend a bit more time on this so I can give reasonable comments and not just guesswork.
This isn't a fully debugged issue report yet, nor do I have a clean reproduction recipe yet...
Sometimes when I benchmark, attempting to force divan to use TSC, I get results where the timer accuracy is 20 ns (this is GitHub Codespaces, so extremely heavily virtualized):
However, every now and then, divan figures out the timer accuracy is 29.85 ns, in which case I see wildly differing values:
This also happens if I don't specify a sample size, but I wanted to fix it for this example so that it's clear the difference isn't caused by a different iters value.
This is an extreme microbenchmark, but I think I was seeing similar behavior on larger benchmarks as well.
Actually, while writing this, I think I might've figured out the issue: different CPUs on this system have different MHz, and they seem to be constantly changing.
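One way to observe this from Rust is to read the per-CPU clock values the kernel reports. The sketch below assumes Linux on x86, where `/proc/cpuinfo` contains `cpu MHz` lines (other architectures may not report that field). The helper names are hypothetical and not part of divan.

```rust
use std::fs;

/// Extracts every "cpu MHz" value from text in the format of /proc/cpuinfo.
/// Lines with other keys, or with values that do not parse, are ignored.
fn parse_cpu_mhz(cpuinfo: &str) -> Vec<f64> {
    cpuinfo
        .lines()
        .filter_map(|line| {
            let (key, value) = line.split_once(':')?;
            if key.trim() == "cpu MHz" {
                value.trim().parse::<f64>().ok()
            } else {
                None
            }
        })
        .collect()
}

/// Returns (min, max) of the values, or None if there are none.
fn mhz_range(values: &[f64]) -> Option<(f64, f64)> {
    let first = *values.first()?;
    Some(
        values
            .iter()
            .fold((first, first), |(lo, hi), &v| (lo.min(v), hi.max(v))),
    )
}

/// Reads the current per-CPU MHz values from /proc/cpuinfo (Linux only).
fn read_current_mhz() -> std::io::Result<Vec<f64>> {
    Ok(parse_cpu_mhz(&fs::read_to_string("/proc/cpuinfo")?))
}
```

Calling `read_current_mhz()` a few times, some milliseconds apart, and printing `mhz_range` of each result shows whether the cores disagree with each other and whether the values drift over time.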
Not sure what divan can do about that :-)