-
-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a benchmark for 32-bit UniformInt sample_single #985
Conversation
Simply removing the heuristic gives the following results:
So, at least in my environment with rustc 1.44.0, |
Putting the
|
With the heuristic added back in:
Some conclusions:
|
I've made some changes to the benchmark to ensure that it is representative. Most importantly, I moved distribution creation into the timed part of the benchmark to better reflect the cost of generating a single random value.
The u32 values with no code changes to UniformInt are now:
This generally shows the same pattern as the original benchmark, although With the approximation removed:
the best cases ( |
I'm trying to square these results with the benchmark in
but with the removal of the approximation, gives:
The current modulo implementation is much more expensive for larger integers, but it's also disappointing that the 32-bit benchmark increases so much. I expect that this is the source of the claims in |
For comparison, on my Haswell desktop CPU without changing sampling code:
With the left-shift approximation removed:
|
Looks like I'm seeing most of the costs and fewer of the gains of your suggested simplification. What's your CPU? And your change is simply commenting out the At any rate, the gains are mostly less than a factor 2. The costs for |
Thanks for taking a look.
For the new benchmarks, removing the approximation only affects no change:
with change:
I'd say you're seeing more gains than costs:
Comparing the Admittedly the
The aim in this PR is to introduce more representative benchmarks. I'm trying to get that right before submitting any changes to the code itself. |
I changed the no change:
with change:
As you suggest, while this supports the case that 32-bit sometimes benefits from removing the approximation, larger bit sizes are overwhelmed by the cost of the modulo. Interesting that removing the approximation gives a 48% increase for the original 32-bit benchmark and a 48% decrease for this benchmark. |
Yes it is. So you're proposing to replace the Probably the And thanks for giving this stuff some attention. |
Latest changes move the uniform and standard benchmarks out of I have some code changes lined up, but I will introduce them as a separate PR, since this benchmark PR could be merged now, but the code changes will affect the value stability of the outputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Thank you. Then I shall go ahead and merge. |
This PR adds a benchmark for
sample_single
forUniformInt<u32>
.This demonstrates the problem that I described in #951
I get these results, where
dist
is sampling from an existing distribution andsingle
is usingsample_single
:This shows that, while
sample_single
is faster for some ranges, it can be significantly slower for others.Before making any changes to the
sample_single
code, I'd like to get these results verified so I don't make changes and then find there's a bug in the benchmark.