Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add accuracy note to gen_bool #347

Merged
merged 1 commit into from
Apr 1, 2018

Conversation

pitdicker
Copy link
Contributor

A first change to the documentation in reply to @huonw's comment on Reddit.

I don't think adding an extra check for 0.0 in gen_bool is worth it. If p is variable and the code expects to receive both true and false, there is no problem, we are just at the limit of the accuracy. If the goal is to consistently generate false by using 0.0 as a constant, why are you using gen_bool?

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively maybe we should use HighPrecision01 here and add a gen_bool_const variant with this method? I don't really like having two methods for the same job, but each has its advantage/disadvantage.

src/lib.rs Outdated
/// # Accuracy note
///
/// `gen_bool` uses 32 bits of the RNG, so if you use it to generate close
/// to or more than 2^32 results, a tiny bias may becomes noticable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

become

src/lib.rs Outdated
/// `gen_bool` uses 32 bits of the RNG, so if you use it to generate close
/// to or more than 2^32 results, a tiny bias may becomes noticable.
/// A notable consequence of the method used here is that the worst case is
/// `rng.gen_bool(0.0)`: it has a chance of 1 in 2^32 of being true, while
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I told you to use <= after examining the upper bound but not the lower. I guess it's better to use < then and ensure gen_bool(0.0) is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we would shift to problem to gen_bool(1.0). But I see our implementation as reasonable. We could alternatively just always return false for gen_bool(0.0). But I wouldn't bother, using gen_bool like this is really nor very sensible (as I tried to wr3ite in the first comment, and the comment in the code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To top it off, some generators are explicitly designed not to produce 0. We could use (rng.gen() - 1) < p_int, but I guess that's slower?

Copy link
Contributor Author

@pitdicker pitdicker Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 Good point. Nothing seems perfect any more once you know too many details...

That would have to use a wrapping subtraction, and would also only shift the problem from 0.0 to 1.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it should use rejection sampling — but then there's no point shifting. I agree, we're thinking too much into this 😆

@pitdicker
Copy link
Contributor Author

I am starting to think that implementing a different method as the Bernoulli distribution could make sense, but would like to keep gen_bool as it is.

@pitdicker
Copy link
Contributor Author

Ready to merge? (seems like the latest nightly broke stdweb, and somehow we try to use stdweb on two builders now... #352)

@dhardy
Copy link
Member

dhardy commented Mar 29, 2018

Well, apart from the spelling error, it's an improvement, though I'm not entirely happy with it.

@pitdicker pitdicker force-pushed the gen_bool_accuracy_note branch from 26ec612 to ca4722b Compare March 29, 2018 08:34
@pitdicker
Copy link
Contributor Author

I remember changing and pushing that. No idea what happened 😄.

What is the part you are not entirely happy with? That there is a situation where it is not perfect?

@dhardy
Copy link
Member

dhardy commented Mar 29, 2018

Yes. And that we have a more accurate approach which is just as fast (according to my benchmarking) when p is not statically known (which is commonly the case). But I guess we can get to that later.

@pitdicker
Copy link
Contributor Author

The way I see it is that with an accuracy of 32 bits, we can be off by 2^-33. And with rounding, this becomes visible in the 0.0 case.

when p is not statically known (which is commonly the case).

I was thinking just the opposite 😄, thinking about the current uses of gen_weighted_bool.

But what did you think about also adding a bernoulli distribution to add the more precise method?

@dhardy
Copy link
Member

dhardy commented Mar 29, 2018

I was thinking of sim work I've done in the past where many probabilities are fixed, but read from config files, and many more are derived. But I don't know what the average use is!

What, gen_bool and gen_bernoulli? We could I guess. Both are no more than a couple lines of code, and the method currently used is the only one with any potential for pre-computation.

@pitdicker
Copy link
Contributor Author

No, I didn't have an extra method on Rng in mind. Weren't you once thinking about adding a Bernoulli distribution?

@dhardy dhardy merged commit 683d6b9 into rust-random:master Apr 1, 2018
@pitdicker pitdicker deleted the gen_bool_accuracy_note branch April 1, 2018 17:30
pitdicker pushed a commit that referenced this pull request Apr 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants