-
-
Notifications
You must be signed in to change notification settings - Fork 436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Poisson: Fix undefined behavior and support f64 output #795
Conversation
Before, we might trigger undefined behaviour if the sample gets too large.
Internally, we generate `f64`, so it makes sense to let the user access that result directly, instead of forcing a conversion from `u64`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Now the only question is whether to bother with an f32
version. I don't see a lot of point, but it would still be a breaking change to introduce later (because Poission
becomes Poisson<f64>
).
rand_distr/src/poisson.rs
Outdated
impl Distribution<u64> for Poisson { | ||
fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> u64 { | ||
let result: f64 = self.sample(rng); | ||
assert!(result >= 0.); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already guaranteed by the algorithm (see loop break condition), though the check is cheap enough relative to the algorithm that it doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I should add #[inline]
, so the check can be optimized away?
I think it makes sense to have the check to avoid undefined behavior, in case the the implementation somehow is wrong and returns negative numbers or nan.
I added |
rand_distr/src/utils.rs
Outdated
@@ -91,33 +111,33 @@ impl Float for f64 { | |||
/// `Ag(z)` is an infinite series with coefficients that can be calculated | |||
/// ahead of time - we use just the first 6 terms, which is good enough | |||
/// for most purposes. | |||
pub(crate) fn log_gamma(x: f64) -> f64 { | |||
pub(crate) fn log_gamma<N: Float>(x: N) -> N { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an optimal f32
implementation and I'm not sure it's even an acceptable one (e.g. the a
parameter below rounds to 1
). I would suggest just using f64
arithmetic and converting at function start/end, though there's very little point to supporting Poisson<f32>
in this case (only really for the type deduction).
Additionally, this function prototype is okay for internal use but not if we wished to expose log_gamma
since we could not add correct implementations for other float types without specialization. A trait would be better, though clunky since we only have one method.
Statrs's Gamma function implementations are likely better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It essentially evaluates a polynomial, so I think it should be fine, even if the coefficients have less precision. I thinks it is more likely that too many coefficients are used for the target precision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though there's very little point to supporting Poisson in this case (only really for the type deduction).
It might help for sampling from the uniform distribution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this help with the uniform distribution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are doing some rejection sampling for the Poisson distribution, sampling f32
instead of f64
in the loop might help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you mean speed up usage of uniform, not help implement uniform. Yes, and when we get a specialisation of log_gamma
it may speed things up for any(?) situation where f32
is faster than f64
.
@@ -137,29 +137,41 @@ mod test { | |||
fn test_poisson_10() { | |||
let poisson = Poisson::new(10.0).unwrap(); | |||
let mut rng = crate::test::rng(123); | |||
let mut sum = 0; | |||
let mut sum_u64 = 0; | |||
let mut sum_f64 = 0.; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the goal of this test? We know both values should be the same since (a) both should be non-negative integers and (b) we should not be going anywhere close to the limits/accuracy of either type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to test that the code path works as expected, not the precision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same code path, aside from the extra cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to trigger the path with the cast, making sure it works. I think it is important because of the potential undefined behavior.
I changed |
@@ -67,6 +67,7 @@ where Standard: Distribution<N> | |||
impl<N: Float> Distribution<N> for Poisson<N> | |||
where Standard: Distribution<N> | |||
{ | |||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a large function — I don't think #[inline]
makes any sense here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not #[inline(always)]
, so the compiler is still free to make that choice. It might make sense to inline it into the u64
sampling, so that some bound checks can be eliminated.
No description provided.