New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Von Mises Distribution #330

Closed

ghost wants to merge 9 commits into boostorg:develop from unknown repository

ghost commented Mar 27, 2020

My trial to implement a circular distribution #296. Basic moments, PDF and CDF work for small values of κ, i. e. the concentration parameter. For larger values the used methods are too imprecise.

pcjmuenster added 5 commits

January 26, 2020 15:40


          Initial commit for Von Mises Distribution

e33f6b5

Add class and test file;
tests for PDF pass for single precision


          Add support for von Mises CDF

828f64d

Redefine support after discussion on GitHub (#296).

Use an established algorithm from the literature for CDF.
Add tests for various concentration values.
Some tests fail for very large values.


          Add quantile function for von Mises distribution

a59c12e

Use Newton–Raphson iteration to calculate zeros of CDF - quantile.


          [von_mises] add complementary quantile and variance

f76081c


          clean up code before PR

a385e69

Collaborator

NAThompson commented Apr 5, 2020 •

edited

Loading

@pcjmuenster : My apologies for not being able to look at this; you will notice my comments were not super meaningful. We've got some work to do before the release; will probably not be able to look at this until after.

Could you write a google benchmark for each, run it under AddressSanitizer and UBSan, and then do a couple plots?

Also, I note your unit tests are mainly spot checks. Are there symmetries or integrals that could provide other tests?

Also, note the file test/math_unit_test.hpp; it does a number of things which are specific to mathematics and are not found in Boost.Test. It also compiles much faster.

NAThompson reviewed

View reviewed changes

include/boost/math/distributions/von_mises.hpp Outdated Show resolved Hide resolved

include/boost/math/distributions/von_mises.hpp Show resolved Hide resolved

include/boost/math/distributions/von_mises.hpp

+              // From MathWorld--A Wolfram Web Resource.
+              // http://mathworld.wolfram.com/VonMisesDistribution.html
+              #include <boost/math/distributions/fwd.hpp>

Collaborator

NAThompson Apr 5, 2020

Can we remove the forward header? @jzmaddock or @pabristow might have a better idea that I about why this is necessary but in general I find it causes problems.

include/boost/math/distributions/von_mises.hpp

+              #ifdef BOOST_MSVC
+              #pragma warning(push)
+              #pragma warning(disable:4127)

Collaborator

NAThompson Apr 5, 2020

Could you add a comment on why this warning should be disabled?

Contributor

pabristow Apr 5, 2020

MSVC used to warn on anything that was constant.

// C 4127 "conditional expression is constant"

They have recently stopped doing this, at last, thank goodness!

https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4127?view=vs-2019

But we still test antediluvian versions of MSVC, so still worth keeping to reduce clutter?

include/boost/math/distributions/von_mises.hpp Outdated

+                {
+                  // Polynomial coefficients from boost/math/special_functions/detail/bessel_i0.hpp
+                  // > Max error in interpolated form: 5.195e-08
+                  // > Max Error found at float precision = Poly: 8.502534e-08

Collaborator

NAThompson Apr 5, 2020

You are using float precision to do the interpolation, but allow result to be an arbitrary precision. Is there a better way?

Author

ghost Apr 7, 2020

I don't understand what you are referring to. pdf_impl is overloaded depending on the precision.

include/boost/math/distributions/von_mises.hpp Outdated Show resolved Hide resolved

include/boost/math/distributions/von_mises.hpp Outdated

+                if (conc > ck) {
+                  RealType c = 24.0 * conc;
+                  RealType v = c - 56;
+                  RealType r = sqrt((54.0 / (347.0 / v + 26.0 - c) - 6.0 + c) / 12.0);

Collaborator

NAThompson Apr 5, 2020

Lots of double precision constants here. Is arbitrary precision not possible?

Author

ghost Apr 7, 2020

As for the CDF, the mentioned paper gives constants depending on the precision for the values of ck, a1–a4 and v. However, they are only for upto 12 decimal digits. I tried to extrapolate these values but that wasn't quite successful. I have looked around for implementations in other libraries and they all seem to use this old paper and the values for 8 digits (single precision).

Contributor

pabristow Apr 7, 2020

That's a little disappointing (though probably good enough for most users).

https://www.rdocumentation.org/packages/circular/versions/0.4-93 R documents mention this but I haven't found/downloaded the source code yet, but perhaps you already have?

These fitting functions are a Black Art but I know the Chief Wizard and perhaps he can advise @jzmaddock .

Collaborator

jzmaddock Apr 7, 2020

I do wonder if we can do better here:

I notice you've pulled out and duplicated the contents of I0 - why not just call that function directly?
Looking at the original paper, the series to evaluate for the CDF is given on the start of page 280 and everything else is just "faff" to try to evaluate that. I confess it's going to be a nasty one to code up because the Bessel function recurrences are stable in the backwards not forwards direction. But it would be true arbitrary precision.

I faced something similar with some of the 1F1 series expansions, and the way I coded it was to maintain a cache of Bessel function values, when filling the cache you start with an arbitrary small value, then iterate to fill the cache and then normalise all the values afterwards either using the last value from the previous cache or a single Bessel function call. Which is not easy to explain and may or may not make sense!

The code in question is hypergeometric_1F1_AS_13_3_7_tricomi_series::refill_cache in

math/include/boost/math/special_functions/detail/hypergeometric_1F1_bessel.hpp

Line 153 in 8eb40af

void refill_cache()

. Note that this is way over-complex for your case because it deals with both Bessel I and J, and positive and negative orders, neither of which are needed here.

Author

ghost Apr 8, 2020

The reason to duplicate the code was that I0(kappa) grows very fast just like e^kappa, dividing these two however leads to numbers < 100. So, in order to make the code work for large values of kappa, I calculate the quotient directly via polynomial approximation.

Collaborator

NAThompson Apr 17, 2020

The reason to duplicate the code was that I0(kappa) grows very fast just like e^kappa, dividing these two however leads to numbers < 100. So, in order to make the code work for large values of kappa, I calculate the quotient directly via polynomial approximation.

Can you just test for values of kappa that cause overflow and then use an asymptotic there?

test/test_von_mises.cpp Show resolved Hide resolved

test/test_von_mises.cpp Outdated

+              // From MathWorld--A Wolfram Web Resource.
+              // http://mathworld.wolfram.com/VonMisesDistribution.html
+              #include <pch.hpp> // include directory /libs/math/src/tr1/ is needed.

Collaborator

NAThompson Apr 5, 2020

Do you need the entire pch.hpp? If not I'd just include what you need.

Author

ghost Apr 8, 2020

Not necessary at all.

Author

ghost May 18, 2020

Appearently necessary for some architectures …

Author

ghost commented Apr 7, 2020

Could you write a google benchmark for each, run it under AddressSanitizer and UBSan, and then do a couple plots?

What kind of benchmarks do you have in mind?

Collaborator

NAThompson commented Apr 7, 2020

@pcjmuenster : Here's a google benchmark for the GCD in multiprecision. I'd do a float and double test for the pdf and cdf.

Author

ghost commented Apr 17, 2020

I'd do a float and double test for the pdf and cdf.

Until now, I manually typed in the formula into Wolfram|Alpha to get a correct value. In order to make a plot, I'd have to retrieve hundreds or thousands of these. WA has an API to automate these kind of things but unfortunately caching the results is against the ToS. Do you another way to get the exact values? (The papers I read are so old, they only have 4 digits of precision)

Collaborator

NAThompson commented Apr 17, 2020

Until now, I manually typed in the formula into Wolfram|Alpha to get a correct value. In order to make a plot, I'd have to retrieve hundreds or thousands of these.

No don't do that.

I'd do the following:

Test the evenness of the PDF f(-x) = f(x) for various kappa.
Test the oddness of the CDF about y = 0.5 for various kappa.
Test quadratures of the PDF and see that it agrees with the CDF. I think tanh_sinh should work well here, though over the entire domain use trapezoidal.
Test moments of the distribution; it looks like they can be evaluated as quadratures and then tested via modified Bessel functions.

Author

ghost commented Apr 29, 2020

Until now, I manually typed in the formula into Wolfram|Alpha to get a correct value. In order to make a plot, I'd have to retrieve hundreds or thousands of these.

No don't do that.

I'd do the following:
* Test the evenness of the PDF f(-x) = f(x) for various kappa.

* Test the oddness of the CDF about y = 0.5 for various kappa.

* Test quadratures of the PDF and see that it agrees with the CDF. I think `tanh_sinh` should work well here, though over the entire domain use `trapezoidal`.

* Test moments of the distribution; it looks like they can be evaluated as quadratures and then tested via modified Bessel functions.

So shall I plot the difference between say pdf(mu - x) and pdf(mu + x) and not between the calculated value and some yet to be retrieved exact value?

Contributor

pabristow commented Apr 29, 2020

In the past we have used a modest number of spot values using Mathematica or some other 'known good' source. Plot are pretty, but in the end they may not tell you much beyond the obvious over nearly all the range.

But we have tried to include some spot values that are 'edge and corner' cases too. These may reveal where a better algorithm would be useful (or has been used).

Collaborator

NAThompson commented Apr 29, 2020

So shall I plot the difference between say pdf(mu - x) and pdf(mu + x) and not between the calculated value and some yet to be retrieved exact value?

Don't do it as a plot, do:

#include "math_unit_test.hpp"

template<typename Real>
void test_even()
{
     Real mu = ...;
     Real x = 0;
     Real dx = ...;
     while (mu + x < pi) {
             CHECK_ULP_CLOSE(pdf(mu+x), pdf(mu-x), 4);
              x+= dx;
     }
}

Author

ghost commented May 6, 2020

#include "math_unit_test.hpp"

template<typename Real>
void test_even()
{
     Real mu = ...;
     Real x = 0;
     Real dx = ...;
     while (mu + x < pi) {
             CHECK_ULP_CLOSE(pdf(mu+x), pdf(mu-x), 4);
              x+= dx;
     }
}

Switching from BOOST_CHECK_CLOSE to CHECK_ULP_CLOSE revealed that the symmetry doesn't increase with the precision. Everything's fine for float but for double the ULPs off are about 1e+6.

Collaborator

NAThompson commented May 6, 2020

Everything's fine for float but for double the ULPs off are about 1e+6.

Yeah ULPs tests are pretty unforgiving! Try creating an ulps plot in float precision and you might be able to see where things are going wrong.

Contributor

pabristow commented May 6, 2020

I'm currently playing with Nick's ulps toy (looks really helpful) and I'll put your distribution on my todo-next list.

Author

ghost commented May 13, 2020

Appearantly the problem was the imprecision of the calculation of mu - x and mu + x. After switching to inputs that have a finite binary representation, PDF is now down to 2 ULPs and CDF to 32.


          Reduce the ULP error of PDF and CDF, minor corrections

f3d324c

NAThompson reviewed

View reviewed changes

test/test_von_mises.cpp Show resolved Hide resolved

NAThompson reviewed

View reviewed changes

test/test_von_mises.cpp Show resolved Hide resolved

test/test_von_mises.cpp Show resolved Hide resolved

test/test_von_mises.cpp Outdated Show resolved Hide resolved

test/test_von_mises.cpp Outdated Show resolved Hide resolved

test/test_von_mises.cpp Outdated

		@@ -363,7 +387,7 @@ BOOST_AUTO_TEST_CASE( test_main )
		"to pass.</note>" << std::endl;
		#endif

Collaborator

NAThompson May 13, 2020

Could you add a multiprecision test? Template it on float128 should be sufficient.

Collaborator

NAThompson commented May 13, 2020

Could you drop

an ulps plot of the pdf and cdf into doc/graphs?
a google benchmark into reporting/performance?

Contributor

pabristow commented May 14, 2020

I think I remember that passing zero as in test_symmetry(0.0); is a left-over from the dark days only just in the current millennium when templates didn't work properly on all compilers.

(But some of the testers may still using these old compilers/platforms, so we haven't been through to rationalise it. I suspect the those that are marked as requiring C++11 may not try anyway?

"If it ain't broke, don't fix it" applies? )

Author

ghost commented Jun 21, 2020

Could you drop

* an ulps plot of the pdf and cdf into `doc/graphs`?

* a google benchmark into `reporting/performance`?

I wasn't sure how to use the ULP plot function in Boost.Math to plot a function with two variables.
So I made a color-coded one instead. What do you think? The red areas have an error of 260 ULPs.

Collaborator

NAThompson commented Jun 21, 2020

@pcjmuenster : Looks pretty good. For our own ability to reproduce this later though, could you fix the concentration parameter to (say) 64 and then use the univariate ulps plots in boost/math/tools? Put the file you write in reporting/accuracy/von_mises_ulps.cpp.

pcjmuenster added 3 commits

July 18, 2020 20:05


          [von Mises] add quad precision test

8230a6d


          Merge branch 'develop' into von_mises

b8d8acc


          [von_mises] add ULP plots for CDF and PDF [CI SKIP]

3e4f2eb

Author

ghost commented Aug 23, 2020

The ULP plots for the PDF look okay-ish, but the ones for the CDF clearly indicate that the cancelling out of significant digits at the end due to the formula causes severe imprecisions.

Collaborator

NAThompson commented Aug 23, 2020

The ULP plots for the PDF look okay-ish, but the ones for the CDF clearly indicate that the cancelling out of significant digits at the end due to the formula causes severe imprecisions.

Can you post a 1D ULPs plot in the "bad region" you are finding? The condition number envelope should tell us if those errors are acceptable or not.

mborland mentioned this pull request

Von Mises Distribution #818

Open

mborland closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet