Bug in GammaDistribution.ProbabilityDensityFunction #304

Burkhardus · 2016-09-25T06:18:26Z

On the given dataset2, there is no density function calculated. Please have a look at my test program.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using Accord;
using Accord.Statistics;
using Accord.Statistics.Distributions.Univariate; 
using Accord.Statistics.Visualizations;
using Accord.Statistics.Analysis;
using ZedGraph;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {

            Double[] Dataset1 = { 8.141, 8.516, 7.049, 7.555, 7.999, 9.638, 7.445, 8.322, 8.184, 9.138, 8.489, 7.138, 7.855, 8.354, 10.648, 9.036, 9.371, 7.243, 6.967, 7.570, 8.636, 9.734, 8.713, 8.898, 7.969, 7.223, 7.162, 9.536, 8.919, 8.304, 7.746, 8.911, 7.857, 9.024, 7.383, 8.928, 7.410, 9.033, 7.912, 7.751, 7.359, 7.920, 7.294, 7.583, 8.122, 7.586, 7.011, 8.460, 9.126, 7.247, 9.816, 8.411, 8.833, 9.028, 8.271, 7.698, 7.399, 9.823, 8.202, 8.413, 7.384, 7.721, 7.384, 7.660, 7.698, 8.396, 8.038, 7.295, 8.202, 9.214, 10.263, 7.990, 7.502, 8.502, 8.022, 7.824, 9.193, 9.490, 9.279, 8.961, 8.213, 7.448, 8.385, 9.276, 8.178, 7.102, 7.206, 7.594, 7.718, 7.770, 7.997, 7.866, 8.157, 9.319, 7.449, 8.559, 7.617, 8.074, 7.922, 9.178, 8.738, 7.679, 7.983, 8.307, 8.022, 9.407, 7.667, 8.844, 8.134, 8.383, 8.395, 8.004, 9.252, 8.130, 7.934, 8.140, 7.924, 8.893, 6.850, 8.238, 9.762, 7.544, 8.845, 7.595, 8.453, 7.892, 8.007, 8.352, 8.537, 9.402 };
            Double[] Dataset2 = { 0.999, 0.996, 1.066, 1.041, 1.052, 1.043, 1.052, 1.046, 1.032, 1.038, 1.052, 1.038, 1.040, 1.056, 1.043, 1.044, 1.058, 1.044, 1.053, 1.046, 1.045, 1.046, 1.048, 1.051, 1.056, 1.058, 1.065, 1.051, 1.049, 1.049, 1.043, 1.042, 1.049, 1.042, 1.033, 1.042, 1.028, 1.031, 1.036, 1.044, 1.037, 1.054, 1.046, 1.061, 1.048, 1.048, 1.028, 1.030, 1.049, 1.038, 1.042, 1.057, 1.028, 1.047, 1.044, 1.040, 1.038, 1.044, 1.046, 1.035, 1.040, 1.039, 1.046, 1.050, 1.033, 1.032, 1.040, 1.040, 1.032, 1.045, 1.054, 1.040, 1.045, 1.045, 1.043, 1.066, 1.057, 1.052, 1.060, 1.069, 1.046, 1.041, 1.048, 1.057, 1.059, 1.076, 1.051, 1.057, 1.067, 1.062, 1.025, 1.072, 1.061, 1.067, 1.062, 1.055, 1.065, 1.047, 1.046, 1.047, 1.045, 1.038, 1.056, 1.037, 1.042, 1.044, 1.047, 1.037, 1.047, 1.041, 1.042, 1.042, 1.042, 1.040, 1.060, 1.066, 1.067, 1.077, 1.059, 1.067 };
            Double[] Dataset= Dataset2;

            // with Accord.NET the results of the gamma fit are
            // Dataset1: k = 119.306665626699, θ = 0.0690958879514208  (rate=1/θ = 14.4726412764688)
            // Dataset2: k = 6952.29276909337, θ = 0.00015060019786871 (rate=1/θ = 6640.0975174798) 

            // with the same data a gamma distribution fit was performed online via "http://www.wessa.net/rwasp_fitdistrgamma.wasp"
            // Dataset1: shape= 115.405346712493, rate = 13.9994954351296 (θ= 1/shape= 0.07143114583191708605319342679921)
            // Dataset2: shape= 6930.15592532156 , rate = 6618.95473828367 (θ= 1/shape= 0.00015108125671506031378872360567992)

            //Conclusion
            //The fittet parameter are very similar 
            // Dataset 1: k, θ = 3% difference
            // Dataset 2: k, θ = 0.3% difference

        var da = new DistributionAnalysis();
            da.Learn(Dataset);
        var result = da.GoodnessOfFit[0].Distribution.ToString();

        var hist = new Histogram();  
            hist.Compute(Dataset);

        var DistGamma1 = new GammaDistribution();
            DistGamma1.Fit(Dataset);

        var ppl = new PointPairList();

            double x = hist.Range.Min;
            double stepx = (hist.Range.Max - hist.Range.Min) / 100;
            double  y;

            for (int i = 0; i < 100; i++)
            {
                y = DistGamma1.ProbabilityDensityFunction(x);
                x = x + stepx;
                ppl.Add(x, y);
            }

        }
    }
}

External fit test to show that the parameter fitted by Accord are right for both data sets.

The Probability Density Function (PDF) produces valid results on Dataset 1.

The PDF produces invalid results on Dataset 2.

The PDF from inside with invalid result on the given k and theta parameter.

Here is why the calculation is not working:

But on external website it is possible to calculate a density function with the same k and theta parameter:

cesarsouza · 2016-09-25T07:56:31Z

Hi Burkhardus,

Thank you a lot for the detailed report!

The problem is happening because the Gamma distribution is too close to being generate. In the init method of the GammaDistribution, the distribution normalization constant was being computed for the non-logarithm case using direct multiplication of really large numbers, which was bringing Infinity into the calculations and leading to a NaN at the end. The same problem will happen in the PDF function.

A possible workaround for this problem is to use LogProbabilityDensityFunction instead of ProbabilityDensityFunction, then apply Math.Exp() to the generated values. I haven't tested this yet, but it should work. In any case, I will commit a fix in the next few minutes.

Regards,
Cesar

Burkhardus · 2016-09-25T14:03:36Z

Hello Cesar,

have you tested if the Probability Density Function is delivering exactly the right results, on the values of my dataset2?

I am expecting to get the same results like shown in the picture of my report using the online tool of keisan.casio.com which creates the PDF with exactly the same k and theta parameters, that your fitting function found and which are verified by another external tool.

See here:

I am asking, because I could not finish the test successfully. In my test, the y values are all the way zero.

I dont know how the guys at the web site of keisan.casio.com are doing their calculations (maybe at higher accuracy and bigger data types, so they can deal with very small and very large values), could you please do the same in your library?

cesarsouza · 2016-09-26T13:11:33Z

Hi Burkhardus,

Thanks for the remarks! This is the reason the framework offers the LogProbabilityDensity(x) and similar functions. While exp(-600) cannot be represented using System.Double, its logarithm can. In this case, the solution is to keep the data as logarithms until the very end when you need to plot it. When you finally have to plot, first convert the samples to the log-space of your plot coordinates (which will put their exponential version into the normal range of System.Double) and then finally convert them with Math.Exp only after they have been normalized.

This is the traditional approach used in many libraries, including R and Python (that's the reason they include methods for computing log-probabilities in methods such as R's pgamma and Scipy's logpdf.

I will work out an example on how this can be done, but unfortunately I can't do it right now. But if you can, please try to keep the values as logarithms until the time you actually need to use them. If you have to compare the values against others, first see if it is possible to transform the others to the log-space instead of the ones you have into normal space.

Regards,
Cesar

cesarsouza · 2016-09-26T21:00:53Z

Hi Burkhardus,

So, checking for the example you posted, it is not even necessary to go much further in the log route. Please consider the following example. It will give you the needle at 1 that you mentioned, reproducing the values at the Casio website:

double[] dataset2 = { 0.999, 0.996, 1.066, 1.041, 1.052, 1.043, 1.052, 1.046, 1.032, 1.038, 1.052, 1.038, 1.040, 1.056, 1.043, 1.044, 1.058, 1.044, 1.053, 1.046, 1.045, 1.046, 1.048, 1.051, 1.056, 1.058, 1.065, 1.051, 1.049, 1.049, 1.043, 1.042, 1.049, 1.042, 1.033, 1.042, 1.028, 1.031, 1.036, 1.044, 1.037, 1.054, 1.046, 1.061, 1.048, 1.048, 1.028, 1.030, 1.049, 1.038, 1.042, 1.057, 1.028, 1.047, 1.044, 1.040, 1.038, 1.044, 1.046, 1.035, 1.040, 1.039, 1.046, 1.050, 1.033, 1.032, 1.040, 1.040, 1.032, 1.045, 1.054, 1.040, 1.045, 1.045, 1.043, 1.066, 1.057, 1.052, 1.060, 1.069, 1.046, 1.041, 1.048, 1.057, 1.059, 1.076, 1.051, 1.057, 1.067, 1.062, 1.025, 1.072, 1.061, 1.067, 1.062, 1.055, 1.065, 1.047, 1.046, 1.047, 1.045, 1.038, 1.056, 1.037, 1.042, 1.044, 1.047, 1.037, 1.047, 1.041, 1.042, 1.042, 1.042, 1.040, 1.060, 1.066, 1.067, 1.077, 1.059, 1.067 };

var b = GammaDistribution.Estimate(dataset2);

double[] x = Vector.Range(0.0, 10.0, 0.05);
double[] pdf = x.Apply(b.LogProbabilityDensityFunction).Exp();

// result will be
new double[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.60168777623766E-293, 5.51885646801681E-214, 6.87229105720087E-150, 3.05548162854266E-99, 2.07661256908294E-60, 4.84694366550593E-32, 5.26688946739727E-13, 0.0241774294264879, 30.7996636820675, 0.0054993586504859, 5.60031174239992E-13, 1.10634606299171E-28, 1.24123786224247E-49, 2.04031751961623E-75, 1.13865913049715E-105, 4.56124541509388E-140, 2.56251968244235E-178, 3.68510891441049E-220, 2.33359893081555E-265, 1.06302359249589E-313, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };;

Regards,
Cesar

Burkhardus · 2016-09-26T22:10:17Z

Hi Cesar, yes after reading your pre-previous comment, I found it yesterday night, that I only need to use the log probability function and apply the exp function to its results. I wonder why it's so easy!

Why not hiding this all from the user and always do this internally inside the framework?

Thank you anyway for your efforts, I was not clear about this log probability function approach also in other languages like the mentioned "R" which is common in the field of statistics.

Burkhardus changed the title ~~Bug in Accord.Distributions.Univariate.GammaDistribution.ProbabilityDensityFunction~~ Bug in Distributions.Univariate.GammaDistribution.ProbabilityDensityFunction Sep 25, 2016

Burkhardus changed the title ~~Bug in Distributions.Univariate.GammaDistribution.ProbabilityDensityFunction~~ Bug in GammaDistribution.ProbabilityDensityFunction Sep 25, 2016

cesarsouza added a commit that referenced this issue Sep 25, 2016

GH-304: Bug in GammaDistribution.ProbabilityDensityFunction

9289574

cesarsouza added the pending-release label Sep 25, 2016

Burkhardus closed this as completed Sep 28, 2016

cesarsouza removed the pending-release label Jul 9, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug in GammaDistribution.ProbabilityDensityFunction #304

Bug in GammaDistribution.ProbabilityDensityFunction #304

Burkhardus commented Sep 25, 2016 •

edited

Loading

cesarsouza commented Sep 25, 2016

Burkhardus commented Sep 25, 2016 •

edited

Loading

cesarsouza commented Sep 26, 2016 •

edited

Loading

cesarsouza commented Sep 26, 2016 •

edited

Loading

Burkhardus commented Sep 26, 2016 •

edited

Loading

Bug in GammaDistribution.ProbabilityDensityFunction #304

Bug in GammaDistribution.ProbabilityDensityFunction #304

Comments

Burkhardus commented Sep 25, 2016 • edited Loading

cesarsouza commented Sep 25, 2016

Burkhardus commented Sep 25, 2016 • edited Loading

cesarsouza commented Sep 26, 2016 • edited Loading

cesarsouza commented Sep 26, 2016 • edited Loading

Burkhardus commented Sep 26, 2016 • edited Loading

Burkhardus commented Sep 25, 2016 •

edited

Loading

Burkhardus commented Sep 25, 2016 •

edited

Loading

cesarsouza commented Sep 26, 2016 •

edited

Loading

cesarsouza commented Sep 26, 2016 •

edited

Loading

Burkhardus commented Sep 26, 2016 •

edited

Loading