Skip to content
This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

Bug in GammaDistribution.ProbabilityDensityFunction #304

Closed
Burkhardus opened this issue Sep 25, 2016 · 5 comments
Closed

Bug in GammaDistribution.ProbabilityDensityFunction #304

Burkhardus opened this issue Sep 25, 2016 · 5 comments

Comments

@Burkhardus
Copy link

Burkhardus commented Sep 25, 2016

On the given dataset2, there is no density function calculated. Please have a look at my test program.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using Accord;
using Accord.Statistics;
using Accord.Statistics.Distributions.Univariate; 
using Accord.Statistics.Visualizations;
using Accord.Statistics.Analysis;
using ZedGraph;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {

            Double[] Dataset1 = { 8.141, 8.516, 7.049, 7.555, 7.999, 9.638, 7.445, 8.322, 8.184, 9.138, 8.489, 7.138, 7.855, 8.354, 10.648, 9.036, 9.371, 7.243, 6.967, 7.570, 8.636, 9.734, 8.713, 8.898, 7.969, 7.223, 7.162, 9.536, 8.919, 8.304, 7.746, 8.911, 7.857, 9.024, 7.383, 8.928, 7.410, 9.033, 7.912, 7.751, 7.359, 7.920, 7.294, 7.583, 8.122, 7.586, 7.011, 8.460, 9.126, 7.247, 9.816, 8.411, 8.833, 9.028, 8.271, 7.698, 7.399, 9.823, 8.202, 8.413, 7.384, 7.721, 7.384, 7.660, 7.698, 8.396, 8.038, 7.295, 8.202, 9.214, 10.263, 7.990, 7.502, 8.502, 8.022, 7.824, 9.193, 9.490, 9.279, 8.961, 8.213, 7.448, 8.385, 9.276, 8.178, 7.102, 7.206, 7.594, 7.718, 7.770, 7.997, 7.866, 8.157, 9.319, 7.449, 8.559, 7.617, 8.074, 7.922, 9.178, 8.738, 7.679, 7.983, 8.307, 8.022, 9.407, 7.667, 8.844, 8.134, 8.383, 8.395, 8.004, 9.252, 8.130, 7.934, 8.140, 7.924, 8.893, 6.850, 8.238, 9.762, 7.544, 8.845, 7.595, 8.453, 7.892, 8.007, 8.352, 8.537, 9.402 };
            Double[] Dataset2 = { 0.999, 0.996, 1.066, 1.041, 1.052, 1.043, 1.052, 1.046, 1.032, 1.038, 1.052, 1.038, 1.040, 1.056, 1.043, 1.044, 1.058, 1.044, 1.053, 1.046, 1.045, 1.046, 1.048, 1.051, 1.056, 1.058, 1.065, 1.051, 1.049, 1.049, 1.043, 1.042, 1.049, 1.042, 1.033, 1.042, 1.028, 1.031, 1.036, 1.044, 1.037, 1.054, 1.046, 1.061, 1.048, 1.048, 1.028, 1.030, 1.049, 1.038, 1.042, 1.057, 1.028, 1.047, 1.044, 1.040, 1.038, 1.044, 1.046, 1.035, 1.040, 1.039, 1.046, 1.050, 1.033, 1.032, 1.040, 1.040, 1.032, 1.045, 1.054, 1.040, 1.045, 1.045, 1.043, 1.066, 1.057, 1.052, 1.060, 1.069, 1.046, 1.041, 1.048, 1.057, 1.059, 1.076, 1.051, 1.057, 1.067, 1.062, 1.025, 1.072, 1.061, 1.067, 1.062, 1.055, 1.065, 1.047, 1.046, 1.047, 1.045, 1.038, 1.056, 1.037, 1.042, 1.044, 1.047, 1.037, 1.047, 1.041, 1.042, 1.042, 1.042, 1.040, 1.060, 1.066, 1.067, 1.077, 1.059, 1.067 };
            Double[] Dataset= Dataset2;

            // with Accord.NET the results of the gamma fit are
            // Dataset1: k = 119.306665626699, θ = 0.0690958879514208  (rate=1/θ = 14.4726412764688)
            // Dataset2: k = 6952.29276909337, θ = 0.00015060019786871 (rate=1/θ = 6640.0975174798) 

            // with the same data a gamma distribution fit was performed online via "http://www.wessa.net/rwasp_fitdistrgamma.wasp"
            // Dataset1: shape= 115.405346712493, rate = 13.9994954351296 (θ= 1/shape= 0.07143114583191708605319342679921)
            // Dataset2: shape= 6930.15592532156 , rate = 6618.95473828367 (θ= 1/shape= 0.00015108125671506031378872360567992)

            //Conclusion
            //The fittet parameter are very similar 
            // Dataset 1: k, θ = 3% difference
            // Dataset 2: k, θ = 0.3% difference

        var da = new DistributionAnalysis();
            da.Learn(Dataset);
        var result = da.GoodnessOfFit[0].Distribution.ToString();

        var hist = new Histogram();  
            hist.Compute(Dataset);

        var DistGamma1 = new GammaDistribution();
            DistGamma1.Fit(Dataset);

        var ppl = new PointPairList();

            double x = hist.Range.Min;
            double stepx = (hist.Range.Max - hist.Range.Min) / 100;
            double  y;

            for (int i = 0; i < 100; i++)
            {
                y = DistGamma1.ProbabilityDensityFunction(x);
                x = x + stepx;
                ppl.Add(x, y);
            }

        }
    }
}

External fit test to show that the parameter fitted by Accord are right for both data sets.

dataset1u2_externalfit

The Probability Density Function (PDF) produces valid results on Dataset 1.

gammadistributionfitandpdfdataset1

The PDF produces invalid results on Dataset 2.

gammadistributionfitandpdfdataset2

The PDF from inside with invalid result on the given k and theta parameter.

density function wrong

Here is why the calculation is not working:

density function wrong1

But on external website it is possible to calculate a density function with the same k and theta parameter:

dataset2_density online works

@Burkhardus Burkhardus changed the title Bug in Accord.Distributions.Univariate.GammaDistribution.ProbabilityDensityFunction Bug in Distributions.Univariate.GammaDistribution.ProbabilityDensityFunction Sep 25, 2016
@Burkhardus Burkhardus changed the title Bug in Distributions.Univariate.GammaDistribution.ProbabilityDensityFunction Bug in GammaDistribution.ProbabilityDensityFunction Sep 25, 2016
@cesarsouza
Copy link
Member

Hi Burkhardus,

Thank you a lot for the detailed report!

The problem is happening because the Gamma distribution is too close to being generate. In the init method of the GammaDistribution, the distribution normalization constant was being computed for the non-logarithm case using direct multiplication of really large numbers, which was bringing Infinity into the calculations and leading to a NaN at the end. The same problem will happen in the PDF function.

A possible workaround for this problem is to use LogProbabilityDensityFunction instead of ProbabilityDensityFunction, then apply Math.Exp() to the generated values. I haven't tested this yet, but it should work. In any case, I will commit a fix in the next few minutes.

Regards,
Cesar

@Burkhardus
Copy link
Author

Burkhardus commented Sep 25, 2016

Hello Cesar,

have you tested if the Probability Density Function is delivering exactly the right results, on the values of my dataset2?

I am expecting to get the same results like shown in the picture of my report using the online tool of keisan.casio.com which creates the PDF with exactly the same k and theta parameters, that your fitting function found and which are verified by another external tool.

See here:

dataset2_density online works

I am asking, because I could not finish the test successfully. In my test, the y values are all the way zero.

pdf all zero 1

I dont know how the guys at the web site of keisan.casio.com are doing their calculations (maybe at higher accuracy and bigger data types, so they can deal with very small and very large values), could you please do the same in your library?

@cesarsouza
Copy link
Member

cesarsouza commented Sep 26, 2016

Hi Burkhardus,

Thanks for the remarks! This is the reason the framework offers the LogProbabilityDensity(x) and similar functions. While exp(-600) cannot be represented using System.Double, its logarithm can. In this case, the solution is to keep the data as logarithms until the very end when you need to plot it. When you finally have to plot, first convert the samples to the log-space of your plot coordinates (which will put their exponential version into the normal range of System.Double) and then finally convert them with Math.Exp only after they have been normalized.

This is the traditional approach used in many libraries, including R and Python (that's the reason they include methods for computing log-probabilities in methods such as R's pgamma and Scipy's logpdf.

I will work out an example on how this can be done, but unfortunately I can't do it right now. But if you can, please try to keep the values as logarithms until the time you actually need to use them. If you have to compare the values against others, first see if it is possible to transform the others to the log-space instead of the ones you have into normal space.

Regards,
Cesar

@cesarsouza
Copy link
Member

cesarsouza commented Sep 26, 2016

Hi Burkhardus,

So, checking for the example you posted, it is not even necessary to go much further in the log route. Please consider the following example. It will give you the needle at 1 that you mentioned, reproducing the values at the Casio website:

double[] dataset2 = { 0.999, 0.996, 1.066, 1.041, 1.052, 1.043, 1.052, 1.046, 1.032, 1.038, 1.052, 1.038, 1.040, 1.056, 1.043, 1.044, 1.058, 1.044, 1.053, 1.046, 1.045, 1.046, 1.048, 1.051, 1.056, 1.058, 1.065, 1.051, 1.049, 1.049, 1.043, 1.042, 1.049, 1.042, 1.033, 1.042, 1.028, 1.031, 1.036, 1.044, 1.037, 1.054, 1.046, 1.061, 1.048, 1.048, 1.028, 1.030, 1.049, 1.038, 1.042, 1.057, 1.028, 1.047, 1.044, 1.040, 1.038, 1.044, 1.046, 1.035, 1.040, 1.039, 1.046, 1.050, 1.033, 1.032, 1.040, 1.040, 1.032, 1.045, 1.054, 1.040, 1.045, 1.045, 1.043, 1.066, 1.057, 1.052, 1.060, 1.069, 1.046, 1.041, 1.048, 1.057, 1.059, 1.076, 1.051, 1.057, 1.067, 1.062, 1.025, 1.072, 1.061, 1.067, 1.062, 1.055, 1.065, 1.047, 1.046, 1.047, 1.045, 1.038, 1.056, 1.037, 1.042, 1.044, 1.047, 1.037, 1.047, 1.041, 1.042, 1.042, 1.042, 1.040, 1.060, 1.066, 1.067, 1.077, 1.059, 1.067 };

var b = GammaDistribution.Estimate(dataset2);

double[] x = Vector.Range(0.0, 10.0, 0.05);
double[] pdf = x.Apply(b.LogProbabilityDensityFunction).Exp();

// result will be
new double[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.60168777623766E-293, 5.51885646801681E-214, 6.87229105720087E-150, 3.05548162854266E-99, 2.07661256908294E-60, 4.84694366550593E-32, 5.26688946739727E-13, 0.0241774294264879, 30.7996636820675, 0.0054993586504859, 5.60031174239992E-13, 1.10634606299171E-28, 1.24123786224247E-49, 2.04031751961623E-75, 1.13865913049715E-105, 4.56124541509388E-140, 2.56251968244235E-178, 3.68510891441049E-220, 2.33359893081555E-265, 1.06302359249589E-313, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };;

Regards,
Cesar

@Burkhardus
Copy link
Author

Burkhardus commented Sep 26, 2016

Hi Cesar, yes after reading your pre-previous comment, I found it yesterday night, that I only need to use the log probability function and apply the exp function to its results. I wonder why it's so easy!

Why not hiding this all from the user and always do this internally inside the framework?

Thank you anyway for your efforts, I was not clear about this log probability function approach also in other languages like the mentioned "R" which is common in the field of statistics.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants