[MLLIB] SPARK-2311: Added additional GLMs (Poisson and Gamma) into MLlib #1237

xwei-datageek · 2014-06-26T22:11:37Z

SPARK-2311 - Added additional GLMs (Poisson and Gamma) into MLlib
Implemented PoissonRegressionSGD and GammaRegressionSGD.

pwendell · 2014-06-27T04:45:21Z

Would you mind creating a JIRA for this and formatting the title correctly? See the green box here - thanks!

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

BaiGang · 2014-06-27T08:02:08Z

Oops! I didn't notice this one. Created #1243 just now.

We actually implemented exactly the same idea for Poisson regression; the only small differences are in how the gradient of the negative log-likelihood is calculated and in the test suites.

Commented inline in the code. Please check it.

BaiGang · 2014-06-27T08:04:02Z

mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala

+    val brzWeights = weights.toBreeze
+    val dotProd = brzWeights.dot(brzData)
+    val diff = math.exp(dotProd) - label
+    val loss = -dotProd * label + math.exp(dotProd) + fact(label.toInt)


We can safely remove the fact(.) part: it does not depend on the weights, so it has no effect on the gradient or on the resulting weights.

xwei-datageek · 2014-06-28

Right. Removed it.
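The point made in this review thread can be checked numerically. The following is a minimal sketch, not the code from the PR: it uses plain arrays instead of MLlib/Breeze vectors, and it assumes the standard Poisson negative log-likelihood with a log link, where the weight-independent term is log(y!). The object and method names (`PoissonGradientSketch`, `logFactorial`, `numericGradient`) are hypothetical.

```scala
// Illustrative sketch only: plain arrays, hypothetical names, not the PR's Gradient.scala.
object PoissonGradientSketch {
  // log(y!) as a sum of logs; it depends on the label only, never on the weights.
  def logFactorial(y: Int): Double = (1 to y).map(k => math.log(k.toDouble)).sum

  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.zip(b).map { case (u, v) => u * v }.sum
  }

  // Per-example negative log-likelihood, with or without the constant log(y!) term.
  def loss(w: Array[Double], x: Array[Double], y: Double, withConstant: Boolean): Double = {
    val eta = dot(w, x)
    val base = math.exp(eta) - y * eta
    if (withConstant) base + logFactorial(y.toInt) else base
  }

  // Analytic gradient (exp(w.x) - y) * x; the constant term contributes nothing.
  def gradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val diff = math.exp(dot(w, x)) - y
    x.map(_ * diff)
  }

  // Central finite-difference gradient of `loss`, used only to check the claim.
  def numericGradient(w: Array[Double], x: Array[Double], y: Double,
                      withConstant: Boolean, eps: Double = 1e-6): Array[Double] =
    w.indices.map { i =>
      val plus = w.updated(i, w(i) + eps)
      val minus = w.updated(i, w(i) - eps)
      (loss(plus, x, y, withConstant) - loss(minus, x, y, withConstant)) / (2 * eps)
    }.toArray
}
```

Calling `numericGradient` with `withConstant = true` and with `withConstant = false` should agree with `gradient` up to finite-difference error, which is why dropping the term changes the reported loss value but not the fitted weights.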


BaiGang · 2014-06-30T10:40:47Z

Merging some of the features from #1243 into this PR via xwei-datageek#2. Please take a look.


xwei-datageek · 2014-07-02T06:43:57Z

Could one of the admins verify this patch?

BaiGang · 2014-07-02T08:26:48Z

One more thing. Per our discussion in the line note, let's change SimpleUpdater to SquaredL2Updater.
:-)
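For readers unfamiliar with the two updaters, here is a rough sketch of the difference, written with plain arrays rather than MLlib's `Updater` classes. It assumes the usual forms: a plain gradient step with a step size that decays as 1/sqrt(iter), and the same step with an extra shrinkage factor coming from a (regParam / 2) * ||w||^2 penalty. The reason for the switch is not recorded in this thread, and the names `UpdaterSketch`, `simpleStep`, and `squaredL2Step` are hypothetical.

```scala
// Illustrative sketch only: hypothetical names, not MLlib's SimpleUpdater/SquaredL2Updater.
object UpdaterSketch {
  // Plain gradient step with a 1/sqrt(iter) decaying step size; no regularization.
  def simpleStep(w: Array[Double], grad: Array[Double],
                 stepSize: Double, iter: Int): Array[Double] = {
    val eta = stepSize / math.sqrt(iter.toDouble)
    w.zip(grad).map { case (wi, gi) => wi - eta * gi }
  }

  // Same step plus the gradient of (regParam / 2) * ||w||^2, which shrinks the weights.
  def squaredL2Step(w: Array[Double], grad: Array[Double],
                    stepSize: Double, iter: Int, regParam: Double): Array[Double] = {
    val eta = stepSize / math.sqrt(iter.toDouble)
    w.zip(grad).map { case (wi, gi) => (1.0 - eta * regParam) * wi - eta * gi }
  }
}
```

With `regParam = 0.0` the two steps coincide; with a positive `regParam` every weight is pulled toward zero at each iteration, even when the loss gradient is zero.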

BaiGang · 2014-07-08T02:36:36Z

@mengxr Please review this.

mengxr · 2014-07-08T20:39:01Z

@xwei-datageek @BaiGang The current naming scheme Problem+Algorithm doesn't scale. I'm working on some standardized interfaces so that we can decouple them. Do you mind me doing the review after that is done? Thanks!

BaiGang · 2014-07-09T05:53:12Z

@mengxr Sure, no problem. It will be great to have standard, decoupled interfaces. BTW, is there a JIRA or pull request tracking these changes?

xwei-datageek · 2014-07-31T22:38:24Z

@mengxr I was just wondering (approximately) when the standardized interfaces that decouple Problem+Algorithm will be finished?

mengxr · 2014-08-01T04:46:50Z

Sorry, I'm still working on it and will post the design doc to JIRA soon. Unfortunately, it may not make the v1.1 release.

SparkQA · 2014-09-05T23:45:58Z

Can one of the admins verify this patch?

srowen · 2015-03-05T18:17:38Z

I imagine this is too far out of date, and perhaps obsolete given the new ML API coming. Mind closing this PR?

BaiGang · 2015-03-06T02:22:23Z

@srowen This work was originally written for version 1.0.x and is quite outdated.

@xwei-datageek Xiaokai, I think it's ok to close this PR.

As for modeling count data with regression models, I think SparkR with the glm package would be a good solution, though I have not looked into it in depth.

xwei-datageek added 2 commits June 26, 2014 14:03

added Poisson and Gamma Regression

1081334

minor change

989e38e

BaiGang reviewed Jun 27, 2014

xwei-datageek changed the title ~~feature/glm~~ [MLlib] SPARK-2311: Added additional GLMs (Poisson and Gamma) into MLlib Jun 27, 2014

xwei-datageek changed the title ~~[MLlib] SPARK-2311: Added additional GLMs (Poisson and Gamma) into MLlib~~ [MLLIB] SPARK-2311: Added additional GLMs (Poisson and Gamma) into MLlib Jun 27, 2014

xwei-datageek and others added 2 commits June 27, 2014 17:23

removed factorial calculation in PoissonGradient

d26e774

Added LBFGS optimizer for Poisson and Gamma regression model. Added test cases. Added a Poisson regression data generator for generating multi-dimensional test data.

eb5758a

Merge pull request #2 from BaiGang/xk_glm

49cffaa

LBFGS optimier and new test cases for Poisson and Gamma regression

change SimpleUpdater to SquaredL2Updater

fa2b06a

BaiGang mentioned this pull request Jul 8, 2014

[MLLIB] SPARK-2303: Poisson regression model for count data #1243

Closed

asfgit closed this in 0cc8fcb Apr 12, 2015

wangyum pushed a commit that referenced this pull request May 26, 2023

[CARMEL-6525][MINOR] Support tag different drivers in the queue (#1237)

d8b4399
