[MLLIB] SPARK-2311: Added additional GLMs (Poisson and Gamma) into MLlib #1237

Closed
wants to merge 6 commits into from

xwei-datageek

SPARK-2311 - Added additional GLMs (Poisson and Gamma) into MLlib
Implemented PoissonRegressionSGD and GammaRegressionSGD.

@pwendell
Contributor

Would you mind creating a JIRA for this and formatting the title correctly? See the green box here - thanks!

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

@BaiGang
Contributor

BaiGang commented Jun 27, 2014

Oops! I didn't notice this one. Created #1243 just now.

We actually implemented the same idea for Poisson regression, with only minor differences in how the gradient of the negative log-likelihood is calculated and in the test suites.

I have commented inline in the code. Please take a look.

val brzWeights = weights.toBreeze
val dotProd = brzWeights.dot(brzData)
val diff = math.exp(dotProd) - label
val loss = -dotProd * label + math.exp(dotProd) + fact(label.toInt)
Contributor


We can safely remove the fact(.) term: it does not depend on the weights, so it contributes nothing to the gradient and has no effect on the resulting weights.
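
To illustrate the point, here is a minimal standalone sketch, not the code from this PR. It assumes the dropped term stands for log(y!), the weight-independent term of the Poisson negative log-likelihood (the `fact` helper itself is not shown in the excerpt). Plain arrays replace the Breeze vectors, and the names `PoissonLossSketch`, `logFactorial`, and `numericGradient` are hypothetical.

```scala
// Hypothetical standalone sketch; plain arrays stand in for the Breeze vectors in the diff.
object PoissonLossSketch {
  // log(y!) as a sum of logs. It depends only on the label, never on the weights.
  def logFactorial(y: Int): Double =
    (1 to y).map(i => math.log(i.toDouble)).sum

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (p, q) => p * q }.sum

  // Per-sample negative log-likelihood, with or without the constant log(y!) term.
  def loss(
      w: Array[Double],
      x: Array[Double],
      label: Double,
      withConstant: Boolean): Double = {
    val dotProd = dot(w, x)
    val base = -dotProd * label + math.exp(dotProd)
    if (withConstant) base + logFactorial(label.toInt) else base
  }

  // Analytic gradient with respect to the weights: (exp(w.x) - y) * x.
  def gradient(w: Array[Double], x: Array[Double], label: Double): Array[Double] = {
    val diff = math.exp(dot(w, x)) - label
    x.map(_ * diff)
  }

  // Central finite-difference gradient of `loss`, used only to compare the two variants.
  def numericGradient(
      w: Array[Double],
      x: Array[Double],
      label: Double,
      withConstant: Boolean,
      eps: Double = 1e-6): Array[Double] =
    w.indices.map { i =>
      val up = loss(w.updated(i, w(i) + eps), x, label, withConstant)
      val down = loss(w.updated(i, w(i) - eps), x, label, withConstant)
      (up - down) / (2 * eps)
    }.toArray
}
```

Under these assumptions, the numeric gradients with and without the constant should both agree with the analytic gradient up to finite-difference error, while the two loss values differ only by log(y!). The optimizer therefore reaches the same weights either way; only the reported loss shifts.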

Author


Right. Removed it.

@xwei-datageek xwei-datageek changed the title feature/glm [MLlib] SPARK-2311: Added additional GLMs (Poisson and Gamma) into MLlib Jun 27, 2014
@xwei-datageek xwei-datageek changed the title [MLlib] SPARK-2311: Added additional GLMs (Poisson and Gamma) into MLlib [MLLIB] SPARK-2311: Added additional GLMs (Poisson and Gamma) into MLlib Jun 27, 2014
xwei-datageek and others added 2 commits June 27, 2014 17:23
…est cases. Added a Poisson regression data generator for generating multi-dimensional test data.
@BaiGang
Contributor

BaiGang commented Jun 30, 2014

I am merging some of the features from #1243 into this PR via xwei-datageek#2. Please take a look.

LBFGS optimizer and new test cases for Poisson and Gamma regression
@xwei-datageek
Author

Could one of the admins verify this patch?

@BaiGang
Contributor

BaiGang commented Jul 2, 2014

One more thing. Per our discussion in the inline comments, let's change SimpleUpdater to SquaredL2Updater.
:-)
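
For readers unfamiliar with the two updaters, the sketch below shows the difference as I understand it: both use a step size that decays as 1/sqrt(iter), and the squared-L2 variant additionally shrinks the weights and reports a regularization value. This is a standalone illustration under that assumption, not the MLlib classes themselves; `UpdaterSketch`, `simpleUpdate`, and `squaredL2Update` are hypothetical names.

```scala
// Hypothetical standalone sketch of the two update rules; not the MLlib Updater classes.
object UpdaterSketch {
  // Plain gradient step without regularization.
  // Returns (updated weights, regularization value), the latter always 0 here.
  def simpleUpdate(
      w: Array[Double],
      grad: Array[Double],
      stepSize: Double,
      iter: Int): (Array[Double], Double) = {
    val step = stepSize / math.sqrt(iter.toDouble)
    val updated = w.zip(grad).map { case (wi, gi) => wi - step * gi }
    (updated, 0.0)
  }

  // Gradient step with L2 regularization: the weights are shrunk by
  // (1 - step * regParam) before the gradient step is applied.
  // Returns (updated weights, 0.5 * regParam * ||w||^2 at the updated weights).
  def squaredL2Update(
      w: Array[Double],
      grad: Array[Double],
      stepSize: Double,
      iter: Int,
      regParam: Double): (Array[Double], Double) = {
    val step = stepSize / math.sqrt(iter.toDouble)
    val updated = w.zip(grad).map { case (wi, gi) =>
      wi * (1.0 - step * regParam) - step * gi
    }
    val squaredNorm = updated.map(v => v * v).sum
    (updated, 0.5 * regParam * squaredNorm)
  }
}
```

With `regParam = 0.0` the two functions produce the same weights, so in this sketch the switch only changes behavior once a non-zero regularization parameter is supplied.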

@BaiGang
Contributor

BaiGang commented Jul 8, 2014

@mengxr Please review this.

@mengxr
Contributor

mengxr commented Jul 8, 2014

@xwei-datageek @BaiGang The current naming scheme Problem+Algorithm doesn't scale. I'm working on some standardized interfaces so that we can decouple them. Do you mind me doing the review after that is done? Thanks!

@BaiGang
Contributor

BaiGang commented Jul 9, 2014

@mengxr Sure, no problem. It would be great to have standard, decoupled interfaces. BTW, is there a JIRA or pull request for tracking these changes?

@xwei-datageek
Author

@mengxr I was just wondering when (approximately) the standardized interfaces for decoupling Problem+Algorithm will be finished.

@mengxr
Contributor

mengxr commented Aug 1, 2014

Sorry, I'm still working on it and will post the design doc to JIRA soon. Unfortunately, it may not make it into the v1.1 release.

@SparkQA

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@srowen
Member

srowen commented Mar 5, 2015

I imagine this is too far out of date, and perhaps obsolete given the new ML API coming. Mind closing this PR?

@BaiGang
Contributor

BaiGang commented Mar 6, 2015

@srowen This work was originally written for version 1.0.x and is quite outdated.

@xwei-datageek Xiaokai, I think it's ok to close this PR.

As for modeling count data with regression models, I think SparkR with its glm package would be a good solution, though I have not looked into it deeply.

6 participants