Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-6080] [PySpark] correct LogisticRegressionWithLBFGS regType parameter for pyspark #4831

Closed
wants to merge 1 commit into from

Conversation

yanboliang
Copy link
Contributor

Currently LogisticRegressionWithLBFGS in python/pyspark/mllib/classification.py will invoke callMLlibFunc with a wrong "regType" parameter.
It was assigned to "str(regType)" which translate None(Python) to "None"(Java/Scala). The right way should be translate None(Python) to null(Java/Scala) just as what we did at LogisticRegressionWithSGD.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@yanboliang yanboliang changed the title correct LogisticRegressionWithLBFGS regType parameter for pyspark [SPARK-6080] [PySpark] correct LogisticRegressionWithLBFGS regType parameter for pyspark Feb 28, 2015
@mengxr
Copy link
Contributor

mengxr commented Mar 1, 2015

add to whitelist

@mengxr
Copy link
Contributor

mengxr commented Mar 1, 2015

ok to test

@SparkQA
Copy link

SparkQA commented Mar 1, 2015

Test build #28150 has started for PR 4831 at commit 12db65a.

  • This patch merges cleanly.

@@ -207,7 +207,7 @@ def train(cls, data, iterations=100, initialWeights=None, regParam=0.01, regType
"""
def train(rdd, i):
return callMLlibFunc("trainLogisticRegressionModelWithLBFGS", rdd, int(iterations), i,
float(regParam), str(regType), bool(intercept), int(corrections),
float(regParam), regType, bool(intercept), int(corrections),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use str(regType) if regType else None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think directly use regType is enough, because py4j can translate "l1","l2",None in Python to "l1","l2",null in Java/Scala smoothly.
I found the peer functions LogisticRegressionWithSGD and SVMWithSGD also directly use regType and it can work well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is the error message. If we use str(...), the Java function call would be successful and inside the Java function, we say the input regType is not recognized.

If we don't use str(..), if users input something like 1, 2. Py4j will throw an error saying there is no Java function matching the input signature, which usually confuses users.

Anyway, this is a minor issue for regType.

@SparkQA
Copy link

SparkQA commented Mar 1, 2015

Test build #28150 has finished for PR 4831 at commit 12db65a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28150/
Test PASSed.

@mengxr
Copy link
Contributor

mengxr commented Mar 2, 2015

LGTM. Merged into master and branch-1.3. Thanks!

@asfgit asfgit closed this in af2effd Mar 2, 2015
asfgit pushed a commit that referenced this pull request Mar 2, 2015
…rameter for pyspark

Currently LogisticRegressionWithLBFGS in python/pyspark/mllib/classification.py will invoke callMLlibFunc with a wrong "regType" parameter.
It was assigned to "str(regType)" which translate None(Python) to "None"(Java/Scala). The right way should be translate None(Python) to null(Java/Scala) just as what we did at LogisticRegressionWithSGD.

Author: Yanbo Liang <[email protected]>

Closes #4831 from yanboliang/pyspark_classification and squashes the following commits:

12db65a [Yanbo Liang] correct LogisticRegressionWithLBFGS regType parameter for pyspark

(cherry picked from commit af2effd)
Signed-off-by: Xiangrui Meng <[email protected]>
@yanboliang yanboliang deleted the pyspark_classification branch April 24, 2015 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants