Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28985][PYTHON][ML][FOLLOW-UP] Add _IsotonicRegressionBase #26023

Conversation

zero323
Copy link
Member

@zero323 zero323 commented Oct 4, 2019

What changes were proposed in this pull request?

Adds

class _IsotonicRegressionBase(HasFeaturesCol, HasLabelCol, HasPredictionCol, HasWeightCol): ...

with related Params and uses it to replace JavaPredictor and HasWeightCol in IsotonicRegression base classes and JavaPredictionModel, in IsotonicRegressionModel base classes.

Why are the changes needed?

Previous work (#25776) on SPARK-28985 replaced JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol in IsotonicRegression and JavaModel in IsotonicRegressionModel with newly added JavaPredictor:

class JavaPredictor(JavaEstimator, JavaPredictorParams):

and JavaPredictionModel

class JavaPredictionModel(JavaModel, JavaPredictorParams):

respectively.

This however is inconsistent with Scala counterpart where both classes extend private IsotonicRegressionBase

private[regression] trait IsotonicRegressionBase extends Params with HasFeaturesCol
with HasLabelCol with HasPredictionCol with HasWeightCol with Logging {

This preserves some of the existing inconsistencies (model as defined in the official example), i.e.

from pyspark.ml.regression impor IsotonicRegressionMode
from pyspark.ml.param.shared import HasWeightCol

issubclass(IsotonicRegressionModel, HasWeightCol) 
# False

hasattr(model, "weightCol")                                                       
# True

as well as introduces a bug, by adding unsupported predict method:

import inspect

hasattr(model, "predict")                                                         
# True

inspect.getfullargspec(IsotonicRegressionModel.predict)                           
# FullArgSpec(args=['self', 'value'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={})

IsotonicRegressionModel.predict.__doc__                                                                                                                                                           
# Predict label for the given features.\n\n        .. versionadded:: 3.0.0'

model.predict(dataset.first().features)  

# Py4JError: An error occurred while calling o49.predict. Trace:
# py4j.Py4JException: Method predict([class org.apache.spark.ml.linalg.SparseVector]) does not exist
# ...

Furthermore existing implementation can cause further problems in the future, if Predictor / PredictionModel API changes.

Does this PR introduce any user-facing change?

Yes. It:

  • Removes invalid IsotonicRegressionModel.predict method.
  • Adds HasWeightColumn to IsotonicRegressionModel.

however the faulty implementation hasn't been released yet, and proposed additions have negligible potential for breaking existing code (and none, compared to changes already made in #25776).

How was this patch tested?

  • Existing unit tests.
  • Manual testing.

CC @huaxingao, @zhengruifeng

@SparkQA
Copy link

SparkQA commented Oct 4, 2019

Test build #111774 has finished for PR 26023 at commit e5183b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _IsotonicRegressionBase(HasFeaturesCol, HasLabelCol, HasPredictionCol, HasWeightCol):
  • class IsotonicRegression(JavaEstimator, _IsotonicRegressionBase, HasWeightCol,
  • class IsotonicRegressionModel(JavaModel, _IsotonicRegressionBase,

@zhengruifeng
Copy link
Contributor

LGTM

srowen pushed a commit that referenced this pull request Oct 4, 2019
### What changes were proposed in this pull request?

Adds

```python
_AFTSurvivalRegressionParams(HasFeaturesCol, HasLabelCol, HasPredictionCol,
                                   HasMaxIter, HasTol, HasFitIntercept,
                                   HasAggregationDepth): ...
```

with related Params and uses it to replace `HasFitIntercept`, `HasMaxIter`, `HasTol` and  `HasAggregationDepth` in `AFTSurvivalRegression` base classes and `JavaPredictionModel,` in `AFTSurvivalRegressionModel` base classes.

### Why are the changes needed?

Previous work (#25776) on [SPARK-28985](https://issues.apache.org/jira/browse/SPARK-28985) replaced `JavaEstimator`, `HasFeaturesCol`, `HasLabelCol`, `HasPredictionCol` in `AFTSurvivalRegression` and  `JavaModel` in `AFTSurvivalRegressionModel` with newly added `JavaPredictor`:

https://github.com/apache/spark/blob/e97b55d32285052a1f76cca35377c4b21eb2e7d7/python/pyspark/ml/wrapper.py#L377

and `JavaPredictionModel`

https://github.com/apache/spark/blob/e97b55d32285052a1f76cca35377c4b21eb2e7d7/python/pyspark/ml/wrapper.py#L405

respectively.

This however is inconsistent with Scala counterpart where both classes extend private `AFTSurvivalRegressionBase`

https://github.com/apache/spark/blob/eb037a8180be4ab7570eda1fa9cbf3c84b92c3f7/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala#L48-L50

This preserves some of the existing inconsistencies (variables as defined in [the official example](https://github.com/apache/spark/blob/master/examples/src/main/python/ml/aft_survival_regression.p))

```
from pyspark.ml.regression import AFTSurvivalRegression, AFTSurvivalRegressionModel
from pyspark.ml.param.shared import HasMaxIter, HasTol, HasFitIntercept, HasAggregationDepth
from pyspark.ml.param import Param

issubclass(AFTSurvivalRegressionModel, HasMaxIter)
# False
hasattr(model, "maxIter")  and isinstance(model.maxIter, Param)
# True

issubclass(AFTSurvivalRegressionModel, HasTol)
# False
hasattr(model, "tol")  and isinstance(model.tol, Param)
# True
```

and can cause problems in the future, if Predictor / PredictionModel API changes (unlike [`IsotonicRegression`](#26023), current implementation is technically speaking correct, though incomplete).

### Does this PR introduce any user-facing change?

Yes, it adds a number of base classes to `AFTSurvivalRegressionModel`. These change purely additive and have negligible potential for breaking existing code (and none, compared to changes already made in #25776). Additionally affected API hasn't been released in the current form yet.

### How was this patch tested?

- Existing unit tests.
- Manual testing.

CC huaxingao, zhengruifeng

Closes #26024 from zero323/SPARK-28985-FOLLOW-UP-aftsurival-regression.

Authored-by: zero323 <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
@srowen srowen closed this in 8556710 Oct 4, 2019
@srowen
Copy link
Member

srowen commented Oct 4, 2019

Merged to master

@zero323
Copy link
Member Author

zero323 commented Oct 4, 2019

Thanks @srowen, @zhengruifeng!

@zero323 zero323 deleted the SPARK-28985-FOLLOW-UP-isotonic-regression branch October 4, 2019 23:16
zero323 added a commit to zero323/pyspark-stubs that referenced this pull request Oct 5, 2019
* Reflect changes introduced with SPARK-28985

* Add _IsotonicRegressionBase - apache/spark#26023

* Add _AFTSurvivalRegressionParams - apache/spark#26024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants