-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29212][ML][PYSPARK] Add common classes without using JVM backend #27245
Conversation
Test build #116878 has finished for PR 27245 at commit
|
04ed1f1
to
577d09c
Compare
Test build #116881 has finished for PR 27245 at commit
|
Test build #116883 has finished for PR 27245 at commit
|
Test build #116880 has finished for PR 27245 at commit
|
Test build #116885 has finished for PR 27245 at commit
|
3a6ab3b
to
be45e3b
Compare
Test build #116886 has finished for PR 27245 at commit
|
Test build #116887 has finished for PR 27245 at commit
|
Test build #116969 has finished for PR 27245 at commit
|
Test build #116970 has finished for PR 27245 at commit
|
Test build #116984 has finished for PR 27245 at commit
|
CC @huaxingao @srowen Forwarding, since you participated in the discussion. |
Test build #117491 has finished for PR 27245 at commit
|
Test build #117493 has finished for PR 27245 at commit
|
Test build #117751 has finished for PR 27245 at commit
|
Test build #117754 has finished for PR 27245 at commit
|
Test build #118184 has finished for PR 27245 at commit
|
In general LGTM, pending update version info 3.0.0 to 3.1.0 |
The changes look good to me. |
Yeah, I was thinking about the right way to tackle this. While we introduce new objects in type hierarchy, the API of the existing classes doesn't change, compared to what we get in 3.0. So I felt that keeping |
Test build #118680 has finished for PR 27245 at commit
|
retest this please |
Test build #119274 has finished for PR 27245 at commit
|
Merged to master, thanks all! |
Thanks! |
### What changes were proposed in this pull request? Implement common base ML classes (`Predictor`, `PredictionModel`, `Classifier`, `ClasssificationModel` `ProbabilisticClassifier`, `ProbabilisticClasssificationModel`, `Regressor`, `RegrssionModel`) for non-Java backends. Note - `Predictor` and `JavaClassifier` should be abstract as `_fit` method is not implemented. - `PredictionModel` should be abstract as `_transform` is not implemented. ### Why are the changes needed? To provide extensions points for non-JVM algorithms, as well as a public (as opposed to `Java*` variants, which are commonly described in docstrings as private) hierarchy which can be used to distinguish between different classes of predictors. For longer discussion see [SPARK-29212](https://issues.apache.org/jira/browse/SPARK-29212) and / or apache#25776. ### Does this PR introduce any user-facing change? It adds new base classes as listed above, but effective interfaces (method resolution order notwithstanding) stay the same. Additionally "private" `Java*` classes in`ml.regression` and `ml.classification` have been renamed to follow PEP-8 conventions (added leading underscore). It is for discussion if the same should be done to equivalent classes from `ml.wrapper`. If we take `JavaClassifier` as an example, type hierarchy will change from data:image/s3,"s3://crabby-images/7f9ce/7f9cec72ef447e253e11f113705fccce972253b7" alt="old pyspark ml classification JavaClassifier" to data:image/s3,"s3://crabby-images/4476a/4476ace3b63d7b2378314bfd250f78705f33517e" alt="new pyspark ml classification _JavaClassifier" Similarly the old model data:image/s3,"s3://crabby-images/bfedc/bfedc6283511136df256ede0bf1212de0be50778" alt="old pyspark ml classification JavaClassificationModel" will become data:image/s3,"s3://crabby-images/3a824/3a8241345451cfaf2c2a1446499c3e97eb61e3e4" alt="new pyspark ml classification _JavaClassificationModel" ### How was this patch tested? Existing unit tests. Closes apache#27245 from zero323/SPARK-29212. Authored-by: zero323 <[email protected]> Signed-off-by: zhengruifeng <[email protected]>
What changes were proposed in this pull request?
Implement common base ML classes (
Predictor
,PredictionModel
,Classifier
,ClasssificationModel
ProbabilisticClassifier
,ProbabilisticClasssificationModel
,Regressor
,RegrssionModel
) for non-Java backends.Note
Predictor
andJavaClassifier
should be abstract as_fit
method is not implemented.PredictionModel
should be abstract as_transform
is not implemented.Why are the changes needed?
To provide extensions points for non-JVM algorithms, as well as a public (as opposed to
Java*
variants, which are commonly described in docstrings as private) hierarchy which can be used to distinguish between different classes of predictors.For longer discussion see SPARK-29212 and / or #25776.
Does this PR introduce any user-facing change?
It adds new base classes as listed above, but effective interfaces (method resolution order notwithstanding) stay the same.
Additionally "private"
Java*
classes inml.regression
andml.classification
have been renamed to follow PEP-8 conventions (added leading underscore).It is for discussion if the same should be done to equivalent classes from
ml.wrapper
.If we take
JavaClassifier
as an example, type hierarchy will change fromto
Similarly the old model
will become
How was this patch tested?
Existing unit tests.