[SPARK-29212][ML][PYSPARK] Add common classes without using JVM backend
### What changes were proposed in this pull request?

Implement common base ML classes (`Predictor`, `PredictionModel`, `Classifier`, `ClassificationModel`, `ProbabilisticClassifier`, `ProbabilisticClassificationModel`, `Regressor`, `RegressionModel`) for non-Java backends.

Note:

- `Predictor` and `JavaClassifier` should be abstract, as the `_fit` method is not implemented.
- `PredictionModel` should be abstract, as `_transform` is not implemented.

### Why are the changes needed?

To provide extension points for non-JVM algorithms, as well as a public hierarchy which can be used to distinguish between different classes of predictors (as opposed to the `Java*` variants, which are commonly described in docstrings as private).

For a longer discussion see [SPARK-29212](https://issues.apache.org/jira/browse/SPARK-29212) and/or #25776.

### Does this PR introduce any user-facing change?

It adds the new base classes listed above, but the effective interfaces (method resolution order notwithstanding) stay the same.

Additionally, the "private" `Java*` classes in `ml.regression` and `ml.classification` have been renamed to follow PEP-8 conventions (a leading underscore was added). Whether the same should be done for the equivalent classes in `ml.wrapper` is open for discussion.

If we take `JavaClassifier` as an example, the type hierarchy will change from

[Image: old pyspark ml classification JavaClassifier]

to

[Image: new pyspark ml classification _JavaClassifier]

Similarly, the old model

[Image: old pyspark ml classification JavaClassificationModel]

will become

[Image: new pyspark ml classification _JavaClassificationModel]

### How was this patch tested?

Existing unit tests.
Closes #27245 from zero323/SPARK-29212.

Authored-by: zero323 <[email protected]>
Signed-off-by: zhengruifeng <[email protected]>
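
The note on abstractness can be illustrated with a minimal, PySpark-free sketch. The stand-in classes below reuse the names `Predictor`, `PredictionModel`, `Regressor` and `RegressionModel` from this commit, but they are simplified assumptions built on `abc`, not the actual PySpark implementations. `MeanRegressor` and `MeanRegressionModel` are hypothetical, and plain lists of `(features, label)` tuples stand in for DataFrames.

```python
from abc import ABC, abstractmethod


class PredictionModel(ABC):
    """Stand-in model base: abstract because _transform is not implemented."""

    def transform(self, dataset):
        return self._transform(dataset)

    @abstractmethod
    def _transform(self, dataset):
        ...


class Predictor(ABC):
    """Stand-in estimator base: abstract because _fit is not implemented."""

    def fit(self, dataset):
        return self._fit(dataset)

    @abstractmethod
    def _fit(self, dataset):
        ...


class RegressionModel(PredictionModel):
    """Marker base used to tell regression models apart from other models."""


class Regressor(Predictor):
    """Marker base used to tell regressors apart from other predictors."""


class MeanRegressionModel(RegressionModel):
    """Hypothetical non-JVM model that always predicts the training mean."""

    def __init__(self, mean):
        self.mean = mean

    def _transform(self, dataset):
        # Append the constant prediction to every (features, label) row.
        return [(features, label, self.mean) for features, label in dataset]


class MeanRegressor(Regressor):
    """Hypothetical non-JVM estimator implemented purely in Python."""

    def _fit(self, dataset):
        labels = [label for _, label in dataset]
        return MeanRegressionModel(sum(labels) / len(labels))
```

Under these assumptions, `MeanRegressor().fit(rows)` returns a model that passes `isinstance(model, RegressionModel)`, while instantiating `Predictor` or `Regressor` directly raises `TypeError` because `_fit` is still abstract.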