[SPARK-29212][ML][PYSPARK] Add common classes without using JVM backend #27245

Closed
wants to merge 2 commits into from

Conversation

Member

@zero323 zero323 commented Jan 17, 2020

What changes were proposed in this pull request?

Implement common base ML classes (Predictor, PredictionModel, Classifier, ClassificationModel, ProbabilisticClassifier, ProbabilisticClassificationModel, Regressor, RegressionModel) for non-Java backends.

Note

  • Predictor and JavaClassifier should be abstract, as the _fit method is not implemented.
  • PredictionModel should be abstract, as the _transform method is not implemented.
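
The note above can be illustrated with a minimal sketch that uses plain Python stand-ins rather than the real pyspark classes. The names `Estimator`, `Predictor` and `MeanPredictor` below are hypothetical placeholders, and marking `_fit` with `abc.abstractmethod` is only one way to get the abstract behavior; the PR itself may do it differently.

```python
from abc import ABCMeta, abstractmethod


class Estimator(metaclass=ABCMeta):
    """Stand-in for an estimator base class (not the pyspark one)."""

    def fit(self, dataset):
        # The public entry point delegates to the subclass hook.
        return self._fit(dataset)

    @abstractmethod
    def _fit(self, dataset):
        raise NotImplementedError()


class Predictor(Estimator):
    """Still abstract: it adds no _fit implementation."""


class MeanPredictor(Predictor):
    """Concrete subclass: implementing _fit makes it instantiable."""

    def _fit(self, dataset):
        # Toy "model": the mean of the input values.
        return sum(dataset) / len(dataset)


def can_instantiate(cls):
    """Return True if cls() succeeds, False if it is rejected as abstract."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

With this setup, `can_instantiate(Predictor)` is false while `can_instantiate(MeanPredictor)` is true, which is the sense in which a class without `_fit` "should be abstract".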

Why are the changes needed?

To provide extension points for non-JVM algorithms, as well as a public hierarchy (as opposed to the Java* variants, which are commonly described in docstrings as private) that can be used to distinguish between different classes of predictors.
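
As a rough illustration of what "distinguishing between different classes of predictors" could look like, the sketch below uses empty stand-in classes. `PurePythonClassifier`, `JvmBackedRegressor` and `describe` are hypothetical names invented for this example; only `Predictor`, `Classifier` and `Regressor` mirror names from the PR.

```python
class Predictor:
    """Stand-in for the public base of all predictors."""


class Classifier(Predictor):
    """Stand-in for the public classifier base."""


class Regressor(Predictor):
    """Stand-in for the public regressor base."""


class PurePythonClassifier(Classifier):
    """Hypothetical algorithm implemented without a JVM backend."""


class JvmBackedRegressor(Regressor):
    """Hypothetical algorithm that wraps a JVM implementation."""


def describe(stage):
    """Classify a pipeline stage by its public base class only."""
    # Check the more specific bases before the general one.
    if isinstance(stage, Classifier):
        return "classifier"
    if isinstance(stage, Regressor):
        return "regressor"
    if isinstance(stage, Predictor):
        return "predictor"
    return "other"
```

The check never needs to know whether a stage is JVM-backed, which is the point of having a backend-independent hierarchy.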

For longer discussion see SPARK-29212 and / or #25776.

Does this PR introduce any user-facing change?

It adds new base classes as listed above, but effective interfaces (method resolution order notwithstanding) stay the same.

Additionally, "private" Java* classes in ml.regression and ml.classification have been renamed to follow PEP-8 conventions (added leading underscore).

Whether the same should be done for the equivalent classes in ml.wrapper is open for discussion.
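
One practical effect of a leading underscore, beyond signalling "internal use", is that `from module import *` skips such names when the module defines no `__all__`. The sketch below shows this with a throwaway in-memory module; `fake_ml` is a hypothetical name, and modules that do define `__all__` control star imports through that list instead.

```python
import sys
import types

# Build a throwaway module in memory; "fake_ml" is a hypothetical name.
fake_ml = types.ModuleType("fake_ml")
exec(
    "class Classifier: pass\n"
    "class _JavaClassifier(Classifier): pass\n",
    fake_ml.__dict__,
)
sys.modules["fake_ml"] = fake_ml

# Run a star import into a fresh namespace and see what arrives.
namespace = {}
exec("from fake_ml import *", namespace)

# Ignore dunder entries such as __builtins__ that exec adds on its own.
public_names = sorted(name for name in namespace if not name.startswith("__"))
```

Here `public_names` contains `Classifier` but not `_JavaClassifier`, although the underscored class remains reachable as `fake_ml._JavaClassifier`.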

If we take JavaClassifier as an example, the type hierarchy will change from

[Image: old pyspark.ml.classification.JavaClassifier hierarchy]

to

[Image: new pyspark.ml.classification._JavaClassifier hierarchy]

Similarly, the old model

[Image: old pyspark.ml.classification.JavaClassificationModel hierarchy]

will become

[Image: new pyspark.ml.classification._JavaClassificationModel hierarchy]

How was this patch tested?

Existing unit tests.

SparkQA commented Jan 17, 2020

Test build #116878 has finished for PR 27245 at commit 71f09c4.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class JavaPredictor(JavaEstimator, Predictor, _PredictorParams):
  • class JavaPredictionModel(JavaModel, PredictionModel, _PredictorParams):

@zero323 zero323 force-pushed the SPARK-29212 branch 2 times, most recently from 04ed1f1 to 577d09c Compare January 17, 2020 01:45

SparkQA commented Jan 17, 2020

Test build #116881 has finished for PR 27245 at commit 898de5e.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Predictor(Estimator, _PredictorParams, metaclass=ABCMeta):
  • class PredictionModel(Model, _PredictorParams, metaclass=ABCMeta):
  • class JavaClassifier(JavaPredictor, _JavaClassifierParams, metaclass=ABCMeta):
  • class JavaPredictor(JavaEstimator, Predictor, _PredictorParams, metaclass=ABCMeta):

SparkQA commented Jan 17, 2020

Test build #116883 has finished for PR 27245 at commit d29196b.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class JavaClassifier(JavaPredictor, _JavaClassifierParams):

SparkQA commented Jan 17, 2020

Test build #116880 has finished for PR 27245 at commit 04ed1f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class JavaPredictor(JavaEstimator, Predictor, _PredictorParams):
  • class JavaPredictionModel(JavaModel, PredictionModel, _PredictorParams):

SparkQA commented Jan 17, 2020

Test build #116885 has finished for PR 27245 at commit dc654b7.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class JavaClassifier(JavaPredictor, _JavaClassifierParams):

@zero323 zero323 force-pushed the SPARK-29212 branch 2 times, most recently from 3a6ab3b to be45e3b Compare January 17, 2020 02:33

SparkQA commented Jan 17, 2020

Test build #116886 has finished for PR 27245 at commit 3a6ab3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class JavaClassifier(JavaPredictor, _JavaClassifierParams):
  • class JavaPredictor(JavaEstimator, Predictor, _PredictorParams):

SparkQA commented Jan 17, 2020

Test build #116887 has finished for PR 27245 at commit be45e3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class JavaClassifier(JavaPredictor, _JavaClassifierParams):
  • class JavaPredictor(JavaEstimator, Predictor, _PredictorParams):

SparkQA commented Jan 18, 2020

Test build #116969 has finished for PR 27245 at commit bc2ebe8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _PredictorParams(HasLabelCol, HasFeaturesCol, HasPredictionCol):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class _ClassifierParams(HasRawPredictionCol, _PredictorParams):
  • class Classifier(Predictor, _ClassifierParams):
  • class ClassificationModel(PredictionModel, _ClassifierParams):
  • class _ProbabilisticClassifierParams(HasProbabilityCol, HasThresholds, _ClassifierParams):
  • class ProbabilisticClassifier(Classifier, _ProbabilisticClassifierParams):
  • class ProbabilisticClassificationModel(ClassificationModel,
  • class _JavaClassifier(Classifier, JavaPredictor):
  • class _JavaClassificationModel(ClassificationModel, JavaPredictionModel):
  • class _JavaProbabilisticClassifier(ProbabilisticClassifier, _JavaClassifier):
  • class _JavaProbabilisticClassificationModel(ProbabilisticClassificationModel,
  • class _LinearSVCParams(_ClassifierParams, HasRegParam, HasMaxIter, HasFitIntercept, HasTol,
  • class LinearSVC(_JavaClassifier, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class _LogisticRegressionParams(_ProbabilisticClassifierParams, HasRegParam,
  • class LogisticRegression(_JavaProbabilisticClassifier, _LogisticRegressionParams, JavaMLWritable,
  • class LogisticRegressionModel(_JavaProbabilisticClassificationModel, _LogisticRegressionParams,
  • class DecisionTreeClassifier(_JavaProbabilisticClassifier, _DecisionTreeClassifierParams,
  • class DecisionTreeClassificationModel(_DecisionTreeModel, _JavaProbabilisticClassificationModel,
  • class RandomForestClassifier(_JavaProbabilisticClassifier, _RandomForestClassifierParams,
  • class RandomForestClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class GBTClassifier(_JavaProbabilisticClassifier, _GBTClassifierParams,
  • class GBTClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class _NaiveBayesParams(_PredictorParams, HasWeightCol):
  • class NaiveBayes(_JavaProbabilisticClassifier, _NaiveBayesParams, HasThresholds, HasWeightCol,
  • class NaiveBayesModel(_JavaProbabilisticClassificationModel, _NaiveBayesParams, JavaMLWritable,
  • class _MultilayerPerceptronParams(_ProbabilisticClassifierParams, HasSeed, HasMaxIter,
  • class MultilayerPerceptronClassifier(_JavaProbabilisticClassifier, _MultilayerPerceptronParams,
  • class MultilayerPerceptronClassificationModel(_JavaProbabilisticClassificationModel,
  • class _OneVsRestParams(_ClassifierParams, HasWeightCol):
  • class FMClassifier(_JavaProbabilisticClassifier, _FactorizationMachinesParams, JavaMLWritable,
  • class FMClassificationModel(_JavaProbabilisticClassificationModel, _FactorizationMachinesParams,
  • class Regressor(Predictor, _PredictorParams):
  • class RegressionModel(PredictionModel, _PredictorParams):
  • class _JavaRegressor(Regressor, JavaPredictor):
  • class _JavaRegressionModel(RegressionModel, JavaPredictionModel):
  • class _LinearRegressionParams(_PredictorParams, HasRegParam, HasElasticNetParam, HasMaxIter,
  • class LinearRegression(_JavaRegressor, _LinearRegressionParams, JavaMLWritable, JavaMLReadable):
  • class LinearRegressionModel(_JavaRegressionModel, _LinearRegressionParams, GeneralJavaMLWritable,
  • class DecisionTreeRegressor(_JavaRegressor, _DecisionTreeRegressorParams, JavaMLWritable,
  • class DecisionTreeRegressionModel(
  • class RandomForestRegressor(_JavaRegressor, _RandomForestRegressorParams, JavaMLWritable,
  • class RandomForestRegressionModel(
  • class GBTRegressor(_JavaRegressor, _GBTRegressorParams, JavaMLWritable, JavaMLReadable):
  • class GBTRegressionModel(
  • class _AFTSurvivalRegressionParams(_PredictorParams, HasMaxIter, HasTol, HasFitIntercept,
  • class AFTSurvivalRegression(_JavaRegressor, _AFTSurvivalRegressionParams,
  • class AFTSurvivalRegressionModel(_JavaRegressionModel, _AFTSurvivalRegressionParams,
  • class _GeneralizedLinearRegressionParams(_PredictorParams, HasFitIntercept, HasMaxIter,
  • class GeneralizedLinearRegression(_JavaRegressor, _GeneralizedLinearRegressionParams,
  • class GeneralizedLinearRegressionModel(_JavaRegressionModel, _GeneralizedLinearRegressionParams,
  • class _FactorizationMachinesParams(_PredictorParams, HasMaxIter, HasStepSize, HasTol,
  • class FMRegressor(_JavaRegressor, _FactorizationMachinesParams, JavaMLWritable, JavaMLReadable):
  • class FMRegressionModel(_JavaRegressionModel, _FactorizationMachinesParams, JavaMLWritable,
  • class JavaPredictor(Predictor, JavaEstimator, _PredictorParams):
  • class JavaPredictionModel(PredictionModel, JavaModel, _PredictorParams):

SparkQA commented Jan 18, 2020

Test build #116970 has finished for PR 27245 at commit d8266c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323 zero323 changed the title [WIP][SPARK-29212][ML][PYSPARK] Add common classes without using JVM backend [SPARK-29212][ML][PYSPARK] Add common classes without using JVM backend Jan 18, 2020
@zero323 zero323 requested a review from zhengruifeng January 18, 2020 03:14

SparkQA commented Jan 19, 2020

Test build #116984 has finished for PR 27245 at commit b852fc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _PredictorParams(HasLabelCol, HasFeaturesCol, HasPredictionCol):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class _ClassifierParams(HasRawPredictionCol, _PredictorParams):
  • class Classifier(Predictor, _ClassifierParams):
  • class ClassificationModel(PredictionModel, _ClassifierParams):
  • class _ProbabilisticClassifierParams(HasProbabilityCol, HasThresholds, _ClassifierParams):
  • class ProbabilisticClassifier(Classifier, _ProbabilisticClassifierParams):
  • class ProbabilisticClassificationModel(ClassificationModel,
  • class _JavaClassifier(Classifier, JavaPredictor):
  • class _JavaClassificationModel(ClassificationModel, JavaPredictionModel):
  • class _JavaProbabilisticClassifier(ProbabilisticClassifier, _JavaClassifier):
  • class _JavaProbabilisticClassificationModel(ProbabilisticClassificationModel,
  • class _LinearSVCParams(_ClassifierParams, HasRegParam, HasMaxIter, HasFitIntercept, HasTol,
  • class LinearSVC(_JavaClassifier, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class _LogisticRegressionParams(_ProbabilisticClassifierParams, HasRegParam,
  • class LogisticRegression(_JavaProbabilisticClassifier, _LogisticRegressionParams, JavaMLWritable,
  • class LogisticRegressionModel(_JavaProbabilisticClassificationModel, _LogisticRegressionParams,
  • class DecisionTreeClassifier(_JavaProbabilisticClassifier, _DecisionTreeClassifierParams,
  • class DecisionTreeClassificationModel(_DecisionTreeModel, _JavaProbabilisticClassificationModel,
  • class RandomForestClassifier(_JavaProbabilisticClassifier, _RandomForestClassifierParams,
  • class RandomForestClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class GBTClassifier(_JavaProbabilisticClassifier, _GBTClassifierParams,
  • class GBTClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class _NaiveBayesParams(_PredictorParams, HasWeightCol):
  • class NaiveBayes(_JavaProbabilisticClassifier, _NaiveBayesParams, HasThresholds, HasWeightCol,
  • class NaiveBayesModel(_JavaProbabilisticClassificationModel, _NaiveBayesParams, JavaMLWritable,
  • class _MultilayerPerceptronParams(_ProbabilisticClassifierParams, HasSeed, HasMaxIter,
  • class MultilayerPerceptronClassifier(_JavaProbabilisticClassifier, _MultilayerPerceptronParams,
  • class MultilayerPerceptronClassificationModel(_JavaProbabilisticClassificationModel,
  • class _OneVsRestParams(_ClassifierParams, HasWeightCol):
  • class FMClassifier(_JavaProbabilisticClassifier, _FactorizationMachinesParams, JavaMLWritable,
  • class FMClassificationModel(_JavaProbabilisticClassificationModel, _FactorizationMachinesParams,
  • class Regressor(Predictor, _PredictorParams):
  • class RegressionModel(PredictionModel, _PredictorParams):
  • class _JavaRegressor(Regressor, JavaPredictor):
  • class _JavaRegressionModel(RegressionModel, JavaPredictionModel):
  • class _LinearRegressionParams(_PredictorParams, HasRegParam, HasElasticNetParam, HasMaxIter,
  • class LinearRegression(_JavaRegressor, _LinearRegressionParams, JavaMLWritable, JavaMLReadable):
  • class LinearRegressionModel(_JavaRegressionModel, _LinearRegressionParams, GeneralJavaMLWritable,
  • class DecisionTreeRegressor(_JavaRegressor, _DecisionTreeRegressorParams, JavaMLWritable,
  • class RandomForestRegressor(_JavaRegressor, _RandomForestRegressorParams, JavaMLWritable,
  • class GBTRegressor(_JavaRegressor, _GBTRegressorParams, JavaMLWritable, JavaMLReadable):
  • class _AFTSurvivalRegressionParams(_PredictorParams, HasMaxIter, HasTol, HasFitIntercept,
  • class AFTSurvivalRegression(_JavaRegressor, _AFTSurvivalRegressionParams,
  • class AFTSurvivalRegressionModel(_JavaRegressionModel, _AFTSurvivalRegressionParams,
  • class _GeneralizedLinearRegressionParams(_PredictorParams, HasFitIntercept, HasMaxIter,
  • class GeneralizedLinearRegression(_JavaRegressor, _GeneralizedLinearRegressionParams,
  • class GeneralizedLinearRegressionModel(_JavaRegressionModel, _GeneralizedLinearRegressionParams,
  • class _FactorizationMachinesParams(_PredictorParams, HasMaxIter, HasStepSize, HasTol,
  • class FMRegressor(_JavaRegressor, _FactorizationMachinesParams, JavaMLWritable, JavaMLReadable):
  • class FMRegressionModel(_JavaRegressionModel, _FactorizationMachinesParams, JavaMLWritable,
  • class JavaPredictor(Predictor, JavaEstimator, _PredictorParams):
  • class JavaPredictionModel(PredictionModel, JavaModel, _PredictorParams):

Member Author

zero323 commented Jan 26, 2020

CC @huaxingao @srowen Forwarding, since you participated in the discussion.

SparkQA commented Jan 28, 2020

Test build #117491 has finished for PR 27245 at commit 5d24795.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _PredictorParams(HasLabelCol, HasFeaturesCol, HasPredictionCol):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class _ClassifierParams(HasRawPredictionCol, _PredictorParams):
  • class Classifier(Predictor, _ClassifierParams):
  • class ClassificationModel(PredictionModel, _ClassifierParams):
  • class _ProbabilisticClassifierParams(HasProbabilityCol, HasThresholds, _ClassifierParams):
  • class ProbabilisticClassifier(Classifier, _ProbabilisticClassifierParams):
  • class ProbabilisticClassificationModel(ClassificationModel,
  • class _JavaClassifier(Classifier, JavaPredictor):
  • class _JavaClassificationModel(ClassificationModel, JavaPredictionModel):
  • class _JavaProbabilisticClassifier(ProbabilisticClassifier, _JavaClassifier):
  • class _JavaProbabilisticClassificationModel(ProbabilisticClassificationModel,
  • class _LinearSVCParams(_ClassifierParams, HasRegParam, HasMaxIter, HasFitIntercept, HasTol,
  • class LinearSVC(_JavaClassifier, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class _LogisticRegressionParams(_ProbabilisticClassifierParams, HasRegParam,
  • class LogisticRegression(_JavaProbabilisticClassifier, _LogisticRegressionParams, JavaMLWritable,
  • class LogisticRegressionModel(_JavaProbabilisticClassificationModel, _LogisticRegressionParams,
  • class DecisionTreeClassifier(_JavaProbabilisticClassifier, _DecisionTreeClassifierParams,
  • class DecisionTreeClassificationModel(_DecisionTreeModel, _JavaProbabilisticClassificationModel,
  • class RandomForestClassifier(_JavaProbabilisticClassifier, _RandomForestClassifierParams,
  • class RandomForestClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class GBTClassifier(_JavaProbabilisticClassifier, _GBTClassifierParams,
  • class GBTClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class _NaiveBayesParams(_PredictorParams, HasWeightCol):
  • class NaiveBayes(_JavaProbabilisticClassifier, _NaiveBayesParams, HasThresholds, HasWeightCol,
  • class NaiveBayesModel(_JavaProbabilisticClassificationModel, _NaiveBayesParams, JavaMLWritable,
  • class _MultilayerPerceptronParams(_ProbabilisticClassifierParams, HasSeed, HasMaxIter,
  • class MultilayerPerceptronClassifier(_JavaProbabilisticClassifier, _MultilayerPerceptronParams,
  • class MultilayerPerceptronClassificationModel(_JavaProbabilisticClassificationModel,
  • class _OneVsRestParams(_ClassifierParams, HasWeightCol):
  • class FMClassifier(_JavaProbabilisticClassifier, _FactorizationMachinesParams, JavaMLWritable,
  • class FMClassificationModel(_JavaProbabilisticClassificationModel, _FactorizationMachinesParams,
  • class Regressor(Predictor, _PredictorParams):
  • class RegressionModel(PredictionModel, _PredictorParams):
  • class _JavaRegressor(Regressor, JavaPredictor):
  • class _JavaRegressionModel(RegressionModel, JavaPredictionModel):
  • class _LinearRegressionParams(_PredictorParams, HasRegParam, HasElasticNetParam, HasMaxIter,
  • class LinearRegression(_JavaRegressor, _LinearRegressionParams, JavaMLWritable, JavaMLReadable):
  • class LinearRegressionModel(_JavaRegressionModel, _LinearRegressionParams, GeneralJavaMLWritable,
  • class DecisionTreeRegressor(_JavaRegressor, _DecisionTreeRegressorParams, JavaMLWritable,
  • class RandomForestRegressor(_JavaRegressor, _RandomForestRegressorParams, JavaMLWritable,
  • class GBTRegressor(_JavaRegressor, _GBTRegressorParams, JavaMLWritable, JavaMLReadable):
  • class _AFTSurvivalRegressionParams(_PredictorParams, HasMaxIter, HasTol, HasFitIntercept,
  • class AFTSurvivalRegression(_JavaRegressor, _AFTSurvivalRegressionParams,
  • class AFTSurvivalRegressionModel(_JavaRegressionModel, _AFTSurvivalRegressionParams,
  • class _GeneralizedLinearRegressionParams(_PredictorParams, HasFitIntercept, HasMaxIter,
  • class GeneralizedLinearRegression(_JavaRegressor, _GeneralizedLinearRegressionParams,
  • class GeneralizedLinearRegressionModel(_JavaRegressionModel, _GeneralizedLinearRegressionParams,
  • class _FactorizationMachinesParams(_PredictorParams, HasMaxIter, HasStepSize, HasTol,
  • class FMRegressor(_JavaRegressor, _FactorizationMachinesParams, JavaMLWritable, JavaMLReadable):
  • class FMRegressionModel(_JavaRegressionModel, _FactorizationMachinesParams, JavaMLWritable,
  • class JavaPredictor(Predictor, JavaEstimator, _PredictorParams):
  • class JavaPredictionModel(PredictionModel, JavaModel, _PredictorParams):

SparkQA commented Jan 28, 2020

Test build #117493 has finished for PR 27245 at commit 755efd3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _PredictorParams(HasLabelCol, HasFeaturesCol, HasPredictionCol):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class _ClassifierParams(HasRawPredictionCol, _PredictorParams):
  • class Classifier(Predictor, _ClassifierParams):
  • class ClassificationModel(PredictionModel, _ClassifierParams):
  • class _ProbabilisticClassifierParams(HasProbabilityCol, HasThresholds, _ClassifierParams):
  • class ProbabilisticClassifier(Classifier, _ProbabilisticClassifierParams):
  • class ProbabilisticClassificationModel(ClassificationModel,
  • class _JavaClassifier(Classifier, JavaPredictor):
  • class _JavaClassificationModel(ClassificationModel, JavaPredictionModel):
  • class _JavaProbabilisticClassifier(ProbabilisticClassifier, _JavaClassifier):
  • class _JavaProbabilisticClassificationModel(ProbabilisticClassificationModel,
  • class _LinearSVCParams(_ClassifierParams, HasRegParam, HasMaxIter, HasFitIntercept, HasTol,
  • class LinearSVC(_JavaClassifier, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class _LogisticRegressionParams(_ProbabilisticClassifierParams, HasRegParam,
  • class LogisticRegression(_JavaProbabilisticClassifier, _LogisticRegressionParams, JavaMLWritable,
  • class LogisticRegressionModel(_JavaProbabilisticClassificationModel, _LogisticRegressionParams,
  • class DecisionTreeClassifier(_JavaProbabilisticClassifier, _DecisionTreeClassifierParams,
  • class DecisionTreeClassificationModel(_DecisionTreeModel, _JavaProbabilisticClassificationModel,
  • class RandomForestClassifier(_JavaProbabilisticClassifier, _RandomForestClassifierParams,
  • class RandomForestClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class GBTClassifier(_JavaProbabilisticClassifier, _GBTClassifierParams,
  • class GBTClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class _NaiveBayesParams(_PredictorParams, HasWeightCol):
  • class NaiveBayes(_JavaProbabilisticClassifier, _NaiveBayesParams, HasThresholds, HasWeightCol,
  • class NaiveBayesModel(_JavaProbabilisticClassificationModel, _NaiveBayesParams, JavaMLWritable,
  • class _MultilayerPerceptronParams(_ProbabilisticClassifierParams, HasSeed, HasMaxIter,
  • class MultilayerPerceptronClassifier(_JavaProbabilisticClassifier, _MultilayerPerceptronParams,
  • class MultilayerPerceptronClassificationModel(_JavaProbabilisticClassificationModel,
  • class _OneVsRestParams(_ClassifierParams, HasWeightCol):
  • class FMClassifier(_JavaProbabilisticClassifier, _FactorizationMachinesParams, JavaMLWritable,
  • class FMClassificationModel(_JavaProbabilisticClassificationModel, _FactorizationMachinesParams,
  • class Regressor(Predictor, _PredictorParams):
  • class RegressionModel(PredictionModel, _PredictorParams):
  • class _JavaRegressor(Regressor, JavaPredictor):
  • class _JavaRegressionModel(RegressionModel, JavaPredictionModel):
  • class _LinearRegressionParams(_PredictorParams, HasRegParam, HasElasticNetParam, HasMaxIter,
  • class LinearRegression(_JavaRegressor, _LinearRegressionParams, JavaMLWritable, JavaMLReadable):
  • class LinearRegressionModel(_JavaRegressionModel, _LinearRegressionParams, GeneralJavaMLWritable,
  • class DecisionTreeRegressor(_JavaRegressor, _DecisionTreeRegressorParams, JavaMLWritable,
  • class RandomForestRegressor(_JavaRegressor, _RandomForestRegressorParams, JavaMLWritable,
  • class GBTRegressor(_JavaRegressor, _GBTRegressorParams, JavaMLWritable, JavaMLReadable):
  • class _AFTSurvivalRegressionParams(_PredictorParams, HasMaxIter, HasTol, HasFitIntercept,
  • class AFTSurvivalRegression(_JavaRegressor, _AFTSurvivalRegressionParams,
  • class AFTSurvivalRegressionModel(_JavaRegressionModel, _AFTSurvivalRegressionParams,
  • class _GeneralizedLinearRegressionParams(_PredictorParams, HasFitIntercept, HasMaxIter,
  • class GeneralizedLinearRegression(_JavaRegressor, _GeneralizedLinearRegressionParams,
  • class GeneralizedLinearRegressionModel(_JavaRegressionModel, _GeneralizedLinearRegressionParams,
  • class _FactorizationMachinesParams(_PredictorParams, HasMaxIter, HasStepSize, HasTol,
  • class FMRegressor(_JavaRegressor, _FactorizationMachinesParams, JavaMLWritable, JavaMLReadable):
  • class FMRegressionModel(_JavaRegressionModel, _FactorizationMachinesParams, JavaMLWritable,
  • class JavaPredictor(Predictor, JavaEstimator, _PredictorParams):
  • class JavaPredictionModel(PredictionModel, JavaModel, _PredictorParams):

SparkQA commented Feb 2, 2020

Test build #117751 has finished for PR 27245 at commit 40f9aca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _PredictorParams(HasLabelCol, HasFeaturesCol, HasPredictionCol):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class _ClassifierParams(HasRawPredictionCol, _PredictorParams):
  • class Classifier(Predictor, _ClassifierParams):
  • class ClassificationModel(PredictionModel, _ClassifierParams):
  • class _ProbabilisticClassifierParams(HasProbabilityCol, HasThresholds, _ClassifierParams):
  • class ProbabilisticClassifier(Classifier, _ProbabilisticClassifierParams):
  • class ProbabilisticClassificationModel(ClassificationModel,
  • class _JavaClassifier(Classifier, JavaPredictor):
  • class _JavaClassificationModel(ClassificationModel, JavaPredictionModel):
  • class _JavaProbabilisticClassifier(ProbabilisticClassifier, _JavaClassifier):
  • class _JavaProbabilisticClassificationModel(ProbabilisticClassificationModel,
  • class _LinearSVCParams(_ClassifierParams, HasRegParam, HasMaxIter, HasFitIntercept, HasTol,
  • class LinearSVC(_JavaClassifier, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class _LogisticRegressionParams(_ProbabilisticClassifierParams, HasRegParam,
  • class LogisticRegression(_JavaProbabilisticClassifier, _LogisticRegressionParams, JavaMLWritable,
  • class LogisticRegressionModel(_JavaProbabilisticClassificationModel, _LogisticRegressionParams,
  • class DecisionTreeClassifier(_JavaProbabilisticClassifier, _DecisionTreeClassifierParams,
  • class DecisionTreeClassificationModel(_DecisionTreeModel, _JavaProbabilisticClassificationModel,
  • class RandomForestClassifier(_JavaProbabilisticClassifier, _RandomForestClassifierParams,
  • class RandomForestClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class GBTClassifier(_JavaProbabilisticClassifier, _GBTClassifierParams,
  • class GBTClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class _NaiveBayesParams(_PredictorParams, HasWeightCol):
  • class NaiveBayes(_JavaProbabilisticClassifier, _NaiveBayesParams, HasThresholds, HasWeightCol,
  • class NaiveBayesModel(_JavaProbabilisticClassificationModel, _NaiveBayesParams, JavaMLWritable,
  • class _MultilayerPerceptronParams(_ProbabilisticClassifierParams, HasSeed, HasMaxIter,
  • class MultilayerPerceptronClassifier(_JavaProbabilisticClassifier, _MultilayerPerceptronParams,
  • class MultilayerPerceptronClassificationModel(_JavaProbabilisticClassificationModel,
  • class _OneVsRestParams(_ClassifierParams, HasWeightCol):
  • class FMClassifier(_JavaProbabilisticClassifier, _FactorizationMachinesParams, JavaMLWritable,
  • class FMClassificationModel(_JavaProbabilisticClassificationModel, _FactorizationMachinesParams,
  • class Regressor(Predictor, _PredictorParams):
  • class RegressionModel(PredictionModel, _PredictorParams):
  • class _JavaRegressor(Regressor, JavaPredictor):
  • class _JavaRegressionModel(RegressionModel, JavaPredictionModel):
  • class _LinearRegressionParams(_PredictorParams, HasRegParam, HasElasticNetParam, HasMaxIter,
  • class LinearRegression(_JavaRegressor, _LinearRegressionParams, JavaMLWritable, JavaMLReadable):
  • class LinearRegressionModel(_JavaRegressionModel, _LinearRegressionParams, GeneralJavaMLWritable,
  • class DecisionTreeRegressor(_JavaRegressor, _DecisionTreeRegressorParams, JavaMLWritable,
  • class RandomForestRegressor(_JavaRegressor, _RandomForestRegressorParams, JavaMLWritable,
  • class GBTRegressor(_JavaRegressor, _GBTRegressorParams, JavaMLWritable, JavaMLReadable):
  • class _AFTSurvivalRegressionParams(_PredictorParams, HasMaxIter, HasTol, HasFitIntercept,
  • class AFTSurvivalRegression(_JavaRegressor, _AFTSurvivalRegressionParams,
  • class AFTSurvivalRegressionModel(_JavaRegressionModel, _AFTSurvivalRegressionParams,
  • class _GeneralizedLinearRegressionParams(_PredictorParams, HasFitIntercept, HasMaxIter,
  • class GeneralizedLinearRegression(_JavaRegressor, _GeneralizedLinearRegressionParams,
  • class GeneralizedLinearRegressionModel(_JavaRegressionModel, _GeneralizedLinearRegressionParams,
  • class _FactorizationMachinesParams(_PredictorParams, HasMaxIter, HasStepSize, HasTol,
  • class FMRegressor(_JavaRegressor, _FactorizationMachinesParams, JavaMLWritable, JavaMLReadable):
  • class FMRegressionModel(_JavaRegressionModel, _FactorizationMachinesParams, JavaMLWritable,
  • class JavaPredictor(Predictor, JavaEstimator, _PredictorParams):
  • class JavaPredictionModel(PredictionModel, JavaModel, _PredictorParams):

SparkQA commented Feb 2, 2020

Test build #117754 has finished for PR 27245 at commit 08c36a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _PredictorParams(HasLabelCol, HasFeaturesCol, HasPredictionCol):
  • class Predictor(Estimator, _PredictorParams):
  • class PredictionModel(Model, _PredictorParams):
  • class _ClassifierParams(HasRawPredictionCol, _PredictorParams):
  • class Classifier(Predictor, _ClassifierParams):
  • class ClassificationModel(PredictionModel, _ClassifierParams):
  • class _ProbabilisticClassifierParams(HasProbabilityCol, HasThresholds, _ClassifierParams):
  • class ProbabilisticClassifier(Classifier, _ProbabilisticClassifierParams):
  • class ProbabilisticClassificationModel(ClassificationModel,
  • class _JavaClassifier(Classifier, JavaPredictor):
  • class _JavaClassificationModel(ClassificationModel, JavaPredictionModel):
  • class _JavaProbabilisticClassifier(ProbabilisticClassifier, _JavaClassifier):
  • class _JavaProbabilisticClassificationModel(ProbabilisticClassificationModel,
  • class _LinearSVCParams(_ClassifierParams, HasRegParam, HasMaxIter, HasFitIntercept, HasTol,
  • class LinearSVC(_JavaClassifier, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable):
  • class _LogisticRegressionParams(_ProbabilisticClassifierParams, HasRegParam,
  • class LogisticRegression(_JavaProbabilisticClassifier, _LogisticRegressionParams, JavaMLWritable,
  • class LogisticRegressionModel(_JavaProbabilisticClassificationModel, _LogisticRegressionParams,
  • class DecisionTreeClassifier(_JavaProbabilisticClassifier, _DecisionTreeClassifierParams,
  • class DecisionTreeClassificationModel(_DecisionTreeModel, _JavaProbabilisticClassificationModel,
  • class RandomForestClassifier(_JavaProbabilisticClassifier, _RandomForestClassifierParams,
  • class RandomForestClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class GBTClassifier(_JavaProbabilisticClassifier, _GBTClassifierParams,
  • class GBTClassificationModel(_TreeEnsembleModel, _JavaProbabilisticClassificationModel,
  • class _NaiveBayesParams(_PredictorParams, HasWeightCol):
  • class NaiveBayes(_JavaProbabilisticClassifier, _NaiveBayesParams, HasThresholds, HasWeightCol,
  • class NaiveBayesModel(_JavaProbabilisticClassificationModel, _NaiveBayesParams, JavaMLWritable,
  • class _MultilayerPerceptronParams(_ProbabilisticClassifierParams, HasSeed, HasMaxIter,
  • class MultilayerPerceptronClassifier(_JavaProbabilisticClassifier, _MultilayerPerceptronParams,
  • class MultilayerPerceptronClassificationModel(_JavaProbabilisticClassificationModel,
  • class _OneVsRestParams(_ClassifierParams, HasWeightCol):
  • class FMClassifier(_JavaProbabilisticClassifier, _FactorizationMachinesParams, JavaMLWritable,
  • class FMClassificationModel(_JavaProbabilisticClassificationModel, _FactorizationMachinesParams,
  • class Regressor(Predictor, _PredictorParams):
  • class RegressionModel(PredictionModel, _PredictorParams):
  • class _JavaRegressor(Regressor, JavaPredictor):
  • class _JavaRegressionModel(RegressionModel, JavaPredictionModel):
  • class _LinearRegressionParams(_PredictorParams, HasRegParam, HasElasticNetParam, HasMaxIter,
  • class LinearRegression(_JavaRegressor, _LinearRegressionParams, JavaMLWritable, JavaMLReadable):
  • class LinearRegressionModel(_JavaRegressionModel, _LinearRegressionParams, GeneralJavaMLWritable,
  • class DecisionTreeRegressor(_JavaRegressor, _DecisionTreeRegressorParams, JavaMLWritable,
  • class RandomForestRegressor(_JavaRegressor, _RandomForestRegressorParams, JavaMLWritable,
  • class GBTRegressor(_JavaRegressor, _GBTRegressorParams, JavaMLWritable, JavaMLReadable):
  • class _AFTSurvivalRegressionParams(_PredictorParams, HasMaxIter, HasTol, HasFitIntercept,
  • class AFTSurvivalRegression(_JavaRegressor, _AFTSurvivalRegressionParams,
  • class AFTSurvivalRegressionModel(_JavaRegressionModel, _AFTSurvivalRegressionParams,
  • class _GeneralizedLinearRegressionParams(_PredictorParams, HasFitIntercept, HasMaxIter,
  • class GeneralizedLinearRegression(_JavaRegressor, _GeneralizedLinearRegressionParams,
  • class GeneralizedLinearRegressionModel(_JavaRegressionModel, _GeneralizedLinearRegressionParams,
  • class _FactorizationMachinesParams(_PredictorParams, HasMaxIter, HasStepSize, HasTol,
  • class FMRegressor(_JavaRegressor, _FactorizationMachinesParams, JavaMLWritable, JavaMLReadable):
  • class FMRegressionModel(_JavaRegressionModel, _FactorizationMachinesParams, JavaMLWritable,
  • class JavaPredictor(Predictor, JavaEstimator, _PredictorParams):
  • class JavaPredictionModel(PredictionModel, JavaModel, _PredictorParams):

@SparkQA

SparkQA commented Feb 10, 2020

Test build #118184 has finished for PR 27245 at commit 63e2fcc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the same public classes (experimental) as listed for test build #117754 above.

@zhengruifeng
Contributor

In general LGTM, pending an update of the version info from 3.0.0 to 3.1.0.
Also ping @srowen @huaxingao: could you help double-check?

@huaxingao
Contributor

The changes look good to me.

@zero323
Member Author

zero323 commented Feb 19, 2020

> In general LGTM, pending update version info 3.0.0 to 3.1.0

Yeah, I was thinking about the right way to tackle this. While we introduce new objects in the type hierarchy, the API of the existing classes doesn't change compared to what we get in 3.0, so I felt that keeping the `since` annotations intact makes sense. But I am happy to change it if you think that is the way to go.

@SparkQA

SparkQA commented Feb 19, 2020

Test build #118680 has finished for PR 27245 at commit 0b0f723.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Contributor

retest this please

@SparkQA

SparkQA commented Mar 4, 2020

Test build #119274 has finished for PR 27245 at commit 0b0f723.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Contributor

Merged to master, thanks all!

@zero323
Member Author

zero323 commented Mar 4, 2020

Thanks!

@zero323 zero323 deleted the SPARK-29212 branch March 4, 2020 11:51
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

Implement common base ML classes (`Predictor`, `PredictionModel`, `Classifier`, `ClassificationModel`, `ProbabilisticClassifier`, `ProbabilisticClassificationModel`, `Regressor`, `RegressionModel`) for non-Java backends.

Note

- `Predictor` and `JavaClassifier` should be abstract, as the `_fit` method is not implemented.
- `PredictionModel` should be abstract, as the `_transform` method is not implemented.
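
A minimal sketch of what "abstract" means for these base classes. It uses simplified stand-ins, not the real `pyspark.ml` classes: the stand-in `Estimator` and `Model` only delegate `fit` to `_fit` and `transform` to `_transform`, datasets are plain lists of `(features, label)` tuples rather than DataFrames, and `MeanRegressor` / `MeanRegressionModel` are hypothetical names introduced only for illustration.

```python
from abc import ABC, abstractmethod


# Simplified stand-ins (assumption): the real pyspark.ml classes also carry Params.
class Estimator(ABC):
    def fit(self, dataset):
        return self._fit(dataset)

    @abstractmethod
    def _fit(self, dataset):
        """Train on the dataset and return a Model."""


class Model(ABC):
    def transform(self, dataset):
        return self._transform(dataset)

    @abstractmethod
    def _transform(self, dataset):
        """Return the dataset with a prediction appended to each row."""


class Predictor(Estimator):
    """Still abstract: _fit is not implemented."""


class PredictionModel(Model):
    """Still abstract: _transform is not implemented."""


class Regressor(Predictor):
    pass


class RegressionModel(PredictionModel):
    pass


# Hypothetical pure-Python (non-JVM) algorithm: always predicts the mean label.
class MeanRegressionModel(RegressionModel):
    def __init__(self, mean):
        self.mean = mean

    def _transform(self, dataset):
        return [(features, label, self.mean) for features, label in dataset]


class MeanRegressor(Regressor):
    def _fit(self, dataset):
        labels = [label for _, label in dataset]
        return MeanRegressionModel(sum(labels) / len(labels))


if __name__ == "__main__":
    model = MeanRegressor().fit([((1.0,), 1.0), ((2.0,), 3.0)])
    print(model.transform([((5.0,), 0.0)]))
```

Instantiating `Predictor` or `PredictionModel` directly raises `TypeError` in this sketch, while the concrete subclasses work because they supply `_fit` and `_transform`.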

### Why are the changes needed?

To provide extension points for non-JVM algorithms, as well as a public hierarchy (as opposed to the `Java*` variants, which are commonly described in docstrings as private) that can be used to distinguish between different classes of predictors.

For a longer discussion, see [SPARK-29212](https://issues.apache.org/jira/browse/SPARK-29212) and/or apache#25776.
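
As an illustration of distinguishing predictor classes through the public hierarchy, the sketch below dispatches on model type with `isinstance`. The classes are empty stand-ins that only mirror the names from this PR; `predictor_kind` and `MyPythonClassificationModel` are hypothetical helpers, not part of pyspark.

```python
# Empty stand-ins mirroring the public model hierarchy (assumption: no Params, no JVM).
class PredictionModel:
    pass


class ClassificationModel(PredictionModel):
    pass


class ProbabilisticClassificationModel(ClassificationModel):
    pass


class RegressionModel(PredictionModel):
    pass


# Hypothetical model implemented purely in Python.
class MyPythonClassificationModel(ClassificationModel):
    pass


def predictor_kind(model):
    """Classify a model by its position in the hierarchy, most specific first."""
    if isinstance(model, ProbabilisticClassificationModel):
        return "probabilistic classification"
    if isinstance(model, ClassificationModel):
        return "classification"
    if isinstance(model, RegressionModel):
        return "regression"
    if isinstance(model, PredictionModel):
        return "generic prediction"
    return "not a prediction model"


if __name__ == "__main__":
    print(predictor_kind(MyPythonClassificationModel()))
```

Because the checks rely only on the public base classes, the same function would treat JVM-backed and pure-Python models alike, provided both inherit from those bases.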

### Does this PR introduce any user-facing change?

It adds new base classes as listed above, but effective interfaces (method resolution order notwithstanding) stay the same.

Additionally, "private" `Java*` classes in `ml.regression` and `ml.classification` have been renamed to follow PEP-8 conventions (added leading underscore).

It is open for discussion whether the same should be done for the equivalent classes from `ml.wrapper`.

If we take `JavaClassifier` as an example, the type hierarchy will change from

![old pyspark ml classification JavaClassifier](https://user-images.githubusercontent.com/1554276/72657093-5c0b0c80-39a0-11ea-9069-a897d75de483.png)

to

![new pyspark ml classification _JavaClassifier](https://user-images.githubusercontent.com/1554276/72657098-64fbde00-39a0-11ea-8f80-01187a5ea5a6.png)

Similarly the old model

![old pyspark ml classification JavaClassificationModel](https://user-images.githubusercontent.com/1554276/72657103-7513bd80-39a0-11ea-9ffc-59eb6ab61fde.png)

will become

![new pyspark ml classification _JavaClassificationModel](https://user-images.githubusercontent.com/1554276/72657110-80ff7f80-39a0-11ea-9f5c-fe408664e827.png)

### How was this patch tested?

Existing unit tests.

Closes apache#27245 from zero323/SPARK-29212.

Authored-by: zero323 <[email protected]>
Signed-off-by: zhengruifeng <[email protected]>