-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams #26838
Conversation
…el extend MultilayerPerceptronParams
Test build #115115 has finished for PR 26838 at commit
|
Test build #115121 has finished for PR 26838 at commit
|
@@ -328,6 +328,9 @@ object MimaExcludes { | |||
// [SPARK-26457] Show hadoop configurations in HistoryServer environment tab | |||
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ApplicationEnvironmentInfo.this"), | |||
|
|||
// [SPARK-30144][ML] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams | |||
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel.layers"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question. Is this worth to break the API, @huaxingao ?
@@ -273,29 +273,29 @@ object MultilayerPerceptronClassifier | |||
* Each layer has sigmoid activation function, output layer has softmax. | |||
* | |||
* @param uid uid | |||
* @param layers array of layer sizes including input and output layers | |||
* @param modelLayers array of layer sizes including input and output layers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the constructor is private, but is it necessary to change this name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is needed, since all estimators and their models should share the same params, and there is by chance a param named layers
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about just removing modelLayers
in model?
since the value (array of layer sizes) can be easily obtain by $(layers)
* @param weights the weights of layers | ||
*/ | ||
@Since("1.5.0") | ||
class MultilayerPerceptronClassificationModel private[ml] ( | ||
@Since("1.5.0") override val uid: String, | ||
@Since("1.5.0") val layers: Array[Int], | ||
@Since("1.5.0") val modelLayers: Array[Int], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question, why need to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, i see. MultilayerPerceptronParams has layers too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename layers of MultilayerPerceptronParams instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but renaming layers of MultilayerPerceptronParams will also break the API.
@dongjoon-hyun @srowen @viirya Thanks for the review. Since
|
It seems that this will be a breaking change, so maybe we need to depreicate model's |
current var |
I can remove So I guess probably keep the current fix? |
I see, so it's not just a question of the API, but the name of the field in the serialized model? Hm, yeah. I think it's OK to change and standardize the name in 3.0, but needs to be able to read 'layers' from previous models if present. Is that the only issue here? |
@huaxingao I guess we can mark |
Test build #115309 has finished for PR 26838 at commit
|
I manually saved 2.4.4 model and loaded using 3.0.0. It worked OK. |
mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
Show resolved
Hide resolved
@Since("2.0.0") val weights: Vector) | ||
extends ProbabilisticClassificationModel[Vector, MultilayerPerceptronClassificationModel] | ||
with Serializable with MLWritable { | ||
with MultilayerPerceptronParams with Serializable with MLWritable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this change. But do we use MultilayerPerceptronClassificationModel in executors? Like not every classification model extends Serializable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this. Seems only the tree related model extends Serializable.
@@ -347,13 +355,13 @@ object MultilayerPerceptronClassificationModel | |||
class MultilayerPerceptronClassificationModelWriter( | |||
instance: MultilayerPerceptronClassificationModel) extends MLWriter { | |||
|
|||
private case class Data(layers: Array[Int], weights: Vector) | |||
private case class Data(weights: Vector) | |||
|
|||
override protected def saveImpl(path: String): Unit = { | |||
// Save metadata and Params | |||
DefaultParamsWriter.saveMetadata(instance, path, sc) | |||
// Save model data: layers, weights |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no layers now.
val model = new MultilayerPerceptronClassificationModel(metadata.uid, layers, weights) | ||
|
||
val columns = sparkSession.read.parquet(dataPath).columns | ||
val model = if (columns.length == 2) { // model prior to 3.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have an example old model and read it in one test case?
val weights = data.getAs[Vector](1) | ||
val model = new MultilayerPerceptronClassificationModel(metadata.uid, layers, weights) | ||
|
||
val columns = sparkSession.read.parquet(dataPath).columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think we can read from the data path in any case. Then we check the length of returned array to decide we should read layers + weights or only weights. We do not need to read the data twice.
* @param weights the weights of layers | ||
*/ | ||
@Since("1.5.0") | ||
class MultilayerPerceptronClassificationModel private[ml] ( | ||
@Since("1.5.0") override val uid: String, | ||
@Since("1.5.0") val layers: Array[Int], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we update migration guild?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Sean, this question is for you.
Test build #115351 has finished for PR 26838 at commit
|
I tagged the JIRA with release_notes and added a note. I also put back the |
I see, so you mean there is no meaningful way to keep the previous setters, because the nature of the param has changed? |
Yes. |
OK, if this is the least change we can make while fixing the inconsistency, I'd be OK with it for 3.0. Even though these weren't technically deprecated, it's a minor API and a minor change, and still legitimate for a major release, versus keeping the inconsistency for years. |
Test build #115871 has finished for PR 26838 at commit
|
retest this please |
Test build #115872 has finished for PR 26838 at commit
|
@huaxingao It seems that the models are saved in different dir? |
@zhengruifeng Yes. The models are now in mllib/src/test/resources/ml-models |
@huaxingao It seems that hashingTF/strIndexer models are stored in |
@zhengruifeng |
Test build #115969 has finished for PR 26838 at commit
|
Test build #115971 has finished for PR 26838 at commit
|
@@ -459,7 +459,7 @@ class StringIndexerSuite extends MLTest with DefaultReadWriteTest { | |||
} | |||
|
|||
test("Load StringIndexderModel prior to Spark 3.0") { | |||
val modelPath = testFile("test-data/strIndexerModel") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strIndexerModel-2.4.4?
@@ -89,7 +89,7 @@ class HashingTFSuite extends MLTest with DefaultReadWriteTest { | |||
} | |||
|
|||
test("SPARK-23469: Load HashingTF prior to Spark 3.0") { | |||
val hashingTFPath = testFile("test-data/hashingTF-pre3.0") | |||
val hashingTFPath = testFile("ml-models/hashingTF-pre3.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hashingTF-2.4.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated. Thanks!
Test build #115997 has finished for PR 26838 at commit
|
Merged to master |
Thank you all! |
What changes were proposed in this pull request?
Make
MultilayerPerceptronClassificationModel
extendMultilayerPerceptronParams
Why are the changes needed?
Make
MultilayerPerceptronClassificationModel
extendMultilayerPerceptronParams
to expose the training params, so user can see these params when callingextractParamMap
Does this PR introduce any user-facing change?
Yes. The
MultilayerPerceptronParams
such asseed
,maxIter
... are available inMultilayerPerceptronClassificationModel
nowHow was this patch tested?
Manually tested
MultilayerPerceptronClassificationModel.extractParamMap()
to verify all the new params are there.