Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-29093][PYTHON][ML] Remove automatically generated param setters in _shared_params_code_gen.py #26232

Closed
wants to merge 4 commits into from

Conversation

huaxingao
Copy link
Contributor

What changes were proposed in this pull request?

Remove automatically generated param setters in _shared_params_code_gen.py

Why are the changes needed?

To keep parity between scala and python

Does this PR introduce any user-facing change?

Yes
Add some setters in Python ML XXXModels

How was this patch tested?

unit tests

@SparkQA
Copy link

SparkQA commented Oct 23, 2019

Test build #112559 has finished for PR 26232 at commit a67163e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 24, 2019

Test build #112571 has finished for PR 26232 at commit d076f9e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DCT(JavaUnaryTransformer, JavaMLReadable, JavaMLWritable):
  • class ElementwiseProduct(JavaUnaryTransformer, JavaMLReadable, JavaMLWritable):
  • class NGram(JavaUnaryTransformer, JavaMLReadable, JavaMLWritable):
  • class Normalizer(JavaUnaryTransformer, JavaMLReadable, JavaMLWritable):
  • class PolynomialExpansion(JavaUnaryTransformer, JavaMLReadable, JavaMLWritable):
  • class RegexTokenizer(JavaUnaryTransformer, JavaMLReadable, JavaMLWritable):
  • class Tokenizer(JavaUnaryTransformer, JavaMLReadable, JavaMLWritable):
  • class JavaUnaryTransformer(JavaParams, UnaryTransformer):

@SparkQA
Copy link

SparkQA commented Oct 24, 2019

Test build #112583 has finished for PR 26232 at commit bff5a70.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DCT(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable, JavaMLWritable):
  • class ElementwiseProduct(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable,
  • class NGram(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable, JavaMLWritable):
  • class Normalizer(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable, JavaMLWritable):
  • class PolynomialExpansion(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable,
  • class RegexTokenizer(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable, JavaMLWritable):
  • class Tokenizer(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable, JavaMLWritable):

@zhengruifeng
Copy link
Contributor

1, why you remove the __metaclass__ = ABCMeta in wrapper.py?
2, can we add the missing since annotations for new added setters?
I can not 'add single comment' or 'start a review', I guess that is beacuse this PR is still to large although we had finished some preposed PRs.

@huaxingao
Copy link
Contributor Author

@zhengruifeng
Sorry, I accidentally removed __metaclass__ = ABCMeta . Will add back.

For the newly added setters in Transformers, most of them are new in 3.0.0. I will double check to make sure all the newly added one have @since("3.0.0"). For the newly added setters in Estimators, all of them are existing ones. I initially tried to trace back the first version these setters appeared, and added a bunch of since annotations. Then I checked the documentation, and found out only the parameters that are specific to the Estimators/Transformers have since annotations in the setters, the parameters in HasXXX don't have since annotations in the setters. Here is one example for PCA:

image

I guess I will only add @since("3.0.0") for the setters that are really new in 3.0.0. For the newly added setters in Estimators, since they are actually preexisting ones, I will not add since annotation. I guess I will remove the ones I have added. Does this sound good to you?

@SparkQA
Copy link

SparkQA commented Oct 24, 2019

Test build #112610 has finished for PR 26232 at commit 03c2e4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Copy link
Contributor

merged to master, thanks @huaxingao !
After this PR, the py hierarchy should be highly similar to the scala side.

@huaxingao
Copy link
Contributor Author

Thanks a lot! @zhengruifeng

"""
Sets the value of :py:attr:`topicDistributionCol`.

>>> algo = LDA().setTopicDistributionCol("topicDistributionCol")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anything, shouldn't we have model specific dosctring here? This one just duplicates what we already have in estimator:

>>> algo = LDA().setTopicDistributionCol("topicDistributionCol")
>>> algo.getTopicDistributionCol()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in the middle of rechecking this PR and other ml parity PRs. I will have a follow up PR soon to fix this along with a few other problems I found.

Copy link
Member

@zero323 zero323 Jan 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @huaxingao. If you don't mind me asking ‒ should I expect any serious shifts?

(I am in the middle of handling annotations for this, and it might be useful to know what to expect)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Just fix the small things I overlooked before, for example, I have setters in TrainValidationSplitModel/CrossValidatorModel because these setters are used in _from_java. I will remove the setters and use _set instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @huaxingao.

return self._set(outputCol=value)

@since("3.0.0")
def setMinTF(self, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is already present in 2.4 (L916-L921)

return self._set(minTF=value)

@since("3.0.0")
def setBinary(self, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, L923-L928

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix were in a few hours ago. Please let me know if you see other problems. Thanks! @zero323

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants