Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31853][DOCS] Mention removal of params mixins setter in migration guide #28663

Conversation

EnricoMi
Copy link
Contributor

What changes were proposed in this pull request?

The Pyspark Migration Guide needs to mention a breaking change of the Pyspark ML API.

Why are the changes needed?

In SPARK-29093, all setters have been removed from Params mixins in pyspark.ml.param.shared. Those setters had been part of the public pyspark ML API, hence this is a breaking change.

Does this PR introduce any user-facing change?

Only documentation.

How was this patch tested?

Visually.

@HyukjinKwon
Copy link
Member

@zhengruifeng FYI

@srowen
Copy link
Member

srowen commented May 28, 2020

Dumb question, but does that affect callers?

@EnricoMi
Copy link
Contributor Author

Code that used to call into the setters, e.g. HasOutputCols.setOutputCols(value) will not compile anymore.

@zhengruifeng
Copy link
Contributor

@EnricoMi The old mixins provided some wrong setters in the params, (e.g. OneVsRestModel.setClassifier), which were removed in the 3.0. The new params are keeped in line with the scala side.

For the callers like HasOutputCols.setOutputCols(value), I am not sure whether it is a common usage.

@huaxingao @zero323 How do think about this?

@huaxingao
Copy link
Contributor

I don't think the changes in SPARK-29093 affect users. Using outputCols as an example, the common usage is like this:

    ohe = OneHotEncoder()
    ohe.setInputCols(["input"])
    ohe.setOutputCols(["output"])
    model = ohe.fit(df)

It doesn't matter to the users if setOutputCols is in OneHotEncoder or in HasOutputCols. Seems to me that users will not call HasOutputCols.setOutputCols directly.

@EnricoMi
Copy link
Contributor Author

EnricoMi commented Jun 1, 2020

The HasOutputCols mixin is a public mixin that users can use with other implementations of Params, not only OneHotEncoder. User code breaks and users have to dive into Spark code to find the correct replacement. That single line in the migration guide helps users to save an hour or two when moving their code to pyspark 3.

In my opinion, it does not matter if this is common usage or not, it is a possible usage that breaks with this change. First place to look when your code breaks is the migration guide.

@srowen
Copy link
Member

srowen commented Jun 1, 2020

Yeah the only downside is that if we fill the migration guide with a bunch of niche stuff, it might be hard to figure out what the important common items are. I'm not sure how much user code uses the mixins directly, but if there is any code that does it's probably something like HasOutputCols, a common one. So yeah I think it's OK to mention this change.

@srowen
Copy link
Member

srowen commented Jun 3, 2020

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Jun 3, 2020

Test build #123492 has finished for PR 28663 at commit ae3bf76.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in 4bbe3c2 Jun 3, 2020
srowen pushed a commit that referenced this pull request Jun 3, 2020
…ion guide

### What changes were proposed in this pull request?
The Pyspark Migration Guide needs to mention a breaking change of the Pyspark ML API.

### Why are the changes needed?
In SPARK-29093, all setters have been removed from `Params` mixins in `pyspark.ml.param.shared`. Those setters had been part of the public pyspark ML API, hence this is a breaking change.

### Does this PR introduce _any_ user-facing change?
Only documentation.

### How was this patch tested?
Visually.

Closes #28663 from EnricoMi/branch-pyspark-migration-guide-setters.

Authored-by: Enrico Minack <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 4bbe3c2)
Signed-off-by: Sean Owen <[email protected]>
@srowen
Copy link
Member

srowen commented Jun 3, 2020

Merged to master/3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants