Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

depr(python,rust!): Rename DataFrame.melt to unpivot and make parameters consistent with pivot #17095

Merged
merged 16 commits into from
Jun 21, 2024

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Jun 20, 2024

Closes #11974

The diff isn't super-clear in one bit, but py-polars/tests/unit/operations/test_melt.py is just renamed to py-polars/tests/unit/operations/test_unpivot.py but with an extra couple of with pytest.deprecated_call

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Jun 20, 2024
@MarcoGorelli MarcoGorelli force-pushed the unpivot branch 2 times, most recently from 039b57a to a517e9a Compare June 20, 2024 15:40
Copy link

codecov bot commented Jun 20, 2024

Codecov Report

Attention: Patch coverage is 88.17204% with 11 lines in your changes missing coverage. Please review.

Project coverage is 80.87%. Comparing base (8a6bf4b) to head (1ad45cf).

Current head 1ad45cf differs from pull request most recent head a633eca

Please upload reports for the commit a633eca to get more accurate results.

Files Patch % Lines
crates/polars-plan/src/plans/functions/mod.rs 44.44% 5 Missing ⚠️
py-polars/src/lazyframe/visitor/nodes.rs 0.00% 4 Missing ⚠️
crates/polars-plan/src/plans/functions/dsl.rs 50.00% 1 Missing ⚠️
...plan/src/plans/optimizer/predicate_pushdown/mod.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17095      +/-   ##
==========================================
+ Coverage   80.86%   80.87%   +0.01%     
==========================================
  Files        1456     1456              
  Lines      191141   191137       -4     
  Branches     2728     2728              
==========================================
+ Hits       154562   154585      +23     
+ Misses      36073    36046      -27     
  Partials      506      506              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whew, quite a lot of churn to get through it looks like!

I know this is still in draft, but I wanted to mention: we should keep melt but deprecate it, so we can remove it with 2.0.0.

@MarcoGorelli
Copy link
Collaborator Author

ah sorry, when you said to get rid of melt I may have taken you too literally 😄 ok, will update tomorrow to just make it a deprecated alias

@Julian-J-S
Copy link
Contributor

Love it! 😄

Question:
what about the columns parameter of pivot?
Is this also going to be renamed to on like I suggested in my issue?

Asking because the idea was to align the parameters of pivot/unpivot.

@ritchie46
Copy link
Member

Alright! Ready to undraft?

@MarcoGorelli
Copy link
Collaborator Author

almost, just finishing up pivot

@MarcoGorelli MarcoGorelli marked this pull request as ready for review June 21, 2024 08:18
@MarcoGorelli MarcoGorelli changed the title feat: rename DataFrame.melt to DataFrame.unpivot, and rename its args: id_vars -> index, value_vars -> on feat: rename DataFrame.melt to DataFrame.unpivot, and rename its args: id_vars -> index, value_vars -> on. Rename column in DataFrame.pivot to on Jun 21, 2024
@MarcoGorelli
Copy link
Collaborator Author

Sure, done

I do like how it becomes symmetrical, which probably makes things easier for new users, so thanks @JulianCologne for the suggestion. There is some code churn for existing users, but it's a fairly easy find-and-replace

@stinodego
Copy link
Member

Thanks Marco, looks good!

Just one question about the design: What do you think about making on the first parameter, like suggested in the linked issue? No strong opinion myself, just checking.

@stinodego stinodego changed the title feat: rename DataFrame.melt to DataFrame.unpivot, and rename its args: id_vars -> index, value_vars -> on. Rename column in DataFrame.pivot to on feat: Rename DataFrame.melt to unpivot and make parameters consistent with pivot Jun 21, 2024
@stinodego stinodego changed the title feat: Rename DataFrame.melt to unpivot and make parameters consistent with pivot depr(python,rust!): Rename DataFrame.melt to unpivot and make parameters consistent with pivot Jun 21, 2024
@github-actions github-actions bot added breaking rust Change that breaks backwards compatibility for the Rust crate deprecation Add a deprecation warning to outdated functionality labels Jun 21, 2024
@MarcoGorelli
Copy link
Collaborator Author

For pivot I don't think that can be done in a non-breaking way, at least not without first making the arguments keyword-only. And so if the order in pivot stays <index, on>, then it should probably stay that way too in unpivot?

@stinodego
Copy link
Member

stinodego commented Jun 21, 2024

For pivot I don't think that can be done in a non-breaking way, at least not without first making the arguments keyword-only. And so if the order in pivot stays <index, on>, then it should probably stay that way too in unpivot?

For pivot, non-keyword args were already deprecated, related to this issue: #12087

I had just removed this deprecation for 1.0, so we can change their order without breaking anything right now. So we are really free in determining the most intuitive API here. Would it be better with on first?

@MarcoGorelli
Copy link
Collaborator Author

ah, cool! in that case, agree, on would look better first

@MarcoGorelli MarcoGorelli marked this pull request as draft June 21, 2024 11:56
@MarcoGorelli
Copy link
Collaborator Author

I had just removed this deprecation for 1.0, so we can change their order without breaking anything right now

not sure I understand sorry - am I OK to make the arguments keyword-only as part of this PR? It still needs enforcing, right?

@MarcoGorelli MarcoGorelli marked this pull request as ready for review June 21, 2024 12:21
@stinodego
Copy link
Member

stinodego commented Jun 21, 2024

not sure I understand sorry - am I OK to make the arguments keyword-only as part of this PR? It still needs enforcing, right?

You don't need to make them keyword-only, you can just move the parameters as you see fit.

If you look at the latest release (0.20.31), you can see that non-keyword arguments were already deprecated:
https://github.com/pola-rs/polars/blob/py-0.20.31/py-polars/polars/dataframe/frame.py#L7561-L7780

So at this point, everyone has their args as keyword arguments (theoretically speaking). So we are free to move around the parameters as we see fit for the 1.0 release.

@MarcoGorelli
Copy link
Collaborator Author

I feel a bit uneasy about this one, I think it's going to silently break people's code. Though I think the current change in argument order is already going to break it anyway, so let's think if there's a better way

I think the following would feel natural, and would loudly break for anyone who didn't pay attention to their warnings (based on some clients I've worked with, I'd estimate that that's the majority 😩 ):

def pivot(self, on, *, values=None, index=None): ...

Then the call becomes:

df.pivot('subject', index='name', values=['test_1', 'test_2'])

if you don't specify either of index or values, then all the remaining columns are used for that argument (I think Marshall was suggesting this some time ago). But you need to specify at least one of index or values

If we could do that, I'd feel a lot easier about changing the order. I also think it would lead to more readable code if index and values are keyword-only, else df.pivot('a', 'b', 'c') could do anything as far as the uninitiated reader is concerned

@stinodego
Copy link
Member

stinodego commented Jun 21, 2024

Sure that sounds good. Let's do on, *, index, values though, because values is optional. And unpivot also starts with on, *, index then.

Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks Marco!

@stinodego stinodego merged commit 78a31b4 into pola-rs:main Jun 21, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking rust Change that breaks backwards compatibility for the Rust crate deprecation Add a deprecation warning to outdated functionality enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename melt to unpivot and make parameters consistent with pivot
4 participants