Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): Verify the integrity of pandas column names before implied string conversion #17433

Merged

Conversation

tylerriccio33
Copy link
Contributor

In the issue #16025 , the author converts a pandas dataframe who's columns are 0 and '0'. In polars dataframes, this is illegal, as all the column names are converted to strings. This is done in the pl.from_pandas, but the conversion happens in a way that overwrites a prior column. For example, if 0 and '0' are columns, the '0' will overwrite the first column, since that column was converted first, to a '0'.

I check the uniqueness of the stringified names in from_pandas. This should also catch some unusual column names, like arbitrary objects with str methods.

I did change the behavior of test_from_pandas_duplicated_columns in test_interop. The test now raises the error message I wrote since the duplicate columns were caught earlier. I understand this may be undesirable since it's more consistent to propagate the pandas error up the stack to the user, which was the original behavior.

Also, I opened a pull request on this issue yesterday but deleted it since I was on the wrong branch. Excuse my git skills, I'm a beginner with it. Let me know if you need any changes to this or have any questions! Thank you.

@stinodego stinodego changed the title fix(python): verify the integrity of pandas column names before implied string conversion. fix(python): Verify the integrity of pandas column names before implied string conversion Jul 5, 2024
@github-actions github-actions bot added fix Bug fix python Related to Python Polars and removed title needs formatting labels Jul 5, 2024
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase your branch on the main branch, then we can review this.

@tylerriccio33 tylerriccio33 force-pushed the tr-pandas-convert-colname-bug branch from 791c135 to 186de2b Compare July 5, 2024 18:42
Copy link

codecov bot commented Jul 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.45%. Comparing base (200c6a4) to head (2f15ea2).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17433      +/-   ##
==========================================
- Coverage   80.46%   80.45%   -0.01%     
==========================================
  Files        1483     1483              
  Lines      194832   194836       +4     
  Branches     2770     2771       +1     
==========================================
- Hits       156767   156752      -15     
- Misses      37556    37574      +18     
- Partials      509      510       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tylerriccio33
Copy link
Contributor Author

Please rebase your branch on the main branch, then we can review this.

I rebased and re-pushed the pr, which triggered the ci checks. Let me know if you need anything else!

@tylerriccio33 tylerriccio33 force-pushed the tr-pandas-convert-colname-bug branch from 186de2b to 2f15ea2 Compare July 6, 2024 14:56
@ritchie46
Copy link
Member

Thanks @tylerriccio33

@ritchie46 ritchie46 merged commit c256a02 into pola-rs:main Jul 6, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants