fix(python): Verify the integrity of pandas column names before implied string conversion #17433
In issue #16025, the author converts a pandas DataFrame whose columns are 0 and '0'. This is not valid in a polars DataFrame, because all column names are converted to strings. The conversion is done in pl.from_pandas, but it happens in a way that overwrites a prior column. For example, if 0 and '0' are both columns, the 0 column is converted to '0' first, and the original '0' column then overwrites it.
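The collision can be illustrated without pandas or polars. The following is only a minimal sketch of the idea, not the code in this PR: the helper name `find_stringified_duplicates` and the placeholder column data are hypothetical. It shows how keying by `str(label)` silently loses a column, and how checking the stringified names for uniqueness catches the problem first.

```python
from collections import Counter


def find_stringified_duplicates(columns):
    """Return the stringified labels that occur more than once.

    Hypothetical helper for illustration; not part of polars.
    """
    counts = Counter(str(c) for c in columns)
    return [name for name, n in counts.items() if n > 1]


# Column labels as in the issue: the integer 0 and the string '0'.
columns = [0, "0", "a"]

# Keying by str(label) maps both 0 and '0' to '0', so the later
# entry replaces the earlier one and a column is lost.
data = {str(c): f"values of {c!r}" for c in columns}
print(len(data))  # 2, although there were 3 columns

# Checking uniqueness of the stringified names detects the collision.
dups = find_stringified_duplicates(columns)
if dups:
    print(f"column names collide after str(): {dups}")
```

Because the check works on `str(label)`, it would also flag arbitrary objects whose `__str__` methods happen to return the same text.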
I check the uniqueness of the stringified names in `from_pandas`. This should also catch some unusual column names, like arbitrary objects with `__str__` methods.

I did change the behavior of `test_from_pandas_duplicated_columns` in `test_interop`. The test now raises the error message I wrote, since the duplicate columns are caught earlier. I understand this may be undesirable, since it is more consistent to propagate the pandas error up the stack to the user, which was the original behavior.

Also, I opened a pull request for this issue yesterday but deleted it because I was on the wrong branch. Excuse my git skills, I'm a beginner with it. Let me know if you need any changes to this or have any questions! Thank you.