pl.from_pandas with columns that mix strings and NAs results in TypeError #17355

Closed
2 tasks done
luukhopman opened this issue Jul 2, 2024 · 3 comments · Fixed by #17397
Labels
  • A-interop-pandas (Area: interoperability with pandas)
  • accepted (Ready for implementation)
  • bug (Something isn't working)
  • P-high (Priority: high)
  • python (Related to Python Polars)
  • regression (Issue introduced by a new release)
Milestone: 1.0.1

Comments

luukhopman commented Jul 2, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

> pl.from_pandas(pd.DataFrame({"col": ["a", pd.NA]}))

Log output

No response

Issue description

This issue is new in version 1.0.0: the call above raises a TypeError.

Expected behavior

In polars==0.20.31 this works fine:

> pl.from_pandas(pd.DataFrame({"col": ["a", pd.NA]}))
shape: (2, 1)
┌──────┐
│ col  │
│ ---  │
│ str  │
╞══════╡
│ a    │
│ null │
└──────┘

I went through the breaking changes listed for polars==1.0.0, and this error does not appear to be an intended change, although it might be related to #16939.

Installed versions

--------Version info---------
Polars:               1.0.0
Index type:           UInt32
Platform:             Windows-10-10.0.22621-SP0
Python:               3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:18:13) [MSC v.1916 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.6.0
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           3.8.3
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             3.1.2
pandas:               2.2.2
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           1.4.47
torch:                <not installed>
xlsx2csv:             0.8.2
xlsxwriter:           <not installed>
luukhopman added the labels bug, needs triage, and python on Jul 2, 2024
stinodego added the labels A-interop-pandas, regression, and P-high, and removed needs triage, on Jul 2, 2024
github-project-automation bot moved this to Ready in Backlog on Jul 2, 2024
stinodego moved this from Ready to Next in Backlog on Jul 2, 2024
stinodego added this to the 1.0.1 milestone on Jul 2, 2024
luukhopman changed the title from "pl.from_pandas with columns that mix strings and NAs results in TypeError`" to "pl.from_pandas with columns that mix strings and NAs results in TypeError" on Jul 2, 2024
stinodego (contributor) commented Jul 2, 2024

Thanks for the report. I can reproduce this and confirmed that it works on 0.20.31. Prioritizing this as 'high' since it's a 1.0 regression.

stinodego (contributor) commented:
Update:

As I suspected, the problem started with this PR: #16939

pd.NA is not a valid string value; passing None instead works. However, when parsing pandas inputs, we should accept pd.NA as null when nan_as_null is enabled (which it is by default).
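Since None is accepted where pd.NA is not, one possible stopgap on an affected version is to normalize missing values on the pandas side before calling pl.from_pandas. The sketch below assumes the mixed values live in object-dtype columns; na_to_none is a hypothetical helper written for illustration, not part of pandas or polars.

```python
import pandas as pd


def na_to_none(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame in which missing values in object-dtype
    columns (pd.NA, None, NaN) are replaced by plain Python None.

    Hypothetical workaround helper; not part of pandas or polars.
    """
    out = frame.copy()
    for name in out.columns:
        col = out[name]
        if col.dtype == object:
            # pd.isna() flags pd.NA, None and float NaN alike.
            values = [None if missing else value
                      for value, missing in zip(col, col.isna())]
            # dtype=object keeps None as None instead of coercing it to NaN.
            out[name] = pd.Series(values, index=col.index, dtype=object)
    return out
```

On an affected version, pl.from_pandas(na_to_none(pdf)) should then receive None instead of pd.NA. The helper also turns NaN in object columns into None, which is consistent with how nan_as_null=True already treats NaN as null.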

luukhopman (author) commented:
Thanks for the fix 🐻‍❄️
