Impossible to create a new column containing a struct with one column containing a datetime literal #19086

Closed
2 tasks done
ismailhammounou opened this issue Oct 3, 2024 · 1 comment · Fixed by #19094
Labels: accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

ismailhammounou commented Oct 3, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

I tested the code on 1.8.0 and above, where it fails. It seems to work on versions below 1.8.0 (tested on 1.7.1).

import polars as pl
from datetime import datetime
import pytz
 
df = pl.DataFrame({"col1": ["a", "b"]})
metadata = pl.struct(
        pl.lit(datetime.now(pytz.UTC)).alias('creation_ts'), pl.lit("a").alias("col_2"),
        schema={"creation_ts": pl.Datetime(time_unit='us', time_zone="UTC"), "col_2": pl.Utf8()}
    )
df = df.with_columns(metadata.alias("__metadata__"))
print(df)

Log output

thread '<unnamed>' panicked at crates/polars-core/src/scalar/mod.rs:46:92:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("unexpected value while building Series of type Datetime(Microseconds, Some(\"UTC\")); found value of type Datetime(Microseconds, Some(\"UTC\")): 2024-10-03 12:18:33.247992 UTC"))
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
/tmp/ipykernel_11476/316418030.py in ?()
      7         pl.lit(datetime.now(pytz.UTC)).alias('creation_ts'), pl.lit("a").alias("col_2"),
      8         schema={"creation_ts": pl.Datetime(time_unit='us', time_zone="UTC"), "col_2": pl.Utf8()}
      9     )
     10 df = df.with_columns(metadata.alias("__metadata__"))
---> 11 print(df)

/opt/conda/lib/python3.11/site-packages/polars/dataframe/frame.py in ?(self)
   1170     def __str__(self) -> str:
-> 1171         return self._df.as_str()

PanicException: called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("unexpected value while building Series of type Datetime(Microseconds, Some(\"UTC\")); found value of type Datetime(Microseconds, Some(\"UTC\")): 2024-10-03 12:18:33.247992 UTC"))
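
Note that the message reports a mismatch between two dtypes that print identically. A minimal sketch for checking this mechanically, assuming the message keeps the shape shown in the log above (`extract_dtypes` and the regular expression are hypothetical helpers, not part of Polars):

```python
import re

# Inner message copied from the panic above (escaped quotes kept as logged).
PANIC_MESSAGE = (
    r'unexpected value while building Series of type '
    r'Datetime(Microseconds, Some(\"UTC\")); found value of type '
    r'Datetime(Microseconds, Some(\"UTC\")): 2024-10-03 12:18:33.247992 UTC'
)

# Hypothetical pattern: captures the dtype being built and the dtype found.
MISMATCH_RE = re.compile(
    r"building Series of type (?P<expected>.+?); "
    r"found value of type (?P<found>.+?): "
)


def extract_dtypes(message: str) -> tuple[str, str]:
    """Return (expected dtype, found dtype) as they appear in the message."""
    match = MISMATCH_RE.search(message)
    if match is None:
        raise ValueError("message does not have the expected shape")
    return match["expected"], match["found"]


expected, found = extract_dtypes(PANIC_MESSAGE)
print(expected == found)  # both sides of the reported mismatch are the same text
```

So the printed dtypes alone do not explain the failure; whatever differs is not visible in the message.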

Issue description

When I use a datetime object as a literal inside a struct, I get an error. When the struct also contains an existing column from the DataFrame, it works, even with the datetime literal.

This code works:

import polars as pl
from datetime import datetime
import pytz
 
df = pl.DataFrame({"col1": ["a", "b"], "col_2": ["a", "a"]})
metadata = pl.struct(
        pl.lit(datetime.now(pytz.UTC)).alias('creation_ts'), pl.col("col_2"),
        schema={"creation_ts": pl.Datetime(time_unit='us', time_zone="UTC"), "col_2": pl.Utf8()}
    )
df = df.with_columns(metadata.alias("__metadata__"))
print(df)

but this code does not work as expected:

import polars as pl
from datetime import datetime
import pytz
 
df = pl.DataFrame({"col1": ["a", "b"]})
metadata = pl.struct(
        pl.lit(datetime.now(pytz.UTC)).alias('creation_ts'), pl.lit("a").alias("col_2"),
        schema={"creation_ts": pl.Datetime(time_unit='us', time_zone="UTC"), "col_2": pl.Utf8()}
    )
df = df.with_columns(metadata.alias("__metadata__"))
print(df)

Expected behavior

shape: (2, 2)
┌──────┬─────────────────────────────────┐
│ col1 ┆ __metadata__                    │
│ ---  ┆ ---                             │
│ str  ┆ struct[2]                       │
╞══════╪═════════════════════════════════╡
│ a    ┆ {2024-10-03 12:23:55.949877 UT… │
│ b    ┆ {2024-10-03 12:23:55.949877 UT… │
└──────┴─────────────────────────────────┘

Installed versions

--------Version info---------
Polars:              1.8.0
Index type:          UInt32
Platform:            Linux-5.15.0-1071-azure-x86_64-with-glibc2.35
Python:              3.11.10 | packaged by conda-forge | (main, Sep 22 2024, 14:10:38) [GCC 13.3.0]

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               5.4.1
cloudpickle          3.0.0
connectorx           <not installed>
deltalake            0.20.1
fastexcel            <not installed>
fsspec               2022.11.0
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                2.0.2
openpyxl             3.1.5
pandas               2.2.3
pyarrow              17.0.0
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           2.0.35
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@ismailhammounou ismailhammounou added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Oct 3, 2024
@ismailhammounou ismailhammounou changed the title Impossible to create a new column containing a struct with one column containing a datetime literam Impossible to create a new column containing a struct with one column containing a datetime literal Oct 3, 2024
Contributor

cmdlineluser commented Oct 3, 2024

Can reproduce on 1.9.0 also.

import pytz
import polars as pl
from datetime import datetime

# A single-row frame works:
pl.DataFrame({"a": [1]}).with_columns(pl.struct(pl.lit(datetime.now(pytz.UTC))))
# shape: (1, 2)
# ┌─────┬─────────────────────────────────┐
# │ a   ┆ literal                         │
# │ --- ┆ ---                             │
# │ i64 ┆ struct[1]                       │
# ╞═════╪═════════════════════════════════╡
# │ 1   ┆ {2024-10-03 12:45:25.810150 UT… │
# └─────┴─────────────────────────────────┘
# A two-row frame panics:
pl.DataFrame({"a": [1, 2]}).with_columns(pl.struct(pl.lit(datetime.now(pytz.UTC))))

# thread '<unnamed>' panicked at crates/polars-core/src/scalar/mod.rs:46:92:
# PanicException: called `Result::unwrap()` on an `Err` value: 
# SchemaMismatch(ErrString("unexpected value while building Series of type 
# Datetime(Microseconds, Some(\"UTC\")); found value of type 
# Datetime(Microseconds, Some(\"UTC\")): 2024-10-03 12:45:27.592791 UTC"))

It works in 1.7.2
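
Taken together, the versions mentioned in this thread narrow down where the regression appeared. A minimal sketch, using only the observations reported here (`regression_window` is a hypothetical helper and says nothing about versions nobody tested):

```python
# Observations from this thread: True means the example worked.
OBSERVATIONS = {
    "1.7.1": True,
    "1.7.2": True,
    "1.8.0": False,
    "1.9.0": False,
}


def version_key(version: str) -> tuple[int, ...]:
    """Turn '1.7.2' into (1, 7, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(observations: dict[str, bool]) -> tuple[str, str]:
    """Return (newest version reported working, oldest version reported failing)."""
    working = [v for v, ok in observations.items() if ok]
    failing = [v for v, ok in observations.items() if not ok]
    return max(working, key=version_key), min(failing, key=version_key)


window = regression_window(OBSERVATIONS)
print(window)
```

With these four data points the window is 1.7.2 to 1.8.0, which is consistent with the original report.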

@coastalwhite coastalwhite self-assigned this Oct 4, 2024
@c-peters c-peters added the accepted Ready for implementation label Oct 6, 2024
@c-peters c-peters moved this to Done in Backlog Oct 6, 2024
@c-peters c-peters added this to Backlog Oct 6, 2024

4 participants