Change .str.to_datetime to default to microsecond precision for format specifiers "%f" and "%.f" #13592

Closed
2 tasks done
gilnribeiro opened this issue Jan 10, 2024 · 1 comment · Fixed by #13597
Labels: A-temporal (Area: date/time functionality), bug (Something isn't working), P-medium (Priority: medium), python (Related to Python Polars)
Milestone: 1.0.0

Comments

@gilnribeiro

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

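import polars as pl

# Two timestamps: one recent, one from the year 920.
df = pl.DataFrame({"dates": ["2022-08-31 00:00:00.0", "0920-09-18 00:00:00.0"]})

# No time_unit is given, so Polars picks one from the format string.
df.with_columns(pl.col("dates").str.to_datetime(format="%Y-%m-%d %H:%M:%S%.f"))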

Log output

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
c:\Users\gilnr\Downloads\win_error.ipynb Cell 3 line 2
      1 print(df)
----> 2 df.with_columns(pl.col("dates").str.to_datetime(format="%Y-%m-%d %H:%M:%S%.f"))

File c:\Users\gilnr\Anaconda3\envs\r_env\lib\site-packages\polars\dataframe\frame.py:8235, in DataFrame.with_columns(self, *exprs, **named_exprs)
   8088 def with_columns(
   8089     self,
   8090     *exprs: IntoExpr | Iterable[IntoExpr],
   8091     **named_exprs: IntoExpr,
   8092 ) -> DataFrame:
   8093     """
   8094     Add columns to this DataFrame.
   8095 
   (...)
   8233 
   8234     """
-> 8235     return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)

File c:\Users\gilnr\Anaconda3\envs\r_env\lib\site-packages\polars\lazyframe\frame.py:1749, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, background, _eager)
   1746 if background:
   1747     return InProcessQuery(ldf.collect_concurrently())
-> 1749 return wrap_df(ldf.collect())

PanicException: called `Option::unwrap()` on a `None` value

Issue description

This also seems to happen for all dates above "2262-09-18 00:00:00.0" and below "1677-09-18 00:00:00.0".

Note: I only narrowed down the upper and lower boundaries by year, stopping once the exception was thrown. The exact month and day of each boundary may differ.
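
The boundaries reported above sit close to the limits of a signed 64-bit count of nanoseconds since 1970-01-01. The following is a minimal sketch of that window, assuming (this thread does not state it outright, although the final issue title and the workaround below suggest it) that the "%.f" format made Polars default to nanosecond precision at the time. It uses only the Python standard library; `fits_in_ns` is a hypothetical helper, not a Polars function.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
I64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold

# Python datetimes resolve microseconds only, so truncate the nanosecond
# limit to whole microseconds (this moves each bound inward by under 1 us).
LIMIT_US = I64_MAX // 1000
NS_MIN = EPOCH - timedelta(microseconds=LIMIT_US)
NS_MAX = EPOCH + timedelta(microseconds=LIMIT_US)


def fits_in_ns(value: str) -> bool:
    """Hypothetical helper: True if `value` lies inside the nanosecond window.

    Python's strptime spells the fractional part "%S.%f", whereas the
    Polars format string in this issue uses "%S%.f".
    """
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    return NS_MIN <= parsed <= NS_MAX


if __name__ == "__main__":
    print("nanosecond window:", NS_MIN, "to", NS_MAX)
    for s in ("2022-08-31 00:00:00.0", "0920-09-18 00:00:00.0"):
        print(s, "fits" if fits_in_ns(s) else "does not fit")
```

Under that assumption the window runs from 1677-09-21 to 2262-04-11, which would be consistent with the year boundaries found by trial and error above.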

Expected behavior

To my mind, the expected behavior would be a conversion to Datetime, especially for the boundary cases "1677-09-18 00:00:00.0" and "2262-09-18 00:00:00.0".

For "0920-09-18 00:00:00.0", the original date that threw this exception: if it really should fail, the error should at least be more descriptive than the existing one, for example the same pl.ComputeError that is raised when the wrong format is applied.

Installed versions

--------Version info---------
Polars:               0.20.3
Index type:           UInt32
Platform:             Windows-10-10.0.22621-SP0
Python:               3.9.2 (default, Mar  3 2021, 15:03:14) [MSC v.1916 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
numpy:                1.20.2
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@gilnribeiro added the bug and python labels Jan 10, 2024
@MarcoGorelli
Collaborator

thanks for the report - for now I'd suggest explicitly specifying 'us' or 'ms' as your time unit:

df.with_columns(pl.col("dates").str.to_datetime(format="%Y-%m-%d %H:%M:%S%.f", time_unit='us'))
shape: (2, 1)
┌─────────────────────┐
│ dates               │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2022-08-31 00:00:00 │
│ 0920-09-18 00:00:00 │
└─────────────────────┘
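
As a rough back-of-the-envelope sketch of why a coarser time unit helps: assuming each unit is stored as a signed 64-bit tick count around 1970 (an assumption about the storage layout, not something stated in this thread), the reachable span grows by a factor of 1000 per step from 'ns' to 'us' to 'ms'. The names `TICKS_PER_SECOND` and `span_years` are illustrative only.

```python
I64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 86_400  # mean Gregorian year

# Ticks per second for each time unit mentioned in the workaround.
TICKS_PER_SECOND = {"ns": 10**9, "us": 10**6, "ms": 10**3}


def span_years(unit: str) -> float:
    """Approximate number of years reachable on either side of 1970."""
    return I64_MAX / TICKS_PER_SECOND[unit] / SECONDS_PER_YEAR


if __name__ == "__main__":
    for unit in TICKS_PER_SECOND:
        print(f"{unit}: about +/- {span_years(unit):,.0f} years")
```

The year 920 lies about 1050 years before 1970, which is outside the span computed for 'ns' but well inside the spans for 'us' and 'ms'.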

@MarcoGorelli added the A-temporal label Jan 10, 2024
@stinodego added the accepted (Ready for implementation) label Jan 10, 2024
@github-project-automation bot moved this to Ready in Backlog Jan 10, 2024
@stinodego added the P-medium label Jan 10, 2024
@stinodego removed the accepted (Ready for implementation) label Jan 12, 2024
@stinodego added this to the 1.0.0 milestone May 23, 2024
@stinodego changed the title from 'PanicException in Datetime conversion' to 'Change .str.to_datetime to default to microsecond precision for format specifiers "%f" and "%.f"' May 23, 2024
@stinodego moved this from Ready to Blocked in Backlog May 26, 2024
@github-project-automation bot moved this from Blocked to Done in Backlog Jun 4, 2024