Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyO3 incorrectly converts local to UTC times on FixedOffset conversion #3267

Closed
grantslatton opened this issue Jun 24, 2023 · 9 comments · Fixed by #3269
Closed

PyO3 incorrectly converts local to UTC times on FixedOffset conversion #3267

grantslatton opened this issue Jun 24, 2023 · 9 comments · Fixed by #3269
Labels

Comments

@grantslatton
Copy link
Contributor

Bug Description

When converting to/from datetime.datetime/DateTime<FixedOffset>, PyO3 is using DateTime::naive_utc and DateTime::from_utc -- I believe these should be naive_local and from_local, respectively.

Steps to Reproduce

    #[test]
    fn pyo3_tz_bug() {
        Python::with_gil(|py| {
            let code = r#"
import datetime
t = datetime.datetime.fromtimestamp(86400).replace(tzinfo=datetime.timezone(datetime.timedelta(hours=-8)))
            "#.trim();

            let globals = PyDict::new(py);
            py.run(code, Some(globals), None).unwrap();

            // Get ISO 8601 string from python
            let t = globals.get_item("t").unwrap();
            let py_iso_str = t.call_method0("isoformat").unwrap();

            // Get ISO 8601 string from rust
            let t = t.extract::<DateTime<FixedOffset>>().unwrap();
            let rust_iso_str = t.to_rfc3339();

            // They should be equal
            assert_eq!(py_iso_str.to_string(), rust_iso_str);
        })
    }

This test fails with

assertion failed: `(left == right)`
  left: `"1970-01-02T00:00:00-08:00"`,
 right: `"1970-01-01T16:00:00-08:00"`

I think the bug probably went unnoticed because it exists on both the IntoPy and FromPy paths so cancels itself out when you go to and from, but you can induce it by creating dates in native python as shown above.

Backtrace

No response

Your operating system and version

OS X Ventura 13.4

Your Python version (python --version)

3.11

Your Rust version (rustc --version)

1.70

Your PyO3 version

0.19

How did you install python? Did you use a virtualenv?

brew, no

Additional Info

No response

@adamreichold
Copy link
Member

@Psykopear @pickfire Do you have the bandwidth to look into this?

@grantslatton
Copy link
Contributor Author

grantslatton commented Jun 24, 2023

FWIW I think the fix here is to change these two naive_utc to naive_local, and this from_utc to from_local, some tests may need to be updated too.

@adamreichold
Copy link
Member

@grantslatton Care to create a pull request?

grantslatton added a commit to grantslatton/pyo3 that referenced this issue Jun 24, 2023
grantslatton added a commit to grantslatton/pyo3 that referenced this issue Jun 25, 2023
grantslatton added a commit to grantslatton/pyo3 that referenced this issue Jun 25, 2023
@Psykopear
Copy link
Contributor

Hi @grantslatton @adamreichold . So, it's been a long time since that was merged, I need some time to go through the PRs and discussion to remember what was the reasoning behind that, but I think the UTC conversion was not accidental, see this comment (and in general the discussion in the PR. It's a long one, but I think we talked about almost every detail of the code in there):

#2612 (comment)

There might still be a bug in the conversion, but reading that, I'm not sure changing from *_utc to *_local is right.
I might have been wrong then though, so I'd like to take some time to review why I did think that at the time.

Anyway if you are in a rush to merge #3269 , I trust pyo3 maintainers and if they are ok with the fix, I'm ok with it too, and if I find problems with it in the future I'll open a new issue to discuss this.

@adamreichold
Copy link
Member

Anyway if you are in a rush to merge #3269 , I trust pyo3 maintainers and if they are ok with the fix, I'm ok with it too, and if I find problems with it in the future I'll open a new issue to discuss this.

I don't think there is any particular pressure here and I would prefer if we take the time to resolve this properly. More eyes make bugs shallower.

@Psykopear
Copy link
Contributor

Psykopear commented Jun 26, 2023

Ok, I just noticed the other issue (#3266 ), and there is some more to extract from the previous discussion to properly answer that too. I'll try to find some time for this between this week and the next, but if I don't, please ping me again. It's important to get this right, and I'd like to share the output of the weeks we (me, the other authors, the reviewers) spent on the previous PRs because a lot of valuable insights was gathered at the time. Luckily everything's written down, but I understand that trying to find the important info in that discussion might be complicated if you weren't participating. I'd love to see proper timezone info support, but as usual when you deal with timezones (and python's timezones handling), getting it right is always harder than it looks.

@grantslatton
Copy link
Contributor Author

Yes, no pressure, as we have already implemented all of these fixes in our code manually, but it would be nice to move that code from our codebase to a fully supported replacement in pyo3 itself :)

I think converting to UTC is not necessarily wrong, but if you do, you need to change the fixed offset timezone to always be UTC (i.e. offset 0) too. 0400+0200 is unambiguously equal to 0200+0000, so converting to UTC is lossless if you convert both the left and right side (time and offset). The code before was just mutating the time but not the offset. The test I provided proves this is a buggy (but was a hidden bug because going IntoPy and then FromPyObject reverses this process, hiding the bug).

That said, there are maybe some philosophical reasons to prefer not always converting to UTC. If a user does 0400+2000 IntoPy, and then does FromPy, they might be surprised to find their data converted to UTC. Provenance information might be important.

@Psykopear
Copy link
Contributor

Yes, no pressure, as we have already implemented all of these fixes in our code manually, but it would be nice to move that code from our codebase to a fully supported replacement in pyo3 itself :)

Yes that's one of the things I really like about the way PyO3 is organized. I reimplemented the chrono integration in my own codebase after I found a bug in the original crate we were using, before deciding to try to upstream it (and with all the eyes on the review, including the author of the original crate, it did come up way better than my first implementation 😄 ). You need to create several newtypes and reimplement PyO3 traits on them, but you usually end up doing it for other reasons anyway.

I think converting to UTC is not necessarily wrong, but if you do, you need to change the fixed offset timezone to always be UTC (i.e. offset 0) too. 0400+0200 is unambiguously equal to 0200+0000, so converting to UTC is lossless if you convert both the left and right side (time and offset). The code before was just mutating the time but not the offset. The test I provided proves this is a buggy (but was a hidden bug because going IntoPy and then FromPyObject reverses this process, hiding the bug).

Uh, I see now, and yep this looks like a bug. I'll look into the test and comment on the PR when I get to a checkpoint for what I'm working at.

That said, there are maybe some philosophical reasons to prefer not always converting to UTC. If a user does 0400+2000 IntoPy, and then does FromPy, they might be surprised to find their data converted to UTC. Provenance information might be important.

Yep, I'd obviously rather have the exact same representation when passing a datetime object from Python->Rust->Python. I think there was a reason behind this, and it was not accidental. But as I said it's been some time, so I might be confusing this with something we did elsewhere in the integration, that's why I want to double check the discussion before saying more.

@Psykopear
Copy link
Contributor

Ok, sorry for delaying this, I misinterpreted the problem at first, but I reviewed the PR, and it's good for me, the issue is there and the PR fixes it. I added a comment there, the only concern I have is about the chrono_tz integration (I think we'll need to have a separate feature for chrono_tz, and keep chrono as it is), but that can be discussed in the other issue. Thanks @grantslatton for fixing this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants