Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong result when a shift involves a DST change. #72

Closed
ewjoachim opened this issue Oct 31, 2013 · 36 comments · Fixed by #845
Closed

Wrong result when a shift involves a DST change. #72

ewjoachim opened this issue Oct 31, 2013 · 36 comments · Fixed by #845
Assignees
Labels
Milestone

Comments

@ewjoachim
Copy link

In paris in 2013, we changed to DST in 2013-03-31 (at 2 AM it was 3 AM), from UTC+1 to UTC+2 (DST)

Here's an example of the issue I have with arrow :

>>> import arrow, datetime

>>> just_before = arrow.get(datetime.datetime(2013, 03, 31, 1, 50, 45), "Europe/Paris").ceil("hour")
>>> just_before
<Arrow [2013-03-31T01:59:59.999999+01:00]>

>>> just_after = just_before.replace(microseconds=+1)
>>> just_after
<Arrow [2013-03-31T02:00:00+02:00]>
>>> # That is not right... It should be either 2 AM UTC+1 or preferably  3AM UTC +2
>>> # One could argue that depending on what you ask for, it may be the correct answer, but then the following is :

>>> (just_after - just_before).total_seconds()
1e-06
>>> # That is right but not consistent with the dates

>>> (just_after.to("utc") - just_before.to("utc").total_seconds()
-3599.999999
# That should be the same value as the previous computation. Plus, the value is negative because :

>>> just_before.to('utc'), just_after.to('utc')
(<Arrow [2013-03-31T00:59:59.999999+00:00]>,
 <Arrow [2013-03-31T00:00:00+00:00]>)

I think the problem is pretty clear...

@crsmithdev
Copy link
Collaborator

Interesting. Conversions / shifting around DST are tricky and early on there were a number of problems with this. Will check it out.

@ewjoachim
Copy link
Author

The really annoying part is that around DST, when you shift by 1h from 1.30AM, the time difference should be 60 minutes (what you asked for) and the "visual difference" should be 2 hours (not what you asked) (3.30AM), but when you shift by 1 day, say from midnight, the time difference should be 23h, but the visual difference should be exactly 1 day, so it's the other way around (well, if you define 1 day as "adding 1 to the "day" field for the calendar" and not "24 hours")

Among the different algorithms that have worked on some cases for me was :

  • Compute the utcoffset
  • Do the shift
  • Ensure that the tzinfo changes have been taken into account (if around DST)
  • Compute the utcoffset again. If it has changed and on certain conditions (whether the shift is >0 or <0, whether you are shifting by a "time" unit or a "date" unit, ...), add the difference of utcoffsets to the result of the shift
  • return

Or differenciate shifts for date and shifts for time,

  • For time shift : add a time delta (and ensure proper selection of the timezone that may have changed)
  • For date shift : delocalize (make naive), apply the shift, relocalize

This last strategy can lead to potential issues : when you want to relocalize, there might be an ambiguity (for exemple, if you are in Paris, 2013-03-30 2:30:00+01:00 and you add 1 day, you arrive on an hour that does not exist. Same for DST end, you can land on 2 different hours that have the same name)

Hopping these thoughts could help you 😄

@iamed2
Copy link

iamed2 commented Feb 19, 2014

This is (should be) a barrier to anyone using Arrow in production code with non-UTC date math. I'll be giving Arrow serious consideration only after this is fixed.

I will say that ewjoachim's first paragraph is the behaviour I hope Arrow will adopt.

@trashstack
Copy link

This is unfortunately a problem with dateutil and Python. The problem can be reproduced as simply as:

>>> import datetime
>>> from dateutil import zoneinfo
>>>
>>> tz = zoneinfo.gettz("Europe/Paris")
>>> just_before = datetime.datetime(2013, 03, 31, 1, 59, 59, 999999, tz)
>>> print just_before
2013-03-31 01:59:59.999999+01:00
>>>
>>> just_after = just_before + datetime.timedelta(microseconds=1)
>>> print just_after
2013-03-31 02:00:00+02:00

The issue is that datetime.datetime.__add__ totally ignores tzinfo and whether or not it is in DST. It makes a new datetime.datetime instance with all the attributes updated naively based on the timedelta and then copies across the tzinfo.

dateutil's tzfile implementation tries to do the right thing by making the return of tzfile.utcoffset based on the value of the datetime passed into it but because Python ignores this it ends up representing the date incorrectly. The rest of the time attributes keep their values as if in +0100 but tzfile knows that the new datetime.datetime is past the DST boundary so reports +0200.

I think ewjoachim has hit the nail on the head with his first suggestion. This is effectively what pytz does with its normalize function and by having tzinfo instances with fixed UTC offsets.

@crsmithdev crsmithdev added the bug label Jul 28, 2014
@nigelmcnie
Copy link

This may be of interest to the participants: https://gist.github.com/nigelmcnie/ae856b5804fa9713f54e

We had the same issues with python's datetime lib, and we evaulated Arrow and noticed it couldn't handle this case either, so we wrote our own lib - Quantum. We've been using it internally for a while and it's worked well, so others might find it interesting - maybe the Arrow devs can use it as inspiration, or maybe Quantum could be improved and released properly.

(am aware this is a bug tracker for Arrow - if you have comments about Quantum that aren't related to fixing this bug, please put them on the gist linked to instead)

@chebee7i
Copy link

Isn't the "proper" solution to only add and subtract datetimes in UTC and then convert back to localtime?

@ewjoachim
Copy link
Author

@chebee7i I don't think so. When you're on 2013-03-30 12:00:00+01:00, in Paris and you you want "1 day later", if you shift to UTC, you'll get 2013-03-30 11:00:00Z, add 1 day that's 2013-03-30 11:00:00Z, shift back to Paris, it's 2013-03-30 13:00:00+02:00.

Going to UTC is "useless" in this case, it's just like considering "1 day equals 24 hours, period." and that's the heart of the problem. Everything under the day (second, minute, hour) works like that (well except with leap seconds but that's something else) and everything from the day and up, what you need (most of the time, or at least what I do need) is not a time shift but a calendar digits replace.

@chebee7i
Copy link

chebee7i commented Feb 7, 2015

I'm still a bit confused. Here is a script:

from __future__ import print_function

import datetime
import pytz

paris = pytz.timezone('Europe/Paris')
utc = pytz.utc

before_naive = datetime.datetime(2013, 3, 31, 1, 59)
before_paris = paris.localize(before_naive)
before_utc = before_paris.astimezone(utc)

delta = datetime.timedelta(minutes=1)
after_utc = (before_utc + delta)
after_paris = after_utc.astimezone(paris)

# Compute aware deltas
minutes_paris = (after_paris - before_paris).total_seconds() / 60
minutes_utc = (after_utc - before_utc).total_seconds() / 60

# Compute naive deltas
naive_minutes_paris = (after_paris.replace(tzinfo=None) - before_paris.replace(tzinfo=None)).total_seconds() / 60
naive_minutes_utc = (after_utc.replace(tzinfo=None) - before_utc.replace(tzinfo=None)).total_seconds() / 60

print("Paris")
print(before_paris)
print(after_paris)
print("Minutes:", minutes_paris) # 1 minute, as expected
print("Naive Minutes:", naive_minutes_paris) # 61 minutes, as visually seen.
print()
print("UTC")
print(before_utc)
print(after_utc)
print("Minutes:", minutes_utc) # 1 minute, as expected.
print("Naive Minutes:", naive_minutes_utc) # 1 minute, as visually seen.

Can you tell me what is wrong here? And what you want?

@andrewelkins
Copy link
Contributor

Anyone want to submit a PR for this?

@andrewelkins
Copy link
Contributor

andrewelkins commented Dec 31, 2016

Another example

import arrow

just_before = arrow.Arrow(2013, 3, 31, 1, 59, 59, 999999, 'Europe/Paris')
just_after = just_before.shift(microseconds=1)
print just_before
print just_after

print just_before.to('utc')
print just_after.to('utc')

print (just_after.to('utc') - just_before.to('utc')).total_seconds()

Produces

1970-01-01T01:00:00+01:00
None
2013-03-31T01:59:59.999999+01:00
2013-03-31T02:00:00+02:00
2013-03-31T00:59:59.999999+00:00
2013-03-31T00:00:00+00:00
-3599.999999

The problem is the hour should be 3 which is then throwing off the other calculations.

@sdispater
Copy link
Contributor

sdispater commented Dec 31, 2016

@andrewelkins It seems that all the examples you point out as to be fixed comes from here: https://github.com/sdispater/pendulum#why-not-arrow (I am the author of Pendulum by the way)

I should point out that this approach is wrong since those are just some non-exhaustive examples and should not be treated as specific cases but rather understanding the underlying problem, in this case understanding how DST transitions work.

This is an issue that has been here for 3 years and neither Chris nor you have been able to fix this while being critical for a datetime library, so the promise made by Arrow seems dubious, at best. I started Pendulum because Arrow is so broken (you can just look at your issues piling up) I wondered if its maintainers had any knowledge of what a datetime library should be or if they had any desire to go forward with the library.

Sorry if I sound too harsh but I have been bitten so many times by Arrow that I could no longer rely on it. To be honest, the first stable version of Pendulum is around the corner and does everything Arrow does and more, without the bugs. What I don't understand is how a 6-month old library can be more complete and less buggy than a 3-year old one.

I don't know what you plan to do with Arrow (it seems to me that you just expect people to fix the issues for you at this point) but you owe it to your users to warn them about the critical shortcomings of the library.

@ewjoachim
Copy link
Author

ewjoachim commented Dec 31, 2016

I get it that you're upset at Arrow not working how you would expect, but there's no need to be aggressive and condescending here. We're in @crsmithdev 's space, AFAIK we both have no idea of what might have happened in their live that kept them from fixing the bug (or I should say, to spend countless hours and/or days maintaining a library they may not even use anymore themselves, working completely for nothing, on their free time in addition to whatever else they might have to deal with) and it's dishonest to assume it has anything to do with their competence.

Using someone else's software won't make them owe you anything. It's a pure question of gratefulness, of trust, of sharing.

What I don't understand is how a 6-month old library can be more complete and less buggy than a 3-year old one.

A library does not grow with time but with contributions. Contributions are more frequent when a project is useful and welcoming. Having people taking over issues to say their own project is better and this project is shit does not really help making the space welcoming. Just saying.

I've been using Arrow a little bit (not quite much, I admit), but I've always looked at that project with interest. When I discovered pendulum lately (and we had an exchange on twitter), I was equally interested. Sadly, I'm not sure I really want to go further now. I'm aware that you have no reason to care, but I sincerly hope that by the time pendulum becomes a must-have in many projects, which might happen, I can find in your community a more welcoming atmosphere :/

@sdispater
Copy link
Contributor

sdispater commented Dec 31, 2016

I agree with what you say. But if you don't have time or the interest anymore to maintain it (and I completely understand that, believe me) you can add new contributors to the project so that you can pass it on and it keeps on living. But Arrow seems to be stagnating bugfixes wise, that's what I think is a shame. When Andrew took over for Chris a lot of issues were fixed and improvements made which I found really good because Arrow could catch up on what was broken. But critical issues were not fixed, like this one.

And don't get me wrong, I am not advocating for Arrow to be completely replaced by Pendulum, Arrow has its user base because they find it useful, which I understand. I merely pointed some flaws and bugs to raise awareness not to force anyone to use something else (How could I?). And, to be honest, I would be really glad if both libraries were to remain and be used along each other because, in my opinion, while they are close they don't have the exact same set of features.

The thing that bothers me is that Arrow is seen by a lot of people as the definitive library when it comes to datetime handling in Python while it is not necessarily, mostly due to critical bugs, and alternatives exist (and I am not talking only about pendulum which is one of many). If these bugs were to be fixed I would have nothing to say about it.

And I agree that open source maintainers and creators don't owe you anything as a user. However, I think that as a library maintainer you owe it to people using your library to, at least, be true to the promise you made initially since this is the reason why people chose to use it, and if not just point the shortcomings of your library. That's all. And if you can't do it anymore just bring someone along to help you with it, everyone would benefit from it.

And sorry if you felt animosity in my post, that was not the intention. Sometimes I get a little carried away when it comes to subject I really care about and I sound like rude. I am sorry about that.

But I want to tell @andrewelkins and @crsmithdev that it is not a personal attack. I mean Chris made a really popular library, useful to a lot of people and that is a great achievement in itself. And as a developer myself I know what it can cost and the time it can take, so I will always recognize the work that has been done. And Andrew took over a project that was already largely popular and that is not small work.

What I would like to see now, however, is what they want to do with the library, what they plan to do with it. Nothing more.

@pganssle
Copy link
Contributor

pganssle commented Dec 31, 2016

FYI, this is relatively simple to fix, because you just need to detect whether it's an imaginary datetime. There's a convenience function in python-dateutil>=2.6.0 called tz.datetime_exists. The way it works is that it puts a datetime through a round trip from UTC and back, and if it doesn't get the same answer, you know it's imaginary, e.g.:

from dateutil import tz
from datetime import datetime

PAR = tz.gettz('Europe/Paris')

im_dt = datetime(2013, 3, 31, 2, 0, tzinfo=PAR)   # This time does not exist

dt_u = im_dt.astimezone(tz.tzutc())
im_dt_back = dt_u.astimezone(PAR)

print(im_dt)
# 2013-03-31 02:00:00+02:00
print(im_dt_back)
# 2013-03-31 01:00:00+01:00

r_dt = datetime(2013, 3, 31, 3, 0, tzinfo=PAR)
r_dt_back = r_dt.astimezone(tz.tzutc()).astimezone(PAR)

print(r_dt)
# 2013-03-31 03:00:00+02:00
print(r_dt_back)
# 2013-03-31 03:00:00+02:00

So you could do something like this in shift:

from dateutil.tz import datetime_exists

def resolve_imaginary(dt):
    if dt.tzinfo is not None and not datetime_exists(dt):
        # You need to shift this by the difference between the current offset
        # and the previous one. Historically, there has never been a time zone
        # shift of more than 24 hours, nor has there been two changes to a time
        # zone within 24 hours, so if we take the "wall time" - 24 hours, we
        # should be OK, barring any insane time zone behavior in the future.
        curr_offset = dt.utcoffset()
        old_offset = (dt - timedelta(hours=24)).utcoffset()

        dt += curr_offset - old_offset

    return dt

Then you can just call resolve_imaginary() on current before returning it in shift().

If you don't want to peg your dateutil dependency at 2.6.0, then you can stub it out when missing:

try:
    from dateutil.tz import datetime_exists
except ImportError:
    def datetime_exists(dt, tz=None):
        # Implement handling the tz-attachment part if necessary here
        
       return (dt.astimezone(dateutil_tz.tzutc()).astimezone(dt.tzinfo) == dt)

I'm not sure if you have a similar problem around DST transitions that create an ambiguous time - that's a tougher problem to solve in a sense because the semantics of addition in non-fixed-offset zones is not clearcut.

@andrewelkins
Copy link
Contributor

@sdispater Thanks for chiming in. Do you have a more exhaustive list known bugs? I would love to have that contribution.
I'm aware the DST translations is an area that needs work. As you can see this ticket is pretty old and was never resolved. It is the number one thing to fix and why I reupped this issue.
I wish you success with your library.

@ewjoachim Thanks for chiming in on my behalf 👍

@andrewelkins
Copy link
Contributor

andrewelkins commented Dec 31, 2016

@pganssle Thanks for the example. Seems straight forward and I'll test it out. Do you have time to make it a PR w/ tests? If not, I'll tackle it over the next couple days.

@pganssle
Copy link
Contributor

@andrewelkins Pretty busy schedule, so probably not, feel free to take what I've got and run with it.

FYI, relevant edge cases:

  • The entire day of datetime(2011, 12, 30) is imaginary in the 'Pacific/Apia' zone.
  • You should probably test against Europe/London, because that's often a source of DST-related issues, since it shifts from 0 to +1.
  • Possibly relevant: on datetime(1969, 10, 1, 0) in 'Pacific/Kwajalein', the time zone offset went back 23 hours, so the interval of [datetime(1969, 9, 30, 1), datetime(1969, 10, 1, 0))is ambiguous. (I don't have an example immediately at hand where there was a full 24 hour adjustment).

@sdispater
Copy link
Contributor

@andrewelkins Thanks for the kind words :-) Like I said, it is not directed towards you personally but rather to raise awareness about this issue. I know quite well DST is a tricky problem.

I can't really provide any solution for Arrow since I came up with my own implementation in the end rather than relying on pytz but it should be doable.

And @pganssle's solution seems like a good starting point.

@pganssle
Copy link
Contributor

FWIW, I did find a -24 hour edge case, but it's a bit dubious because they're all LMT -> LMT transitions that happened in 1901 for some reason:

 ((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 22, 52)),
  'US/Samoa'),
 ((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 22, 52)),
  'Pacific/Samoa'),
 ((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 22, 52)),
  'Pacific/Pago_Pago'),
 ((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 22, 52)),
  'Pacific/Midway'),
 ((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 18, 52)),
  'Pacific/Apia')

@andrewelkins
Copy link
Contributor

@sdispater Not looking for solutions, just issues to fix other than DST. Seemed like you had a few issues, and I'd like to get them addressed for the user base. We won't be relying on pytz either, it has some odd edge cases itself.

@pganssle Thanks for bringing that up. I'll start the PR and writing the tests over the next couple days.

@sdispater
Copy link
Contributor

@andrewelkins Most of the issues I have encountered were in timezone handling and DST due to the fact that I implemented my own timezone library since I was not satisfied by the existing solutions (being pytz or dateutil) and I wanted a performant and simple implementation (https://pendulum.eustace.io/blog/a-faster-alternative-to-pyz.html).

Other issues were more related to the API design rather than bugs

@andrewelkins
Copy link
Contributor

@sdispater Great, Thanks for the info!

@obskyr
Copy link

obskyr commented Mar 10, 2017

I'd just like to be clear about this - doing a time shift should automatically resolve DST shifts, both .dst() timedelta-wise and time-wise. That's the goal, right?

@intiocean
Copy link

Here's another example around the EST DST change:

import arrow
import pytz
tz = pytz.timezone('America/New_York')
end_date = tz.localize(datetime(2018, 3, 13, 4, 58, 28))
 # end_date could be in UTC but showing it in the same timezone here for simplicity (hence we need the `.to(tz)` below.
# arrow datetime at the end of the day
end = arrow.get(end_date).to(tz).ceil('day')
# -6 days as we go from the start of the first day to the end of the last day
start = end.shift(days=-6).floor('day')

# Note: If I convert to the timezone again after shifting I get the result I want
correct_start = end.shift(days=-6).to(tz).floor('day')

Results

end # This is correct
Out[23]: <Arrow [2018-03-13T23:59:59.999999-04:00]>
start # This is wrong - should be -5 hours as the date at the start is before DST
Out[22]: <Arrow [2018-03-07T00:00:00-04:00]>
correct_start # This works but I shouldn't have to convert to the timezone again after shifting!
Out[27]: <Arrow [2018-03-07T00:00:00-05:00]>

@pganssle
Copy link
Contributor

@intiocean I would think that's because you're using pytz, which needs to be normalized if it crosses a DST boundary.

@intiocean
Copy link

@pganssle Interesting. I would have thought that passing in a tzinfo to the to() method would be the same as passing the timezone name. However after your suggestion it was pytz I've experimented and it's not the same.

tz = pytz.timezone('America/New_York')
end_date = tz.localize(datetime(2018, 3, 13, 4, 58, 28))
end = arrow.get(end_date).to('America/New_York').ceil('day')
start = end.shift(days=-6).floor('day')

Results in

In [6]: end_date
Out[6]: datetime.datetime(2018, 3, 13, 4, 58, 28, tzinfo=<DstTzInfo 'America/New_York' EDT-1 day, 20:00:00 DST>)

In [7]: end
Out[7]: <Arrow [2018-03-13T23:59:59.999999-04:00]>

In [8]: start
Out[8]: <Arrow [2018-03-07T00:00:00-05:00]>

I'll change my code to use the timezone name but the different behaviour seems very counter intuitive - maybe we should add a warning to the to() docs?

PS. I'm not clear if this is the same issue others in this thread are having or if their issue is more around times that repeat over the actual hour just before/after DST

@pganssle
Copy link
Contributor

@intiocean I imagine there are a lot of subtle ways that using a pytz zone could go wrong because pytz doesn't use the standard datetime time zone interface. You may end up having to sprinkle warnings all over the place in arrow. If it were up to me I'd say "pytz is not supported`", and be done with it, but it's not clear how and where to best communicate this.

@dlopuch
Copy link

dlopuch commented May 2, 2018

This issue isn't limited to .shift(), but also functions like .range().

You probably knew that, but here's a quick .range() test case across the spring DST boundary to check:

before = arrow.get('2018-03-10 23:00:00', 'YYYY-MM-DD HH:mm:ss', tzinfo='US/Pacific')
after = arrow.get('2018-03-11 04:00:00', 'YYYY-MM-DD HH:mm:ss', tzinfo='US/Pacific')

[ (t, t.to('utc')) for t in arrow.Arrow.range('hour', before, after) ]

In the normalized-to-UTC column, note the two 2018-03-11T09:00:00+00:00 entries:

[(<Arrow [2018-03-10T23:00:00-08:00]>, <Arrow [2018-03-11T07:00:00+00:00]>),
 (<Arrow [2018-03-11T00:00:00-08:00]>, <Arrow [2018-03-11T08:00:00+00:00]>),
 (<Arrow [2018-03-11T01:00:00-08:00]>, <Arrow [2018-03-11T09:00:00+00:00]>),
 (<Arrow [2018-03-11T02:00:00-07:00]>, <Arrow [2018-03-11T09:00:00+00:00]>),  <-- duplicated!
 (<Arrow [2018-03-11T03:00:00-07:00]>, <Arrow [2018-03-11T10:00:00+00:00]>),
 (<Arrow [2018-03-11T04:00:00-07:00]>, <Arrow [2018-03-11T11:00:00+00:00]>)]

@dlopuch
Copy link

dlopuch commented Dec 17, 2019

FWIW, the way pandas handles DST boundaries on a range/span seems to me to be the right way (ie treat it as absolute hours localized into the timezone, not as a range across local hours). Useful to consider it as a model?

For example, the equivalent of Arrow.range('hour', ...) over a DST boundary in pandas would be something like these (in both cases, note that every shift is an hour in absolute time if you consider the UTC equivalents):

Spring boundary:

pd.date_range('2018-03-11', tz='US/Pacific', periods=4, freq='H')

[
  '2018-03-11 00:00:00-08:00', 
  '2018-03-11 01:00:00-08:00',
  '2018-03-11 03:00:00-07:00', 
  '2018-03-11 04:00:00-07:00'
]

Fall boundary:

pd.date_range('2018-11-04', tz='US/Pacific', periods=4, freq='H')

[
  '2018-11-04 00:00:00-07:00', 
  '2018-11-04 01:00:00-07:00',
  '2018-11-04 01:00:00-08:00',
  '2018-11-04 02:00:00-08:00'
]

@lazarillo
Copy link

I am not sure if this is the same problem, but I found that I could not subtract times around daylight savings, either.

For example (I am using pandas, and hoping everyone knows it well enough). I have a DataFrame, and I have adjusted the start time so that it "works", but I have not adjusted the end time. Here is a sample around daylight savings (apologies for all the red font, I don't know why GitHub is rendering that):

>>> orig_df.loc[127857:127861,"StartTimeArrow"]

127857    2019-03-31T01:57:00+01:00
127858    2019-03-31T01:58:00+01:00
127859    2019-03-31T01:59:00+01:00
127860    2019-03-31T02:00:00+02:00
127861    2019-03-31T02:01:00+02:00
Name: StartTimeArrow, dtype: object
>>> orig_df.loc[127857:127861,"EndTimeArrow"]

127857    2019-03-31T01:58:00+01:00
127858    2019-03-31T01:59:00+01:00
127859    2019-03-31T03:00:00+02:00
127860    2019-03-31T03:01:00+02:00
127861    2019-03-31T03:02:00+02:00
Name: EndTimeArrow, dtype: object

If we include the time zone, which is capturing daylight savings properly, then I have actually corrupted the data by shifting it. But if I want the proper datetime.timedelta calculations returned, I need to do the shift:

>>> orig_df.loc[127861,"StartTimeArrow"] - orig_df.loc[127857,"StartTimeArrow"]

datetime.timedelta(seconds=240)

Great! I have captured 4 minutes, the true difference in time, and also 2:01 - 1:57.

But... this delta is not considering the time zone shift, which you can see is still there (note that it shifts from +1 to +2).

Considering the time zone shift, the original objects that arrow created which I didn't modify - the "EndTimeArrow" values - should be correct. But when I try to subtract them, it's a mess:

>>> orig_df.loc[127861,"EndTimeArrow"] - orig_df.loc[127857,"EndTimeArrow"]

datetime.timedelta(seconds=3840)

Obviously this is wrong. I am going to use the adjustment I'd made in the "StartTimeArrow" column, but I'm nervous since this is wrong and will break as soon as the time zone is properly captured.


Is there a better way to subtract times using arrow? I tried to play around with Arrow.shift, but that didn't seem like the right solution?


I haven't dug deeply enough, but is there a pull request I can help with? Is this really a dateutil issue, as @ManThursday suggests? In that case, I doubt I can do anything to help.

@dlopuch
Copy link

dlopuch commented Feb 11, 2020

@lazarillo honestly, if you're already using pandas and you're doing timezone-aware stuff, just use pd.Timestamp everywhere rather than trying to mix arrow into pandas. pd.Timestamp handles DST boundaries correctly, and all the pandas stuff like DatetimeIndex, slicing, vector operations, etc. are built using it. The pd.Timestamp + pd.Timedelta interface for time shifts isn't as nice as arrow's APIs, but since arrow has lots of unpatched DST bugs (eg this one first reported in 2013), I've found arrow just doesn't work if you're doing anything timezone-aware. fwiw, I'm migrating everything over to pd.Timestamp, even when not using in a dataframe.

@systemcatch
Copy link
Collaborator

Hello @dlopuch, I have a branch locally that is nearly ready to PR which should fix most of the DST issues currently present in arrow (including this issue).

pandas uses pytz internally which may cause some problems (article) when dealing with DST. pandas does not particularly support arrow objects at the moment (see #755 ) but they seem willing to help out.

@lazarillo
Copy link

OK, thanks @dlopuch and @systemcatch . I'd read somewhere that pytz had more issues with it than arrow when dealing with these corner cases, but I did not realize / research that pandas used pytz under the hood anyway.

So I'll just adjust my code to work with pandas "purely", and avoid arrow in this case. That's a shame, since I really like how arrow works; it's a great API.

I guess I didn't realize how much the Python ecosystem has left to deal with in this datetime world. I'm not even asking about leap seconds or anything bizarre (IMO). I know it's a tricky, pain-in-the-ass problem to solve, but it's also a hugely important one.

@systemcatch
Copy link
Collaborator

@lazarillo that's a pity but I hope you'll revisit arrow in the future.

python's datetime module is rather lacking in basic features, but from 3.6+ there have been some improvements.

For reference this is the branch where I'm working on handling DST properly https://github.com/systemcatch/arrow/tree/DST-handling

@lazarillo
Copy link

@systemcatch Yes, I'll definitely revisit arrow (and in fact, I have not pulled it from my flow yet). I really like it, but I cannot give up the practicality of using pandas for this project.

If there is something you want me to help with on your fork, let me know. I don't have a ton of free time, but I would like to help give back. :)

@systemcatch
Copy link
Collaborator

Hopefully most of the bugs mentioned in this issue have been solved now. We will release 0.17.0 with these fixes asap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment