-
-
Notifications
You must be signed in to change notification settings - Fork 691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong result when a shift involves a DST change. #72
Comments
Interesting. Conversions / shifting around DST are tricky and early on there were a number of problems with this. Will check it out. |
The really annoying part is that around DST, when you shift by 1h from 1.30AM, the time difference should be 60 minutes (what you asked for) and the "visual difference" should be 2 hours (not what you asked) (3.30AM), but when you shift by 1 day, say from midnight, the time difference should be 23h, but the visual difference should be exactly 1 day, so it's the other way around (well, if you define 1 day as "adding 1 to the "day" field for the calendar" and not "24 hours") Among the different algorithms that have worked on some cases for me was :
Or differenciate shifts for date and shifts for time,
This last strategy can lead to potential issues : when you want to relocalize, there might be an ambiguity (for exemple, if you are in Paris, 2013-03-30 2:30:00+01:00 and you add 1 day, you arrive on an hour that does not exist. Same for DST end, you can land on 2 different hours that have the same name) Hopping these thoughts could help you 😄 |
This is (should be) a barrier to anyone using Arrow in production code with non-UTC date math. I'll be giving Arrow serious consideration only after this is fixed. I will say that ewjoachim's first paragraph is the behaviour I hope Arrow will adopt. |
This is unfortunately a problem with dateutil and Python. The problem can be reproduced as simply as: >>> import datetime
>>> from dateutil import zoneinfo
>>>
>>> tz = zoneinfo.gettz("Europe/Paris")
>>> just_before = datetime.datetime(2013, 03, 31, 1, 59, 59, 999999, tz)
>>> print just_before
2013-03-31 01:59:59.999999+01:00
>>>
>>> just_after = just_before + datetime.timedelta(microseconds=1)
>>> print just_after
2013-03-31 02:00:00+02:00 The issue is that dateutil's tzfile implementation tries to do the right thing by making the return of I think ewjoachim has hit the nail on the head with his first suggestion. This is effectively what pytz does with its normalize function and by having tzinfo instances with fixed UTC offsets. |
This may be of interest to the participants: https://gist.github.com/nigelmcnie/ae856b5804fa9713f54e We had the same issues with python's datetime lib, and we evaulated Arrow and noticed it couldn't handle this case either, so we wrote our own lib - Quantum. We've been using it internally for a while and it's worked well, so others might find it interesting - maybe the Arrow devs can use it as inspiration, or maybe Quantum could be improved and released properly. (am aware this is a bug tracker for Arrow - if you have comments about Quantum that aren't related to fixing this bug, please put them on the gist linked to instead) |
Isn't the "proper" solution to only add and subtract datetimes in UTC and then convert back to localtime? |
@chebee7i I don't think so. When you're on 2013-03-30 12:00:00+01:00, in Paris and you you want "1 day later", if you shift to UTC, you'll get 2013-03-30 11:00:00Z, add 1 day that's 2013-03-30 11:00:00Z, shift back to Paris, it's 2013-03-30 13:00:00+02:00. Going to UTC is "useless" in this case, it's just like considering "1 day equals 24 hours, period." and that's the heart of the problem. Everything under the day (second, minute, hour) works like that (well except with leap seconds but that's something else) and everything from the day and up, what you need (most of the time, or at least what I do need) is not a time shift but a calendar digits replace. |
I'm still a bit confused. Here is a script: from __future__ import print_function
import datetime
import pytz
paris = pytz.timezone('Europe/Paris')
utc = pytz.utc
before_naive = datetime.datetime(2013, 3, 31, 1, 59)
before_paris = paris.localize(before_naive)
before_utc = before_paris.astimezone(utc)
delta = datetime.timedelta(minutes=1)
after_utc = (before_utc + delta)
after_paris = after_utc.astimezone(paris)
# Compute aware deltas
minutes_paris = (after_paris - before_paris).total_seconds() / 60
minutes_utc = (after_utc - before_utc).total_seconds() / 60
# Compute naive deltas
naive_minutes_paris = (after_paris.replace(tzinfo=None) - before_paris.replace(tzinfo=None)).total_seconds() / 60
naive_minutes_utc = (after_utc.replace(tzinfo=None) - before_utc.replace(tzinfo=None)).total_seconds() / 60
print("Paris")
print(before_paris)
print(after_paris)
print("Minutes:", minutes_paris) # 1 minute, as expected
print("Naive Minutes:", naive_minutes_paris) # 61 minutes, as visually seen.
print()
print("UTC")
print(before_utc)
print(after_utc)
print("Minutes:", minutes_utc) # 1 minute, as expected.
print("Naive Minutes:", naive_minutes_utc) # 1 minute, as visually seen. Can you tell me what is wrong here? And what you want? |
Anyone want to submit a PR for this? |
Another example
Produces
The problem is the hour should be 3 which is then throwing off the other calculations. |
@andrewelkins It seems that all the examples you point out as to be fixed comes from here: https://github.com/sdispater/pendulum#why-not-arrow (I am the author of Pendulum by the way) I should point out that this approach is wrong since those are just some non-exhaustive examples and should not be treated as specific cases but rather understanding the underlying problem, in this case understanding how DST transitions work. This is an issue that has been here for 3 years and neither Chris nor you have been able to fix this while being critical for a datetime library, so the promise made by Arrow seems dubious, at best. I started Pendulum because Arrow is so broken (you can just look at your issues piling up) I wondered if its maintainers had any knowledge of what a datetime library should be or if they had any desire to go forward with the library. Sorry if I sound too harsh but I have been bitten so many times by Arrow that I could no longer rely on it. To be honest, the first stable version of Pendulum is around the corner and does everything Arrow does and more, without the bugs. What I don't understand is how a 6-month old library can be more complete and less buggy than a 3-year old one. I don't know what you plan to do with Arrow (it seems to me that you just expect people to fix the issues for you at this point) but you owe it to your users to warn them about the critical shortcomings of the library. |
I get it that you're upset at Arrow not working how you would expect, but there's no need to be aggressive and condescending here. We're in @crsmithdev 's space, AFAIK we both have no idea of what might have happened in their live that kept them from fixing the bug (or I should say, to spend countless hours and/or days maintaining a library they may not even use anymore themselves, working completely for nothing, on their free time in addition to whatever else they might have to deal with) and it's dishonest to assume it has anything to do with their competence. Using someone else's software won't make them owe you anything. It's a pure question of gratefulness, of trust, of sharing.
A library does not grow with time but with contributions. Contributions are more frequent when a project is useful and welcoming. Having people taking over issues to say their own project is better and this project is shit does not really help making the space welcoming. Just saying. I've been using Arrow a little bit (not quite much, I admit), but I've always looked at that project with interest. When I discovered pendulum lately (and we had an exchange on twitter), I was equally interested. Sadly, I'm not sure I really want to go further now. I'm aware that you have no reason to care, but I sincerly hope that by the time pendulum becomes a must-have in many projects, which might happen, I can find in your community a more welcoming atmosphere :/ |
I agree with what you say. But if you don't have time or the interest anymore to maintain it (and I completely understand that, believe me) you can add new contributors to the project so that you can pass it on and it keeps on living. But Arrow seems to be stagnating bugfixes wise, that's what I think is a shame. When Andrew took over for Chris a lot of issues were fixed and improvements made which I found really good because Arrow could catch up on what was broken. But critical issues were not fixed, like this one. And don't get me wrong, I am not advocating for Arrow to be completely replaced by Pendulum, Arrow has its user base because they find it useful, which I understand. I merely pointed some flaws and bugs to raise awareness not to force anyone to use something else (How could I?). And, to be honest, I would be really glad if both libraries were to remain and be used along each other because, in my opinion, while they are close they don't have the exact same set of features. The thing that bothers me is that Arrow is seen by a lot of people as the definitive library when it comes to datetime handling in Python while it is not necessarily, mostly due to critical bugs, and alternatives exist (and I am not talking only about pendulum which is one of many). If these bugs were to be fixed I would have nothing to say about it. And I agree that open source maintainers and creators don't owe you anything as a user. However, I think that as a library maintainer you owe it to people using your library to, at least, be true to the promise you made initially since this is the reason why people chose to use it, and if not just point the shortcomings of your library. That's all. And if you can't do it anymore just bring someone along to help you with it, everyone would benefit from it. And sorry if you felt animosity in my post, that was not the intention. Sometimes I get a little carried away when it comes to subject I really care about and I sound like rude. I am sorry about that. But I want to tell @andrewelkins and @crsmithdev that it is not a personal attack. I mean Chris made a really popular library, useful to a lot of people and that is a great achievement in itself. And as a developer myself I know what it can cost and the time it can take, so I will always recognize the work that has been done. And Andrew took over a project that was already largely popular and that is not small work. What I would like to see now, however, is what they want to do with the library, what they plan to do with it. Nothing more. |
FYI, this is relatively simple to fix, because you just need to detect whether it's an imaginary datetime. There's a convenience function in from dateutil import tz
from datetime import datetime
PAR = tz.gettz('Europe/Paris')
im_dt = datetime(2013, 3, 31, 2, 0, tzinfo=PAR) # This time does not exist
dt_u = im_dt.astimezone(tz.tzutc())
im_dt_back = dt_u.astimezone(PAR)
print(im_dt)
# 2013-03-31 02:00:00+02:00
print(im_dt_back)
# 2013-03-31 01:00:00+01:00
r_dt = datetime(2013, 3, 31, 3, 0, tzinfo=PAR)
r_dt_back = r_dt.astimezone(tz.tzutc()).astimezone(PAR)
print(r_dt)
# 2013-03-31 03:00:00+02:00
print(r_dt_back)
# 2013-03-31 03:00:00+02:00 So you could do something like this in from dateutil.tz import datetime_exists
def resolve_imaginary(dt):
if dt.tzinfo is not None and not datetime_exists(dt):
# You need to shift this by the difference between the current offset
# and the previous one. Historically, there has never been a time zone
# shift of more than 24 hours, nor has there been two changes to a time
# zone within 24 hours, so if we take the "wall time" - 24 hours, we
# should be OK, barring any insane time zone behavior in the future.
curr_offset = dt.utcoffset()
old_offset = (dt - timedelta(hours=24)).utcoffset()
dt += curr_offset - old_offset
return dt Then you can just call If you don't want to peg your try:
from dateutil.tz import datetime_exists
except ImportError:
def datetime_exists(dt, tz=None):
# Implement handling the tz-attachment part if necessary here
return (dt.astimezone(dateutil_tz.tzutc()).astimezone(dt.tzinfo) == dt) I'm not sure if you have a similar problem around DST transitions that create an ambiguous time - that's a tougher problem to solve in a sense because the semantics of addition in non-fixed-offset zones is not clearcut. |
@sdispater Thanks for chiming in. Do you have a more exhaustive list known bugs? I would love to have that contribution. @ewjoachim Thanks for chiming in on my behalf 👍 |
@pganssle Thanks for the example. Seems straight forward and I'll test it out. Do you have time to make it a PR w/ tests? If not, I'll tackle it over the next couple days. |
@andrewelkins Pretty busy schedule, so probably not, feel free to take what I've got and run with it. FYI, relevant edge cases:
|
@andrewelkins Thanks for the kind words :-) Like I said, it is not directed towards you personally but rather to raise awareness about this issue. I know quite well DST is a tricky problem. I can't really provide any solution for Arrow since I came up with my own implementation in the end rather than relying on And @pganssle's solution seems like a good starting point. |
FWIW, I did find a -24 hour edge case, but it's a bit dubious because they're all LMT -> LMT transitions that happened in 1901 for some reason: ((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 22, 52)),
'US/Samoa'),
((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 22, 52)),
'Pacific/Samoa'),
((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 22, 52)),
'Pacific/Pago_Pago'),
((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 22, 52)),
'Pacific/Midway'),
((datetime.timedelta(-1), datetime.datetime(1901, 12, 13, 9, 18, 52)),
'Pacific/Apia') |
@sdispater Not looking for solutions, just issues to fix other than DST. Seemed like you had a few issues, and I'd like to get them addressed for the user base. We won't be relying on @pganssle Thanks for bringing that up. I'll start the PR and writing the tests over the next couple days. |
@andrewelkins Most of the issues I have encountered were in timezone handling and DST due to the fact that I implemented my own timezone library since I was not satisfied by the existing solutions (being Other issues were more related to the API design rather than bugs |
@sdispater Great, Thanks for the info! |
I'd just like to be clear about this - doing a time shift should automatically resolve DST shifts, both |
Here's another example around the EST DST change: import arrow
import pytz
tz = pytz.timezone('America/New_York')
end_date = tz.localize(datetime(2018, 3, 13, 4, 58, 28))
# end_date could be in UTC but showing it in the same timezone here for simplicity (hence we need the `.to(tz)` below.
# arrow datetime at the end of the day
end = arrow.get(end_date).to(tz).ceil('day')
# -6 days as we go from the start of the first day to the end of the last day
start = end.shift(days=-6).floor('day')
# Note: If I convert to the timezone again after shifting I get the result I want
correct_start = end.shift(days=-6).to(tz).floor('day') Results end # This is correct
Out[23]: <Arrow [2018-03-13T23:59:59.999999-04:00]>
start # This is wrong - should be -5 hours as the date at the start is before DST
Out[22]: <Arrow [2018-03-07T00:00:00-04:00]>
correct_start # This works but I shouldn't have to convert to the timezone again after shifting!
Out[27]: <Arrow [2018-03-07T00:00:00-05:00]> |
@intiocean I would think that's because you're using |
@pganssle Interesting. I would have thought that passing in a tzinfo to the tz = pytz.timezone('America/New_York')
end_date = tz.localize(datetime(2018, 3, 13, 4, 58, 28))
end = arrow.get(end_date).to('America/New_York').ceil('day')
start = end.shift(days=-6).floor('day') Results in In [6]: end_date
Out[6]: datetime.datetime(2018, 3, 13, 4, 58, 28, tzinfo=<DstTzInfo 'America/New_York' EDT-1 day, 20:00:00 DST>)
In [7]: end
Out[7]: <Arrow [2018-03-13T23:59:59.999999-04:00]>
In [8]: start
Out[8]: <Arrow [2018-03-07T00:00:00-05:00]> I'll change my code to use the timezone name but the different behaviour seems very counter intuitive - maybe we should add a warning to the PS. I'm not clear if this is the same issue others in this thread are having or if their issue is more around times that repeat over the actual hour just before/after DST |
@intiocean I imagine there are a lot of subtle ways that using a |
This issue isn't limited to You probably knew that, but here's a quick before = arrow.get('2018-03-10 23:00:00', 'YYYY-MM-DD HH:mm:ss', tzinfo='US/Pacific')
after = arrow.get('2018-03-11 04:00:00', 'YYYY-MM-DD HH:mm:ss', tzinfo='US/Pacific')
[ (t, t.to('utc')) for t in arrow.Arrow.range('hour', before, after) ] In the normalized-to-UTC column, note the two
|
FWIW, the way pandas handles DST boundaries on a range/span seems to me to be the right way (ie treat it as absolute hours localized into the timezone, not as a range across local hours). Useful to consider it as a model? For example, the equivalent of Spring boundary:
Fall boundary:
|
I am not sure if this is the same problem, but I found that I could not subtract times around daylight savings, either. For example (I am using >>> orig_df.loc[127857:127861,"StartTimeArrow"]
127857 2019-03-31T01:57:00+01:00
127858 2019-03-31T01:58:00+01:00
127859 2019-03-31T01:59:00+01:00
127860 2019-03-31T02:00:00+02:00
127861 2019-03-31T02:01:00+02:00
Name: StartTimeArrow, dtype: object >>> orig_df.loc[127857:127861,"EndTimeArrow"]
127857 2019-03-31T01:58:00+01:00
127858 2019-03-31T01:59:00+01:00
127859 2019-03-31T03:00:00+02:00
127860 2019-03-31T03:01:00+02:00
127861 2019-03-31T03:02:00+02:00
Name: EndTimeArrow, dtype: object If we include the time zone, which is capturing daylight savings properly, then I have actually corrupted the data by shifting it. But if I want the proper >>> orig_df.loc[127861,"StartTimeArrow"] - orig_df.loc[127857,"StartTimeArrow"]
datetime.timedelta(seconds=240) Great! I have captured 4 minutes, the true difference in time, and also 2:01 - 1:57. But... this delta is not considering the time zone shift, which you can see is still there (note that it shifts from Considering the time zone shift, the original objects that >>> orig_df.loc[127861,"EndTimeArrow"] - orig_df.loc[127857,"EndTimeArrow"]
datetime.timedelta(seconds=3840) Obviously this is wrong. I am going to use the adjustment I'd made in the Is there a better way to subtract times using I haven't dug deeply enough, but is there a pull request I can help with? Is this really a |
@lazarillo honestly, if you're already using pandas and you're doing timezone-aware stuff, just use |
Hello @dlopuch, I have a branch locally that is nearly ready to PR which should fix most of the DST issues currently present in arrow (including this issue). pandas uses pytz internally which may cause some problems (article) when dealing with DST. pandas does not particularly support arrow objects at the moment (see #755 ) but they seem willing to help out. |
OK, thanks @dlopuch and @systemcatch . I'd read somewhere that So I'll just adjust my code to work with I guess I didn't realize how much the Python ecosystem has left to deal with in this datetime world. I'm not even asking about leap seconds or anything bizarre (IMO). I know it's a tricky, pain-in-the-ass problem to solve, but it's also a hugely important one. |
@lazarillo that's a pity but I hope you'll revisit arrow in the future. python's datetime module is rather lacking in basic features, but from 3.6+ there have been some improvements. For reference this is the branch where I'm working on handling DST properly https://github.com/systemcatch/arrow/tree/DST-handling |
@systemcatch Yes, I'll definitely revisit arrow (and in fact, I have not pulled it from my flow yet). I really like it, but I cannot give up the practicality of using If there is something you want me to help with on your fork, let me know. I don't have a ton of free time, but I would like to help give back. :) |
Hopefully most of the bugs mentioned in this issue have been solved now. We will release 0.17.0 with these fixes asap. |
In paris in 2013, we changed to DST in 2013-03-31 (at 2 AM it was 3 AM), from UTC+1 to UTC+2 (DST)
Here's an example of the issue I have with arrow :
I think the problem is pretty clear...
The text was updated successfully, but these errors were encountered: