-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix offset date in mpas_xarray #46
Conversation
The reference start date for time series data has been corrected. It is now year=yearoffset+1, month=1, day=1, rather than year=yearoffset, month=12, day=31. The previous reference date would lead to the date being the expected start date (e.g. 01/01/1850) only after one day of simulation has elapsed, which is not the desired behavior.
I decided to fix this bug before addressing #45 or #33 because this change is non-bit-for-bit. All time-series plots are shifted by one day as a result of this commit. I believe (strongly) that the fixed dates are correct and the original dates were not. Here is an example plot from the current |
@milenaveneziani, I would like you to be the main reviewer of this PR because it is not bit-for-bit. You will need to decide if that's okay and if you agree that the fix is correct. @pwolfram, obviously I also want your input on whether the fix is correct but more importantly I would like your blessing on the changes I made to make |
TestingI ran the full mpas analysis on Edison using both alpha7 and alpha8 ACME results. I compared several images visually against baseline results from Full results are available here: |
@xylar: I guess I am fine either way. The reason why I chose the offset to be Dec 31 is that I wanted model times to start from 00hours of Jan 1st, instead of Jan 2nd. |
Hi @milenaveneziani, welcome back from the long weekend!
I believe with the code as you had written it, when In any case, as we move to using
So far, I don't see any sign that this is being resolved in |
Thanks, @milenaveneziani. In that case, why don't I assign @pwolfram to merge this. That way he can also merge #47, potentially at the same time. That will still leave #48 for you to have a look at when you get a chance. |
sounds good. |
@milenaveneziani, unfortunately I think that
may not be addressed soon because of technical issues in xarray and extending all the way to numpy (pydata/xarray#789). We chatted about this briefly at the AOSPY workshop and the overall consensus was this is a fairly complicated problem that requires community consensus on a design strategy as well as some dedicated time for refractoring and implementation. In short, it is probably not reasonable to expect it to be fixed anytime soon unless we were to somehow figure out how to fund a solution. Hence, we should make our solution as robust as possible given this constraint. Given this understanding, @milenaveneziani and @xylar, are you both still ok with me going forward on the review / merge? |
@pwolfram, yes, nothing has changed since my previous comment. Go ahead with the review/merge. |
@milenaveneziani also said, "sounds good", which I took to mean you could review/merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only real issue I have with this PR is the hard coding of values away from default function values because removes generality and may break other codes that use the mpas_xarray wrapper.
datetimes = [datetime.datetime(yearoffset + int(x[:4]), int(x[5:7]), | ||
int(x[8:10]), int(x[11:13]), | ||
int(x[14:16]), int(x[17:19])) | ||
for x in time] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
ds = xr.open_mfdataset(path, preprocess=lambda x: preprocess_mpas(x, | ||
timeSeriesStats=True, timestr='timeSeriesStatsMonthly_avg_daysSinceStartOfSim_1')) | ||
def test_load_mpas_xarray_timeSeriesStats_datasets(path): # {{{ | ||
timestr = 'timeSeriesStatsMonthly_avg_daysSinceStartOfSim_1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be an optional argument to the function above, which may be clearer and provide additional functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that change fit in the scope of this PR. I'm just trying to make the line PEP8 compliant, not add any new functionality. I pulled out timestr simply because the variable name is so long that there's no feasible way of keeping the line under 80 characters without assigning it to a variable.
If you want to upgrade the testing of mpas_xarray, that would be a fine thing to do in another PR.
ds = xr.open_mfdataset(path, preprocess=lambda x: | ||
preprocess_mpas(x, | ||
timeSeriesStats=True, | ||
timestr=timestr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the code a lot clearer, thanks for the cleanup.
ds = remove_repeated_time_index(ds) | ||
ds2 = xr.open_mfdataset(path, preprocess=lambda x: preprocess_mpas(x, yearoffset=1850)) | ||
ds2 = xr.open_mfdataset(path, preprocess=lambda x: | ||
preprocess_mpas(x, yearoffset=1850)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the choice for separation of the lambda argument from its name? I think it looks clearer the previous way but was this change because it overflowed the column limitation, requiring indentation? If so, this change makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find a cleaner way to obey the indentation rules and still stay within the 80-character limit per line. It might actually be cleaner to explicitly define a function rather than using lambda. But I'm up for suggestions. This comes up as well when trying to make the analysis scripts PEP8 compliant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is certainly a gray-area and the code is clear and aesthetically pleasing so we go for it and just keep and eye out for better ways to do this in the future (if they exist).
timeSeriesStats=False, timestr=None, | ||
yearoffset=1849, monthoffset=12, dayoffset=31): #{{{ | ||
timeSeriesStats=False, timestr=None, | ||
yearoffset=1849): # {{{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are loosing an option here in exchange of hard-coded values. Why aren't we just setting the defaults to new values? I think this is probably better long term by keeping the code more general.
|
||
# compute shifted datetimes | ||
daysSinceStart = ds[timestr] | ||
datetimes = [datetime.datetime(yearoffset, monthoffset, dayoffset) + x | ||
datetimes = [datetime.datetime(yearoffset+1, 1, 1) + x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see comment above about hard coding values. I think we only want the default to be changed to avoid hard-coded "magic" numbers.
Confirming the accuracy of this change: The first ds = xr.open_mfdataset(path, preprocess=lambda x:
preprocess_mpas(x, yearoffset=1850))
ds = remove_repeated_time_index(ds) we get ds.Time[:3]
<xarray.DataArray 'Time' (Time: 3)>
array(['1850-01-01T00:00:00.000000000Z', '1850-01-02T00:00:00.000000000Z',
'1850-01-03T00:00:00.000000000Z'], dtype='datetime64[ns]') as expected. This is visually apparent from the testcase at https://github.com/pwolfram/mpas_xarray via |
LGTM @xylar except for the issue of hard-coding values as noted in the review. After we discuss I can do the merge. |
@pwolfram, thanks for the review.
If you look carefully at the code before my current modifications, the optional arguments However, when My feeling was that the only purpose of I should have documented my thinking more carefully when I made the PR. I wasn't sure whether there would be any concerns about removing those arguments but clearly there is. I still feel strongly that it is better to remove them than keep them because appropriate default values are not clear. |
@xylar, I concur with your reasoning and even if this breaks backward compatibility the pros outweigh the cons in terms of future use / maintenance of the code. |
Thanks @xylar! |
The reference start date for time series data has been corrected.
It is now year=yearoffset+1, month=1, day=1, rather than
year=yearoffset, month=12, day=31. The previous reference date
would lead to the date being the expected start date (e.g.
01/01/1850) only after one day of simulation has elapsed, which
is not the desired behavior.
Also, makes mpas_xarray.py PEP8 compliant.