Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix offset date in mpas_xarray #46

Merged
merged 2 commits into from
Dec 1, 2016

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Nov 25, 2016

The reference start date for time series data has been corrected.
It is now year=yearoffset+1, month=1, day=1, rather than
year=yearoffset, month=12, day=31. The previous reference date
would lead to the date being the expected start date (e.g.
01/01/1850) only after one day of simulation has elapsed, which
is not the desired behavior.

Also, makes mpas_xarray.py PEP8 compliant.

The reference start date for time series data has been corrected.
It is now year=yearoffset+1, month=1, day=1, rather than
year=yearoffset, month=12, day=31.  The previous reference date
would lead to the date being the expected start date (e.g.
01/01/1850) only after one day of simulation has elapsed, which
is not the desired behavior.
@xylar
Copy link
Collaborator Author

xylar commented Nov 25, 2016

I decided to fix this bug before addressing #45 or #33 because this change is non-bit-for-bit. All time-series plots are shifted by one day as a result of this commit. I believe (strongly) that the fixed dates are correct and the original dates were not.

Here is an example plot from the current develop branch:
iceareacellnh_20160805 a_wcycl1850 ne30_oec edison alpha7_00_b1850c5_ne30_v0 4
and here is the same from this branch:
iceareacellnh_20160805 a_wcycl1850 ne30_oec edison alpha7_00_b1850c5_ne30_v0 4
The difference is small but noticeable if you flicker between the two images.

@xylar
Copy link
Collaborator Author

xylar commented Nov 25, 2016

@milenaveneziani, I would like you to be the main reviewer of this PR because it is not bit-for-bit. You will need to decide if that's okay and if you agree that the fix is correct.

@pwolfram, obviously I also want your input on whether the fix is correct but more importantly I would like your blessing on the changes I made to make mpas_xarray.py PEP8 compliant.

@xylar
Copy link
Collaborator Author

xylar commented Nov 25, 2016

Testing

I ran the full mpas analysis on Edison using both alpha7 and alpha8 ACME results. I compared several images visually against baseline results from develop. I verified that all climatology results are bit-for-bit identical with develop but that time-series plots are not.

Full results are available here:
http://portal.nersc.gov/project/acme/xylar/fix_timeseries_offset_date_results/

@milenaveneziani
Copy link
Collaborator

@xylar: I guess I am fine either way. The reason why I chose the offset to be Dec 31 is that I wanted model times to start from 00hours of Jan 1st, instead of Jan 2nd.
Also let us keep in mind that this offset thing was supposed to be a temporary solution, in the hope that python gets around this idea that times have to start from the 1600s.. So eventually we will hopefully find a way to start from year 1 for time-slice experiments.

@xylar
Copy link
Collaborator Author

xylar commented Nov 28, 2016

Hi @milenaveneziani, welcome back from the long weekend!

I guess I am fine either way. The reason why I chose the offset to be Dec 31 is that I wanted model times to start from 00hours of Jan 1st, instead of Jan 2nd.

I believe with the code as you had written it, when daysSinceStartOfSim = 0, the date is 00 hours on the last day of the previous year, not the first day of the first year. That is, daysSinceStartOfSim doesn't start out being 1 on day 1, but rather 0. Perhaps the fact that the date would be written as 01-01- is causing some confusion, but that doesn't relate to daysSicneStartOfSim.

In any case, as we move to using xtime_start and xtime_end, this bit of confusion should go away.

Also let us keep in mind that this offset thing was supposed to be a temporary solution, in the hope that python gets around this idea that times have to start from the 1600s.. So eventually we will hopefully find a way to start from year 1 for time-slice experiments.

So far, I don't see any sign that this is being resolved in pandas (and therefore in xarray) so I think we are likely stuck with the hack of adding an offset year for the time being.

@milenaveneziani
Copy link
Collaborator

Thanks @xylar.
I see, I thought if daysSinceStart = 0, it would start from 00hr of Jan 1.
ok, then, let us change it. I am perfectly happy with the testing you did. I think either I or @pwolfram can merge this, once you get the blessing from Phillip.

@xylar
Copy link
Collaborator Author

xylar commented Nov 28, 2016

Thanks, @milenaveneziani. In that case, why don't I assign @pwolfram to merge this. That way he can also merge #47, potentially at the same time. That will still leave #48 for you to have a look at when you get a chance.

@milenaveneziani
Copy link
Collaborator

sounds good.

@pwolfram
Copy link
Contributor

@milenaveneziani, unfortunately I think that

The reason why I chose the offset to be Dec 31 is that I wanted model times to start from 00hours of Jan 1st, instead of Jan 2nd.
Also let us keep in mind that this offset thing was supposed to be a temporary solution, in the hope that python gets around this idea that times have to start from the 1600s.. So eventually we will hopefully find a way to start from year 1 for time-slice experiments.

may not be addressed soon because of technical issues in xarray and extending all the way to numpy (pydata/xarray#789). We chatted about this briefly at the AOSPY workshop and the overall consensus was this is a fairly complicated problem that requires community consensus on a design strategy as well as some dedicated time for refractoring and implementation. In short, it is probably not reasonable to expect it to be fixed anytime soon unless we were to somehow figure out how to fund a solution.

Hence, we should make our solution as robust as possible given this constraint.

Given this understanding, @milenaveneziani and @xylar, are you both still ok with me going forward on the review / merge?

@xylar
Copy link
Collaborator Author

xylar commented Nov 29, 2016

@pwolfram, yes, nothing has changed since my previous comment. Go ahead with the review/merge.

@xylar
Copy link
Collaborator Author

xylar commented Nov 29, 2016

@milenaveneziani also said, "sounds good", which I took to mean you could review/merge.

Copy link
Contributor

@pwolfram pwolfram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only real issue I have with this PR is the hard coding of values away from default function values because removes generality and may break other codes that use the mpas_xarray wrapper.

datetimes = [datetime.datetime(yearoffset + int(x[:4]), int(x[5:7]),
int(x[8:10]), int(x[11:13]),
int(x[14:16]), int(x[17:19]))
for x in time]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

ds = xr.open_mfdataset(path, preprocess=lambda x: preprocess_mpas(x,
timeSeriesStats=True, timestr='timeSeriesStatsMonthly_avg_daysSinceStartOfSim_1'))
def test_load_mpas_xarray_timeSeriesStats_datasets(path): # {{{
timestr = 'timeSeriesStatsMonthly_avg_daysSinceStartOfSim_1'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be an optional argument to the function above, which may be clearer and provide additional functionality.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that change fit in the scope of this PR. I'm just trying to make the line PEP8 compliant, not add any new functionality. I pulled out timestr simply because the variable name is so long that there's no feasible way of keeping the line under 80 characters without assigning it to a variable.

If you want to upgrade the testing of mpas_xarray, that would be a fine thing to do in another PR.

ds = xr.open_mfdataset(path, preprocess=lambda x:
preprocess_mpas(x,
timeSeriesStats=True,
timestr=timestr))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the code a lot clearer, thanks for the cleanup.

ds = remove_repeated_time_index(ds)
ds2 = xr.open_mfdataset(path, preprocess=lambda x: preprocess_mpas(x, yearoffset=1850))
ds2 = xr.open_mfdataset(path, preprocess=lambda x:
preprocess_mpas(x, yearoffset=1850))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the choice for separation of the lambda argument from its name? I think it looks clearer the previous way but was this change because it overflowed the column limitation, requiring indentation? If so, this change makes sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find a cleaner way to obey the indentation rules and still stay within the 80-character limit per line. It might actually be cleaner to explicitly define a function rather than using lambda. But I'm up for suggestions. This comes up as well when trying to make the analysis scripts PEP8 compliant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is certainly a gray-area and the code is clear and aesthetically pleasing so we go for it and just keep and eye out for better ways to do this in the future (if they exist).

timeSeriesStats=False, timestr=None,
yearoffset=1849, monthoffset=12, dayoffset=31): #{{{
timeSeriesStats=False, timestr=None,
yearoffset=1849): # {{{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are loosing an option here in exchange of hard-coded values. Why aren't we just setting the defaults to new values? I think this is probably better long term by keeping the code more general.


# compute shifted datetimes
daysSinceStart = ds[timestr]
datetimes = [datetime.datetime(yearoffset, monthoffset, dayoffset) + x
datetimes = [datetime.datetime(yearoffset+1, 1, 1) + x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comment above about hard coding values. I think we only want the default to be changed to avoid hard-coded "magic" numbers.

@pwolfram
Copy link
Contributor

Confirming the accuracy of this change:

The first xtime from ncdump -v xtime globalStats.0000-01-01_00.00.00.nc is 0000-01-01_00:00:00 and following

ds = xr.open_mfdataset(path, preprocess=lambda x:
                       preprocess_mpas(x, yearoffset=1850))
ds = remove_repeated_time_index(ds)                              

we get

ds.Time[:3]
<xarray.DataArray 'Time' (Time: 3)>
array(['1850-01-01T00:00:00.000000000Z', '1850-01-02T00:00:00.000000000Z',
       '1850-01-03T00:00:00.000000000Z'], dtype='datetime64[ns]')

as expected.

This is visually apparent from the testcase at https://github.com/pwolfram/mpas_xarray via python mpas_xarray/mpas_xarray.py -f "globalStats*nc":

screenshot 2016-11-30 15 35 08

screenshot 2016-11-30 15 40 25

@pwolfram
Copy link
Contributor

LGTM @xylar except for the issue of hard-coding values as noted in the review. After we discuss I can do the merge.

@xylar
Copy link
Collaborator Author

xylar commented Dec 1, 2016

@pwolfram, thanks for the review.

We are losing an option here in exchange of hard-coded values. Why aren't we just setting the defaults to new values? I think this is probably better long term by keeping the code more general.

If you look carefully at the code before my current modifications, the optional arguments monthOffset and dayOffset are only used when timeSeriesStats == True, though this is not documented in the docstring. When timeSeriesStats == False, only yearOffset is used and the monthOffset and dayOffset are implicitly assumed to be 0. This is because we are reading dates and adding a time offset to them. In Python jargon, we effectively read a datetime and add a timedelta built from yearOffset.

However, when timeSeriesStats == True, we read some variant on daysSinceStartOfSim and use it to effectively construct a timedelta. This means we need to construct a datetime from yearOffset, and that in turn means that the correct month and day are each 1, not zero. It would certainly be possible to add one to monthOffset and dayOffset to construct this datetime, but that would not be very intuitive for a user.

My feeling was that the only purpose of yearOffset and the only use case I can conceive of for it is when we use it to put a date into the acceptable range for a numpy datetime64[ns]. Thus, my feeling was that monthOffset and dayOffset were both needlessly confusing and not useful.

I should have documented my thinking more carefully when I made the PR. I wasn't sure whether there would be any concerns about removing those arguments but clearly there is.

I still feel strongly that it is better to remove them than keep them because appropriate default values are not clear.

@pwolfram
Copy link
Contributor

pwolfram commented Dec 1, 2016

@xylar, I concur with your reasoning and even if this breaks backward compatibility the pros outweigh the cons in terms of future use / maintenance of the code.

@pwolfram pwolfram merged commit a9c655b into MPAS-Dev:develop Dec 1, 2016
@pwolfram
Copy link
Contributor

pwolfram commented Dec 1, 2016

Thanks @xylar!

@xylar xylar deleted the fix_timeseries_offset_date branch December 1, 2016 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants