-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix time-series scripts use time bounds properly #48
Fix time-series scripts use time bounds properly #48
Conversation
TestingI verified that the results of the 3 time-series analysis scripts are bit-for-bit identical to those from #47 for both ACME alpha7 and alpha8 output. I didn't test the model vs. observations scripts, since they are unchanged. I ran both alpha7 and alpha8 tests with just years 1-3 and an alpha8 test with years 3-5, see results in the following directories: |
17db7a9
to
7cf12f9
Compare
I just added additional tests to make sure the |
@xylar, @pwolfram: just wanted to let you know that, while testing the ACME script on rhea, I hit the xarray mfdatasets error: 'too many open files'... this was while trying to open 100 years of monthly files. |
That's only true if you choose to plot a time series from a subset of files.
I took a look a the issue on the xarray GitHub site (pydata/xarray#463) and it doesn't look like there's been any activity on this issue since June. We may need to find our own workaround such as concatenating files together (see this recommended tool: https://github.com/NCAR/PyReshaper) |
@milenaveneziani and @xylar, note that Kevin Paul (@kmpaul) of https://github.com/NCAR/PyReshaper was at the AOSPY workshop and it would be great to be able to work with him because it is likely that we will need PyReshaper, if not now, at some point. I'm also following up with the xarray folks (namely @shoyer and @mrocklin) to get some advice on the best way to proceed. See also pydata/xarray#444 |
It sounds like the time to live with the open file limitation of xarray has passed and we should champion a solution. Do you both agree @milenaveneziani and @xylar? |
Yes, I agree @pwolfram: I ran into that problem with the full alpha6 run, and future runs are likely to span a period longer than 100 years. Plus, it is a machine-memory-dependent problem so perhaps on smaller machines the cut is even on a smaller number of files. |
@milenaveneziani, @xylar, @pwolfram: I'm happy to help you with the PyReshaper, if you need it. I'm hoping to have a new release before 2017, incidentally, but that won't prevent it from helping you now. |
Thanks @kmpaul for following up with us. Ideally we would fix this issue upstream in xarray / dask but the best path forward is not yet clear to me yet and I need to look into this further. |
One could argue that this needs to get fixed upstream of xarray / dask,
too. Like in the models themselves...or anywhere the data is generated.
…On Tue, Nov 29, 2016 at 12:29 PM, Phillip Wolfram ***@***.***> wrote:
Thanks @kmpaul <https://github.com/kmpaul> for following up with us.
Ideally we would fix this issue upstream in xarray / dask but the best path
forward is not yet clear to me yet and I need to look into this further.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#48 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AK4fg1GMWT18FvsKr8HVPTAsSDbFamKfks5rDH0CgaJpZM4K93XA>
.
--
*Kevin Paul, PhD*
Project Scientist, Head of I/O & Workflow Applications (IOWA)
The National Center for Atmospheric Research
Computational and Information Systems Laboratory
1850 Table Mesa Dr
Boulder, CO 80305
Phone: (303) 497-2441
Office: ML460B
|
7cf12f9
to
e1b0d64
Compare
@milenaveneziani, I think this PR should be independent of #50 and #52. I tested it myself before making those changes. So I think this PR should be tested independently of those two. I agree to some extent that we will test this PR more rigorously as we test those other PRs but I do not think it will be helpful to us to try to test all three at once. Instead, I think it is important to make sure this PR is robust without those other changes, at least to the extent we can. |
@milenaveneziani wrote:
I don't think were at the point of worrying about ACME just yet. Once #52 has been completed, tested and merged, we can think about testing develop in |
@xylar: I guess what I meant is that, once we merge #47, we won't be able to test on alpha7 or alpha8 because #47 is not backward compatible as is (it will be once #50 is merged as well). And we can't test using alpha9 data either because we are missing #20. |
@milenaveneziani, I'm not quite sure why this would be the case. This PR is intended to be compatible with both alpha7 and alpha8. What aspect are you concerned is not backward compatible? It supports both time variables of type |
@xylar: you are right. I recalled that xtime_start and xtime_end are indeed supported in alpha, so we can test #47 and this PR on alpha output. |
Okay, sounds good. I think we're on the same page. |
3e166dc
to
3c52f04
Compare
3c52f04
to
77d2e3a
Compare
@xylar: this needs to be rebased too before testing, right? |
@milenaveneziani, I rebased it already, but the answer in general is, no, branches don't need to be rebased for testing. It's just convenient on GitHub itself that the commits that have already been merged disappear. To test, you would want to do a test merge into your local checkout of develop and that would work whether I rebased or not. Anyway, I did rebase already this morning, so you're good to go. |
hour=0, minute=0, second=0) | ||
maxDate = datetime.datetime(year=2262, month=1, day=1, | ||
hour=0, minute=0, second=0) | ||
outDate = max(minDate, min(maxDate, outDate)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking we should issue a warning if we are on the clipping boundary so the unawares user will know there may be a potential problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree that this should be a warning. Typically want the end year of the time series analysis to be 9999
so we don't have to think about it. This is the function that makes sure that 9999 + 1849
still gives you a reasonable year that the analysis can handle. If you didn't want the clamping and the year offset, you wouldn't bother calling this function.
If I add a warning, it's going to be annoying that there are warnings throughout our output even though I'm intentionally using this function to clamp the date. The whole point of clamping the date here is so I don't have to do it elsewhere and so I can have better code reuse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I see what you mean. We really aren't at a point where our simulations will go past 2262 anyway so I'm ok leaving this as-is for now. However, I'm not sure if we can necessarily plan on this for the future (in general) because I know Jeremy runs multi-century simulations. But I agree this is probably sufficient for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, we don't support going past 100 years right now (because of limited numbers of files) so I think 2262 is the least of our concerns...
But also fixing Date
with a new min/max allowed date will be super easy once xarray and pandas support a wider range. For now, we're at their mercy and this function is really only a side effect of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Thanks for the reminder about the file limitation issue. I need to take a closer look at that once I get a chance.
timedelta2 = datetime.timedelta(days=20) | ||
self.assertEqual(timedelta1, timedelta2) | ||
|
||
date = Date(dateString='0001-01-01', isInterval=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put a comment above this line reminding the reader that the date is specified based on the quasi-arbitrary xarray datetime boundary.
datetime2 = datetime.datetime(year=1850, month=1, day=1) | ||
self.assertEqual(datetime1, datetime2) | ||
|
||
date = Date(dateString='9999-01-01', isInterval=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests the clipping capability, right? I think we should note this here and if there is a warning thrown we should make sure it is what we expect to get.
The to_datetime method can be used (optionally with an offset year) to convert a Date to a datetime. The date is clamped to a valid range supported by numpy's datetime64[ns], used by xarray and pandas. This method will be useful for computing datetime bounds on time-series data sets. The to_timedelta method converts a Date object with isInterval==True to a timedelta object.
This merge updates the 3 time-series analysis scripts to make use of the time series start and end dates, computed via timeseries_yr1 and timeseries_yr2, allowing analysis of a subset of the output data.
419df44
to
9f2a10c
Compare
This branch has been rebased yet again following #57 |
@milenaveneziani, please review and merge at your earliest convenience. |
@xylar, @pwolfram: I tested this on alpha7 and alpha8, changing yr1 and yr2 in various ways (starting from 1 or not, ending with 9999 or other number). I always get the plots I'd expect. One thing I noticed is that, when starting from a year different from 1, the plots x-axis still starts from year 1: we will probably want to change this in the future (for example, when splitting the time series plots for sea-ice, when the time series are too long for xarray to handle or for seasonal cycles to be visible). |
@milenaveneziani, could you report the incorrect x axis as an issue? I'll fix it as soon as I have some time. |
This merge updates the 3 time-series analysis scripts to make use
of the time series start and end dates, computed via
timeseries_yr1
and
timeseries_yr2
, allowing analysis of a subset of the outputdata.
To accomplish this, this merge adds the method
to_datetime
tothe
Date
class. This method can be used (optionally with an offsetyear) to convert a Date to a datetime object. The date is
clamped to the valid range supported by numpy's datetime64[ns],
used by xarray and pandas.