Map variables and streams from MPAS dycore names to MPAS-Analysis names #52
Conversation
@milenaveneziani and @pwolfram, this is a work-in-progress PR. I just wanted to give you a chance to follow the progress as I go. This way, you can hopefully raise concerns about the approach I'm using sooner rather than later. Given that it's a WIP, there are a lot of small commits that will later need to be squashed. It is also, of course, subject to change, and will need to be rebased after various previous PRs (#47, #48, #51 and #52) are fulfilled.
This PR is intended to address #20.
Testing (so far)
I have successfully run the OHC analysis. It differed from the case that I thought was the correct baseline, but it will be a lot easier to be sure what the baseline should be once the 4 PRs that come before this one are merged. This PR should not be answer-changing, so this needs to be verified against the appropriate version of
Force-pushed dfe4d46 to 8a4a26e.
Force-pushed 4e88177 to 32be3ea.
Testing
I verified that this branch works on ACME alpha7, alpha8 and beta0 output on Edison. The specific config files I used are here:
I also verified that the results are bit-for-bit identical to those produced with commit 52b961b, which I used as my baseline.
Force-pushed 32be3ea to 129f9bd.
@xylar, could you please do a rebase so that this is easier to review on github when you get the chance? Thanks!
Force-pushed 129f9bd to 8b339c4.
Thanks @xylar for the rebase!
Great work @xylar. I had a few conceptual comments in the review but by and large I think this is exactly what we want and need.
""" | ||
|
||
field = field.lower() |
Is this always a good idea? Is it possible that we may have case-sensitive names that have non-unique character strings? I think the odds of this happening are very low and we primarily use camel case for clarity but just wanted to double-check this as a group.
A comment regarding needing field to be lower case would be helpful here.
Again, a bit of mission creep. I didn't decide that this was the way; I just moved this to the top because every time field was accessed, it was as field.lower(). I agree, though, that it should just always be called with a lowercase variable name if that's what we want.
I'll take this out, though, because I agree that it doesn't really make sense.
@vanroekel or @milenaveneziani, can you explain why it was important that the field name be all lowercase?
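As a purely hypothetical illustration of the case-sensitivity concern raised above (these field names are made up, not from the MPAS-Analysis code): if two names ever differed only by case, lowercasing would silently merge them.

```python
# Hypothetical illustration of the concern above: if two variable names ever
# differed only by case, lowercasing them would silently merge the entries.
fields = {'seaSurfaceTemperature': 'SST', 'seasurfacetemperature': 'other'}

# lowercase every key, as field = field.lower() effectively does
lowered = {name.lower(): value for name, value in fields.items()}

# two distinct keys collapse to one; the first entry is silently lost
collision = len(fields) != len(lowered)
```

As noted in the thread, camelCase names make such a collision unlikely in practice, but the lowercasing does discard information.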
@pwolfram, a side note. I think you're reviewing the whole code because I've made a lot of little changes as part of the PEP8 compliance. But a lot of these things aren't related to this PR so let's be careful that we focus on the job at hand and not the entire code.
Agreed @xylar. I'm wondering if we shouldn't keep all PEP8 changes in their own PRs for simplicity in the future. It is pretty hard to review the code with substantive changes as well as PEP8 changes because it is not always clear what changes correspond to what.
Note, I think we could merge the existing code without serious ramifications, but I wanted to make these different points, not so much to indicate that they need to be changed, but so that we are aware of these issues moving forward to help avoid future problems. Some of the error handling changes, however, are probably worth making sooner rather than later.
I'm wondering if we shouldn't keep all PEP8 changes in their own PRs for simplicity in the future. It is pretty hard to review the code with substantive changes as well as PEP8 changes because it is not always clear what changes correspond to what.
I'm sympathetic to this. It's hard for me to work on code effectively at this point that isn't PEP8 compliant because spyder gives me a ton of warnings that I usually use to find bugs in my code. At the same time, I don't really have time to go through the repo and make everything PEP8 compliant first, as a separate PR. So I fix things up as I go along.
A compromise approach for the future might be for me to do a separate commit in a given PR for the PEP8 changes and for the substantive changes. I usually try to do it that way but I combined the two in this PR and I'm sorry for doing that.
I'm very concerned that too many PRs will make this work untenable. As it is, it is taking a surprisingly long time in general to get PRs reviewed in this repository. (I generally try to do reviews within a day or two of receiving them, and I leave them unreviewed only if there are significant changes for the submitter to make before I can proceed.) So I am also concerned that breaking PRs into more PRs will not make things move any faster.
If present, variableMap is a dictionary of MPAS-O variable names that map
to their mpas_analysis counterparts.
Should we have a description of config and field in the doc string? It will be helpful for people who aren't as familiar with the code.
A bit of mission creep but I'll do this.
streamName = streams.find_stream(streamMap['timeSeriesStats'])
infiles = streams.readpath(streamName, startDate=startDate,
                           endDate=endDate)
print 'Reading files {} through {}'.format(infiles[0], infiles[-1])

plots_dir = config.get('paths', 'plots_dir')
obsdir = config.get('paths', 'obs_' + field.lower() + 'dir')
field.lower() is redundant following field = field.lower() above.
@@ -57,55 +67,46 @@ def ocn_modelvsobs(config, field):
outputTimes = config.getExpression(field.lower() + '_modelvsobs',
field.lower() -> field
selvals={'nVertLevels':1}))
ds = remove_repeated_time_index(ds)
ds.rename({'time_avg_activeTracers_temperature':'mpasData'}, inplace = True)
selvals = {'nVertLevels': 1}
Is this correct? Python is 0-indexed so I just want to make sure the 1 is supposed to be for the 2nd to the top layer, not the top layer with 0 index. I guess I'm confused here and a discussion / clarification would be helpful.
For example, here is output from an MPAS file I had on hand:
In [1]: import xarray as xr
In [2]: ds = xr.open_dataset('/Users/pwolfram/Downloads/last_step.nc')
In [3]: ds.nVertLevels
Out[3]:
<xarray.DataArray 'nVertLevels' (nVertLevels: 100)>
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
Coordinates:
* nVertLevels (nVertLevels) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
If this is a bug we may need a separate PR for it. It is possible we have similar errors in other places in the code.
@milenaveneziani or @vanroekel, could you comment on this. I would agree that it's likely a bug and we're looking at the temperature in layer 1 instead of layer 0. But this needs to be reported as an issue and addressed with a separate PR.
Agreed, I don't think we should change things here but if this is a bug it needs to be reported and fixed.
Yes, this is a bug on my part. I'll file the issue and make a PR.
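To make the indexing point above concrete, here is a minimal, hypothetical sketch (not the MPAS-Analysis code; the values are made up) showing why nVertLevels=1 is the second layer, not the surface:

```python
# Hypothetical 3-layer temperature column; in MPAS output, nVertLevels
# index 0 is the surface layer.
temperature = [20.0, 15.0, 10.0]

# Python is 0-indexed, so selvals={'nVertLevels': 1} selects the SECOND
# layer down, not the surface.
surface_temp = temperature[0]   # the top (surface) layer
second_layer = temperature[1]   # one layer below the surface
```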
""" | ||
possible_variables = variable_map[variable_name] | ||
for var in possible_variables: | ||
if isinstance(var, (list, tuple)): |
I'm glad you are supporting lists here for greater generality. That is really nice.
This is necessary because we need ['xtime_start', 'xtime_end'] as a possible mapping of 'Time', for example.
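A minimal sketch of this lookup, under the assumption that a map entry is either a single name or a list of names that must all be present together (the function body here is illustrative, not the actual MPAS-Analysis implementation):

```python
# Sketch of the variable lookup discussed above; the map contents are
# illustrative, not the actual MPAS-Analysis variable map.
variable_map = {
    'Time': ['xtime', ['xtime_start', 'xtime_end']],
}

def map_variable(variable_name, dataset_vars, variable_map):
    """Return the dataset name(s) matching an mpas_analysis name, or None."""
    for var in variable_map[variable_name]:
        if isinstance(var, (list, tuple)):
            # a grouped entry matches only if every name in it is present
            if all(v in dataset_vars for v in var):
                return var
        elif var in dataset_vars:
            return var
    return None
```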
                            hour=0, minute=0, second=0)
maxDate = datetime.datetime(year=2262, month=1, day=1,
                            hour=0, minute=0, second=0)
outDate = max(minDate, min(maxDate, outDate))
I'm thinking we should issue a warning if we are on the clipping boundary so an unaware user will know there may be a potential problem.
This change is part of #48 and you should be reviewing this code there. I'm happy to respond if you make your suggestion there.
Thanks for clarifying, I've done so.
timedelta2 = datetime.timedelta(days=20)
self.assertEqual(timedelta1, timedelta2)

date = Date(dateString='0001-01-01', isInterval=False)
I would put a comment above this line reminding the reader that the date is specified based on the quasi-arbitrary xarray datetime boundary.
Again, this is part of #48. Please comment there.
Agreed and done.
datetime2 = datetime.datetime(year=1850, month=1, day=1)
self.assertEqual(datetime1, datetime2)

date = Date(dateString='9999-01-01', isInterval=False)
This tests the clipping capability, right? I think we should note this here and if there is a warning thrown we should make sure it is what we expect to get.
Again, part of #48. Please move the comment there.
Done
                         onlyvars=varList,
                         yearoffset=1850,
                         variable_map=varMap))
self.assertEqual(sorted(ds.data_vars.keys()), sorted(varList))
I'd put a comment here that the preprocess function is essentially remapping variables within the dataset, otherwise the assert statement could be a little confusing.
Yep, I'll do that. In general, I think we should all be doing a better job of commenting our tests so the intention is clearer.
I agree and please be on the look out for this in my code!
I broke this branch. Working on it...
The to_datetime method can be used (optionally with an offset year) to convert a Date to a datetime. The date is clamped to a valid range supported by numpy's datetime64[ns], used by xarray and pandas. This method will be useful for computing datetime bounds on time-series data sets. The to_timedelta method converts a Date object with isInterval==True to a timedelta object.
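The clamping described above can be sketched roughly as follows. The bounds and names here are illustrative assumptions (numpy's datetime64[ns] covers roughly the years 1678 through 2262), not the actual MPAS-Analysis implementation:

```python
import datetime

# Illustrative bounds approximating numpy's datetime64[ns] range
# (roughly 1678-2262); names and exact values are assumptions,
# not MPAS-Analysis code.
MIN_DATE = datetime.datetime(year=1678, month=1, day=1)
MAX_DATE = datetime.datetime(year=2262, month=1, day=1)

def clamp_to_datetime64_range(date):
    """Clamp a datetime to a range representable as datetime64[ns]."""
    return max(MIN_DATE, min(MAX_DATE, date))
```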
This merge updates the 3 time-series analysis scripts to make use of the time series start and end dates, computed via timeseries_yr1 and timeseries_yr2, allowing analysis of a subset of the output data.
has_stream checks if a given stream is present in a streams file. The ability to check for a given string is useful in determining which stream name is appropriate for a given MPAS-O version. find_stream uses has_stream to find which stream in a list of possibilities is present in the file. Also, a few changes have been made to make the module namelist_streams_interface.py PEP8 compliant.
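A sketch of the behavior described above, with a plain set of stream names standing in for the parsed streams file (the function bodies are illustrative, not the actual namelist_streams_interface.py code):

```python
# Sketch of the stream lookup described above; a plain set of stream names
# stands in for the parsed streams file, and names are illustrative.
def has_stream(stream_names, stream_name):
    """Report whether a given stream is present."""
    return stream_name in stream_names

def find_stream(stream_names, possible_streams):
    """Return the first stream from a list of possibilities that is present."""
    for stream in possible_streams:
        if has_stream(stream_names, stream):
            return stream
    raise ValueError('none of {} found'.format(possible_streams))
```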
The new function map_variable(...) finds the MPAS dycore variable in a given dataset that corresponds to a given mpas_analysis variable name. The new function rename_variables(...) renames all variables in a given dataset from MPAS dycore names to mpas_analysis names if they are found in the variable map. An argument for a variable_map has been added to preprocess_mpas(...). This map, if present, is used to find the appropriate time variable(s) in the dataset and to rename variables (other than the time variable(s)) to their mpas_analysis names.
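The renaming step might look roughly like this sketch, with a plain dict standing in for the xarray dataset (the map contents and function body are illustrative, not the actual rename_variables(...) implementation):

```python
# Rough sketch of the renaming described above; a plain dict stands in for
# the xarray dataset, and the map entries are illustrative.
variable_map = {
    'mpasData': ['time_avg_activeTracers_temperature', 'avgTemperature'],
}

def rename_variables(data_vars, variable_map):
    """Rename MPAS dycore variable names to mpas_analysis names."""
    renamed = {}
    for name, value in data_vars.items():
        new_name = name
        for analysis_name, candidates in variable_map.items():
            # grouped (list/tuple) candidates, e.g. time pairs, are skipped
            simple = [c for c in candidates
                      if not isinstance(c, (list, tuple))]
            if name in simple:
                new_name = analysis_name
                break
        renamed[new_name] = value
    return renamed
```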
This merge adds two dictionaries that are used to map variable and stream names from various MPAS-O versions to the corresponding values in MPAS-Analysis.
Stream and variable maps for use in the ocean analysis are now imported as part of initializing the analysis. The maps are passed on to the OHC time-series analysis function. The OHC time-series analysis has been updated to use the maps to find the appropriate time-series stream and to map variable names to simpler names that will become the MPAS-Analysis standard names. These simpler names are then used in the remainder of the analysis. Also, ohc_timeseries.py has been made PEP8 compliant.
Also, make SST analysis script PEP8 compliant.
Also, make sea-ice scripts PEP8 compliant.
Force-pushed 267637e to 9b9434e.
This branch has been rebased yet again following #57.
CI seems to have disappeared or gotten stuck. A manual test passed without a problem.
CI seems to be hanging on cloning the repo. I am also having a lot of trouble updating to/from GitHub, so there must be an issue on their side. Hopefully it'll work okay again soon...
It has passed now. @xylar, assuming @milenaveneziani's testing is good, I think we are getting close to the point where we can merge this PR. I would say that any issues we have left are minor; would you concur?
Yes, @milenaveneziani, please merge as soon as the other tests have passed and you are comfortable with the approach, but only after you merge #48.
@milenaveneziani, could you report this as an issue? I will attempt to address it as soon as I can.
This merge adds dictionaries that can be used to map between the names of variables and streams used internally in MPAS-Analysis to the associated names used in various versions of MPAS dycores. This helps with cross-compatibility with various MPAS and ACME versions.
mpas_xarray has been modified to include the ability to find the MPAS dycore name of a variable in a data set given the MPAS-Analysis name and the associated map. mpas_xarray can also now rename variables in a dataset from their MPAS dycore names to those used by MPAS-Analysis using the same map.
The streams file reader has been augmented to include a function that informs the user if a given stream is present, allowing analysis scripts to find the correct name of a given stream using the streams mapping dictionary.