Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map variables and streams from MPAS dycore names to MPAS-Analysis names #52

Merged
merged 12 commits into from
Dec 12, 2016

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Dec 4, 2016

This merge adds dictionaries that can be used to map between the names of variables and streams used internally in MPAS-Analysis to the associated names used in various versions of MPAS dycores. This helps with cross-compatibility with various MPAS and ACME versions.

mpas_xarray has been modified to include the ability to find the MPAS dycore name of a variable in a data set given the MPAS-Analysis name and the associated map. mpas_xarray can also now rename variables in a dataset from their MPAS dycore names to those used by MPAS-Analysis using the same map.

The streams file reader has been augmented to include a function that informs the user if a given stream is present, allowing analysis scripts to find the correct name of a given stream using the streams mapping dictionary.

@xylar
Copy link
Collaborator Author

xylar commented Dec 4, 2016

@milenaveneziani and @pwolfram, this is a work-in-progress PR. I just wanted to give you a chance to follow the progress as I go. This way, you can hopefully raise concerns about the approach I'm using sooner rather than later.

Given that it's a WIP, there are a lot of small commits that will later need to be squashed. It is also, of course, subject to change, and will need to be rebased after various previous PRs (#47, #48, #51 and #52) are fulfilled.

@xylar
Copy link
Collaborator Author

xylar commented Dec 4, 2016

This PR is intended to address #20.

@xylar
Copy link
Collaborator Author

xylar commented Dec 4, 2016

Testing (so far)

I have successfully run the OHC analysis. It differed from the case that I thought was the correct baseline but it will be a lot easier to be sure what the baseline should be once the 4 PRs that come before this one are merged. This PR should not be answer-changing, so this needs to be verified against the appropriate version of develop before this PR gets merged.

@xylar xylar force-pushed the map_variable_names branch 2 times, most recently from dfe4d46 to 8a4a26e Compare December 5, 2016 13:58
@xylar xylar force-pushed the map_variable_names branch 5 times, most recently from 4e88177 to 32be3ea Compare December 7, 2016 20:28
@xylar
Copy link
Collaborator Author

xylar commented Dec 7, 2016

Testing

I verified that this branch works on ACME alpha7, alpha8 and beta0 output on Edison. The specific config files I used are here:
https://gist.github.com/xylar/2741cd76c7569ec8ab24edcff4a9cf6f

I also verified that the results are bit-for-bit identical to those produced with commit 52b961b, which I used as my baseline.

@xylar xylar force-pushed the map_variable_names branch from 32be3ea to 129f9bd Compare December 7, 2016 23:28
@xylar xylar removed the in progress label Dec 7, 2016
@pwolfram
Copy link
Contributor

pwolfram commented Dec 8, 2016

@xylar, could you please do a rebase so that this is easier to review on github when you get the chance? Thanks!

@xylar xylar force-pushed the map_variable_names branch from 129f9bd to 8b339c4 Compare December 8, 2016 06:55
@pwolfram
Copy link
Contributor

pwolfram commented Dec 8, 2016

Thanks @xylar for the rebase!

Copy link
Contributor

@pwolfram pwolfram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @xylar. I had a few conceptual comments in the review but by and large I think this is exactly what we want and need.

"""

field = field.lower()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always a good idea? Is it possible that we may have case-sensitive names that have non-unique character strings? I think the odds of this happening are very low and we primarily use camel case for clarity but just wanted to double-check this as a group.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment regarding needing field to be lower case would be helpful here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, a bit of mission creep. I didn't decide that this was the way, I just moved this to the top because every time field was accessed it was as field.lower(). I agree, though, that it should just always be called with a lowercase variable name if that's what we want.

I'll take this out, though, because I agree that it doesn't really make sense to me.

@vanroekel or @milenaveneziani, can you explain why it was important that the field name be all lowercase?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram, a side note. I think you're reviewing the whole code because I've made a lot of little changes as part of the PEP8 compliance. But a lot of these things aren't related to this PR so let's be careful that we focus on the job at hand and not the entire code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed @xylar. I'm wondering if we shouldn't keep all PEP8 changes in their own PRs for simplicity in the future. It is pretty hard to review the code with substantive changes as well as PEP8 changes because it is not always clear what changes correspond to what.

Note, I think we could merge the existing code without serious ramifications but wanted to make these different points, not so much to indicate that they need changed, but so that we are aware of these issues moving forward to help avoid future problems. Some of the error handling changes, however, probably are worth making sooner than later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar and @pwolfram I think I had put in the field.lower. I don't exactly remember why I did this originally, probably unnecessary redundancy in case the call in run_analysis.py was changed. I think it can be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we shouldn't keep all PEP8 changes in their own PRs for simplicity in the future. It is pretty hard to review the code with substantive changes as well as PEP8 changes because it is not always clear what changes correspond to what.

I'm sympathetic to this. It's hard for me to work on code effectively at this point that isn't PEP8 compliant because spyder gives me a ton of warnings that I usually use to find bugs in my code. At the same time, I don't really have time to go through the repo and make everything PEP8 compliant first, as a separate PR. So I fix things up as I go along.

A compromise approach for the future might be for me to do a separate commit in a given PR for the PEP8 changes and for the substantive changes. I usually try to do it that way but I combined the two in this PR and I'm sorry for doing that.

I'm very concerned that too many PRs will make this work untenable. As it is, it is taking a surprisingly long time in general to get PRs reviewed in this repository. (I generally try to do reviews within a day or two of receiving them, and I leave them unreviewed only if there are significant changes for the submitter to make before I can proceed.) So I am also concerned that breaking PRs into more PRs will not make things move any faster.


If present, variableMap is a dictionary of MPAS-O variable names that map
to their mpas_analysis counterparts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a description of config and field in the doc string? It will be helpful for people who aren't as familiar with the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of mission creep but I'll do this.

streamName = streams.find_stream(streamMap['timeSeriesStats'])
infiles = streams.readpath(streamName, startDate=startDate,
endDate=endDate)
print 'Reading files {} through {}'.format(infiles[0], infiles[-1])

plots_dir = config.get('paths', 'plots_dir')
obsdir = config.get('paths', 'obs_' + field.lower() + 'dir')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field.lower() is redundant following above field=field.lower().

@@ -57,55 +67,46 @@ def ocn_modelvsobs(config, field):
outputTimes = config.getExpression(field.lower() + '_modelvsobs',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field.lower() -> field

selvals={'nVertLevels':1}))
ds = remove_repeated_time_index(ds)
ds.rename({'time_avg_activeTracers_temperature':'mpasData'}, inplace = True)
selvals = {'nVertLevels': 1}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Python is 0-indexed so I just want to make sure the 1 is supposed to be for the 2nd to the top layer, not the top layer with 0 index. I guess I'm confused here and a discussion / clarification would be helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, here is an example from a MPAS file I easily had on hand:

In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('/Users/pwolfram/Downloads/last_step.nc')

In [3]: ds.nVertLevels
Out[3]: 
<xarray.DataArray 'nVertLevels' (nVertLevels: 100)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
Coordinates:
  * nVertLevels  (nVertLevels) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a bug we may need a separate PR for it. It is possible we have similar errors in other places in the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milenaveneziani or @vanroekel, could you comment on this. I would agree that it's likely a bug and we're looking at the temperature in layer 1 instead of layer 0. But this needs to be reported as an issue and addressed with a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I don't think we should change things here but if this is a bug it needs to be reported and fixed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is a bug on my part. I'll issue the report and make a PR.

"""
possible_variables = variable_map[variable_name]
for var in possible_variables:
if isinstance(var, (list, tuple)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you are supporting lists here for greater generality. That is really nice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary because we need ['xtime_start', 'xtime_end'] as a possible mapping of 'Time', for example.

hour=0, minute=0, second=0)
maxDate = datetime.datetime(year=2262, month=1, day=1,
hour=0, minute=0, second=0)
outDate = max(minDate, min(maxDate, outDate))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we should issue a warning if we are on the clipping boundary so the unawares user will know there may be a potential problem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is part of #48 and you should be reviewing this code there. I'm happy to respond if you make your suggestion there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying, I've done so.

timedelta2 = datetime.timedelta(days=20)
self.assertEqual(timedelta1, timedelta2)

date = Date(dateString='0001-01-01', isInterval=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a comment above this line reminding the reader that the date is specified based on the quasi-arbitrary xarray datetime boundary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is part of #48. Please comment there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and done.

datetime2 = datetime.datetime(year=1850, month=1, day=1)
self.assertEqual(datetime1, datetime2)

date = Date(dateString='9999-01-01', isInterval=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests the clipping capability, right? I think we should note this here and if there is a warning thrown we should make sure it is what we expect to get.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, part of #48. Please move the comment there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

onlyvars=varList,
yearoffset=1850,
variable_map=varMap))
self.assertEqual(sorted(ds.data_vars.keys()), sorted(varList))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put a comment here that the preprocess function is essentially remapping variables within the dataset, otherwise the assert statement could be a little confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll do that. In general, I think we should all be doing a better job of commenting our tests so the intention is clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and please be on the look out for this in my code!

@xylar
Copy link
Collaborator Author

xylar commented Dec 8, 2016

I broke this branch. Working on it...

xylar added 2 commits December 8, 2016 23:39
The to_datetime method can be used (optionally with an offset year)
to convert a Date to a datetime.  The date is clamped to a valid
range supported by numpy's datetime64[ns], used by xarray and
pandas.  This method will be useful for computing datetime bounds on
time-series data sets.

The to_timedelta method converts a Date object with
isInterval==True to a timedelta object.
This merge updates the 3 time-series analysis scripts to make use
of the time series start and end dates, computed via timeseries_yr1
and timeseries_yr2, allowing analysis of a subset of the output
data.
xylar added 10 commits December 8, 2016 23:43
has_stream checks if a given stream is present in a streams
file.  The ability to check for a given string is useful in
determining which stream name is appropriate for a given MPAS-O
version.

find_stream uses has_stream to find which stream in a list of
possibilities is present in the file.

Also, a few changes have been made to make the module
namelist_streams_interface.py PEP8 compliant.
The new function map_variable(...) finds the MPAS dycore variable
in a given dataset that corresponds to a given mpas_analysis
variable name.

The new function rename_variables(...) renames all vriables in a
given dataset from MPAS dycore names to mpas_analysis names if they
are found in the variable map.

An argument for a variable_map has ben added to preprocess_mpas(...).
This map, if present, is used to fine the appropriate time
variable(s) in the dataset and to rename variables (other than the
time variable(s)) to their mpas_analysis names.
This merge adds two dictionaries that are used to map variable
and stream names from various MPAS-O versions to the corresponding
values in MPAS-Analysis.
Stream and variable maps for use in the ocean analysis are now
imported as part of initializing the analysis.  The maps are
passed on to the OHC time-series analysis function.

The OHC time-series analysis has been updated to used the maps
to find the appropriate time-series stream and to map variable
names to more simplified names that will become the MPAS-Analysis
standard names.  These more simplified names are then used in
the remainder of the analysis.

Also, ohc_timeseries.py has been made PEP8 compliant.
Also, make SST analysis script PEP8 compliant.
Also, make sea-ice scripts PEP8 compliant.
@xylar xylar force-pushed the map_variable_names branch from 267637e to 9b9434e Compare December 8, 2016 22:46
@xylar
Copy link
Collaborator Author

xylar commented Dec 8, 2016

This branch has been rebased yet again following #57

@xylar
Copy link
Collaborator Author

xylar commented Dec 8, 2016

CI seems to have disappeared or gotten stuck. A manual test passed without a problem.

@xylar
Copy link
Collaborator Author

xylar commented Dec 8, 2016

CI seems to be hanging on cloning the repo. I am also having a lot of trouble updating to/from GitHub, so there must be an issue on their side. Hopefully it'll work okay again soon...

@pwolfram
Copy link
Contributor

pwolfram commented Dec 8, 2016

It has passed now. @xylar, assuming @milenaveneziani's testing is good I think we getting close to the point where we can merge this PR. I would say that any issues we have left are minor, would you concur?

@xylar
Copy link
Collaborator Author

xylar commented Dec 9, 2016

Yes, @milenaveneziani please merge as soon as the other tests have passed and you are comfortable with the approach, but only after you merge #48

@milenaveneziani
Copy link
Collaborator

milenaveneziani commented Dec 12, 2016

I just tested on beta0 and everything worked! Thanks @xylar and @pwolfram.
Only one comment: I noticed that now the time series go from Jan of yr1 to Jan of yr2+1, while before they went through Dec of yr2. Do you understand the reason?

@xylar
Copy link
Collaborator Author

xylar commented Dec 12, 2016

@milenaveneziani, could you report this as an issue? I will attempt to address it as soon as I can.

@xylar xylar deleted the map_variable_names branch December 12, 2016 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants