Map variables and streams from MPAS dycore names to MPAS-Analysis names #52
Conversation
@milenaveneziani and @pwolfram, this is a work-in-progress PR. I just wanted to give you a chance to follow the progress as I go. This way, you can hopefully raise concerns about the approach I'm using sooner rather than later. Given that it's a WIP, there are a lot of small commits that will later need to be squashed. It is also, of course, subject to change, and will need to be rebased after various previous PRs (#47, #48, #51 and #52) are fulfilled.
This PR is intended to address #20.
Testing (so far)
I have successfully run the OHC analysis. It differed from the case that I thought was the correct baseline, but it will be a lot easier to be sure what the baseline should be once the 4 PRs that come before this one are merged. This PR should not be answer-changing, so this needs to be verified against the appropriate version of
Force-pushed dfe4d46 to 8a4a26e.
Force-pushed 4e88177 to 32be3ea.
Testing
I verified that this branch works on ACME alpha7, alpha8 and beta0 output on Edison. The specific config files I used are here:
I also verified that the results are bit-for-bit identical to those produced with commit 52b961b, which I used as my baseline.
Force-pushed 32be3ea to 129f9bd.
@xylar, could you please do a rebase so that this is easier to review on github when you get the chance? Thanks!
Force-pushed 129f9bd to 8b339c4.
Thanks @xylar for the rebase!
Great work @xylar. I had a few conceptual comments in the review but by and large I think this is exactly what we want and need.
""" | ||
|
||
field = field.lower() |
Is this always a good idea? Is it possible that we may have case-sensitive names that have non-unique character strings? I think the odds of this happening are very low and we primarily use camel case for clarity but just wanted to double-check this as a group.
A comment regarding needing field to be lower case would be helpful here.
Again, a bit of mission creep. I didn't decide that this was the way; I just moved this to the top because every time field was accessed, it was as field.lower(). I agree, though, that it should just always be called with a lowercase variable name if that's what we want.
I'll take this out, though, because I agree that it doesn't really make sense.
@vanroekel or @milenaveneziani, can you explain why it was important that the field name be all lowercase?
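As a purely hypothetical illustration of the case-sensitivity concern raised above (these field names are made up, not from the MPAS-Analysis code): if two names ever differed only by case, lowercasing would silently merge them.

```python
# Hypothetical illustration of the concern above: if two variable names ever
# differed only by case, lowercasing them would silently merge the entries.
fields = {'seaSurfaceTemperature': 'SST', 'seasurfacetemperature': 'other'}

# lowercase every key, as field = field.lower() effectively does
lowered = {name.lower(): value for name, value in fields.items()}

# two distinct keys collapse to one; the first entry is silently lost
collision = len(fields) != len(lowered)
```

As noted in the thread, camelCase names make such a collision unlikely in practice, but the lowercasing does discard information.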
@pwolfram, a side note. I think you're reviewing the whole code because I've made a lot of little changes as part of the PEP8 compliance. But a lot of these things aren't related to this PR so let's be careful that we focus on the job at hand and not the entire code.
Agreed @xylar. I'm wondering if we shouldn't keep all PEP8 changes in their own PRs for simplicity in the future. It is pretty hard to review the code with substantive changes as well as PEP8 changes because it is not always clear what changes correspond to what.
Note, I think we could merge the existing code without serious ramifications, but I wanted to make these different points, not so much to indicate that they need to be changed, but so that we are aware of these issues moving forward to help avoid future problems. Some of the error handling changes, however, are probably worth making sooner rather than later.
I'm wondering if we shouldn't keep all PEP8 changes in their own PRs for simplicity in the future. It is pretty hard to review the code with substantive changes as well as PEP8 changes because it is not always clear what changes correspond to what.
I'm sympathetic to this. It's hard for me to work on code effectively at this point that isn't PEP8 compliant because spyder gives me a ton of warnings that I usually use to find bugs in my code. At the same time, I don't really have time to go through the repo and make everything PEP8 compliant first, as a separate PR. So I fix things up as I go along.
A compromise approach for the future might be for me to do a separate commit in a given PR for the PEP8 changes and for the substantive changes. I usually try to do it that way but I combined the two in this PR and I'm sorry for doing that.
I'm very concerned that too many PRs will make this work untenable. As it is, it is taking a surprisingly long time in general to get PRs reviewed in this repository. (I generally try to do reviews within a day or two of receiving them, and I leave them unreviewed only if there are significant changes for the submitter to make before I can proceed.) So I am also concerned that breaking PRs into more PRs will not make things move any faster.
If present, variableMap is a dictionary of MPAS-O variable names that map
to their mpas_analysis counterparts.
Should we have a description of config and field in the doc string? It will be helpful for people who aren't as familiar with the code.
A bit of mission creep but I'll do this.
streamName = streams.find_stream(streamMap['timeSeriesStats'])
infiles = streams.readpath(streamName, startDate=startDate,
                           endDate=endDate)
print 'Reading files {} through {}'.format(infiles[0], infiles[-1])

plots_dir = config.get('paths', 'plots_dir')
obsdir = config.get('paths', 'obs_' + field.lower() + 'dir')
field.lower() is redundant following field = field.lower() above.
@@ -57,55 +67,46 @@ def ocn_modelvsobs(config, field):
outputTimes = config.getExpression(field.lower() + '_modelvsobs',
field.lower() -> field
selvals={'nVertLevels':1}))
ds = remove_repeated_time_index(ds)
ds.rename({'time_avg_activeTracers_temperature':'mpasData'}, inplace = True)
selvals = {'nVertLevels': 1}
Is this correct? Python is 0-indexed so I just want to make sure the 1 is supposed to be for the 2nd to the top layer, not the top layer with 0 index. I guess I'm confused here and a discussion / clarification would be helpful.
For example, here is output from an MPAS file I had on hand:
In [1]: import xarray as xr
In [2]: ds = xr.open_dataset('/Users/pwolfram/Downloads/last_step.nc')
In [3]: ds.nVertLevels
Out[3]:
<xarray.DataArray 'nVertLevels' (nVertLevels: 100)>
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
Coordinates:
* nVertLevels (nVertLevels) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
If this is a bug we may need a separate PR for it. It is possible we have similar errors in other places in the code.
@milenaveneziani or @vanroekel, could you comment on this. I would agree that it's likely a bug and we're looking at the temperature in layer 1 instead of layer 0. But this needs to be reported as an issue and addressed with a separate PR.
Agreed, I don't think we should change things here but if this is a bug it needs to be reported and fixed.
Yes, this is a bug on my part. I'll file the issue and make a PR.
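To make the indexing point above concrete, here is a minimal, hypothetical sketch (not the MPAS-Analysis code; the values are made up) showing why nVertLevels=1 is the second layer, not the surface:

```python
# Hypothetical 3-layer temperature column; in MPAS output, nVertLevels
# index 0 is the surface layer.
temperature = [20.0, 15.0, 10.0]

# Python is 0-indexed, so selvals={'nVertLevels': 1} selects the SECOND
# layer down, not the surface.
surface_temp = temperature[0]   # the top (surface) layer
second_layer = temperature[1]   # one layer below the surface
```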
""" | ||
possible_variables = variable_map[variable_name] | ||
for var in possible_variables: | ||
if isinstance(var, (list, tuple)): |
I'm glad you are supporting lists here for greater generality. That is really nice.
This is necessary because we need ['xtime_start', 'xtime_end'] as a possible mapping of 'Time', for example.
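A minimal sketch of this lookup, under the assumption that a map entry is either a single name or a list of names that must all be present together (the function body here is illustrative, not the actual MPAS-Analysis implementation):

```python
# Sketch of the variable lookup discussed above; the map contents are
# illustrative, not the actual MPAS-Analysis variable map.
variable_map = {
    'Time': ['xtime', ['xtime_start', 'xtime_end']],
}

def map_variable(variable_name, dataset_vars, variable_map):
    """Return the dataset name(s) matching an mpas_analysis name, or None."""
    for var in variable_map[variable_name]:
        if isinstance(var, (list, tuple)):
            # a grouped entry matches only if every name in it is present
            if all(v in dataset_vars for v in var):
                return var
        elif var in dataset_vars:
            return var
    return None
```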
                            hour=0, minute=0, second=0)
maxDate = datetime.datetime(year=2262, month=1, day=1,
                            hour=0, minute=0, second=0)
outDate = max(minDate, min(maxDate, outDate))
I'm thinking we should issue a warning if we are on the clipping boundary so an unaware user will know there may be a potential problem.
This change is part of #48 and you should be reviewing this code there. I'm happy to respond if you make your suggestion there.
Thanks for clarifying, I've done so.
timedelta2 = datetime.timedelta(days=20)
self.assertEqual(timedelta1, timedelta2)

date = Date(dateString='0001-01-01', isInterval=False)
I would put a comment above this line reminding the reader that the date is specified based on the quasi-arbitrary xarray datetime boundary.
Again, this is part of #48. Please comment there.
Agreed and done.
datetime2 = datetime.datetime(year=1850, month=1, day=1)
self.assertEqual(datetime1, datetime2)

date = Date(dateString='9999-01-01', isInterval=False)
This tests the clipping capability, right? I think we should note this here and if there is a warning thrown we should make sure it is what we expect to get.
Again, part of #48. Please move the comment there.
Done
                         onlyvars=varList,
                         yearoffset=1850,
                         variable_map=varMap))
self.assertEqual(sorted(ds.data_vars.keys()), sorted(varList))
I'd put a comment here that the preprocess function is essentially remapping variables within the dataset, otherwise the assert statement could be a little confusing.
Yep, I'll do that. In general, I think we should all be doing a better job of commenting our tests so the intention is clearer.
I agree and please be on the look out for this in my code!
I broke this branch. Working on it...
The to_datetime method can be used (optionally with an offset year) to convert a Date to a datetime. The date is clamped to a valid range supported by numpy's datetime64[ns], used by xarray and pandas. This method will be useful for computing datetime bounds on time-series data sets. The to_timedelta method converts a Date object with isInterval==True to a timedelta object.
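The clamping described above can be sketched roughly as follows. The bounds and names here are illustrative assumptions (numpy's datetime64[ns] covers roughly the years 1678 through 2262), not the actual MPAS-Analysis implementation:

```python
import datetime

# Illustrative bounds approximating numpy's datetime64[ns] range
# (roughly 1678-2262); names and exact values are assumptions,
# not MPAS-Analysis code.
MIN_DATE = datetime.datetime(year=1678, month=1, day=1)
MAX_DATE = datetime.datetime(year=2262, month=1, day=1)

def clamp_to_datetime64_range(date):
    """Clamp a datetime to a range representable as datetime64[ns]."""
    return max(MIN_DATE, min(MAX_DATE, date))
```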
This merge updates the 3 time-series analysis scripts to make use of the time series start and end dates, computed via timeseries_yr1 and timeseries_yr2, allowing analysis of a subset of the output data.
has_stream checks if a given stream is present in a streams file. The ability to check for a given string is useful in determining which stream name is appropriate for a given MPAS-O version. find_stream uses has_stream to find which stream in a list of possibilities is present in the file. Also, a few changes have been made to make the module namelist_streams_interface.py PEP8 compliant.
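A sketch of the behavior described above, with a plain set of stream names standing in for the parsed streams file (the function bodies are illustrative, not the actual namelist_streams_interface.py code):

```python
# Sketch of the stream lookup described above; a plain set of stream names
# stands in for the parsed streams file, and names are illustrative.
def has_stream(stream_names, stream_name):
    """Report whether a given stream is present."""
    return stream_name in stream_names

def find_stream(stream_names, possible_streams):
    """Return the first stream from a list of possibilities that is present."""
    for stream in possible_streams:
        if has_stream(stream_names, stream):
            return stream
    raise ValueError('none of {} found'.format(possible_streams))
```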
The new function map_variable(...) finds the MPAS dycore variable in a given dataset that corresponds to a given mpas_analysis variable name. The new function rename_variables(...) renames all variables in a given dataset from MPAS dycore names to mpas_analysis names if they are found in the variable map. An argument for a variable_map has been added to preprocess_mpas(...). This map, if present, is used to find the appropriate time variable(s) in the dataset and to rename variables (other than the time variable(s)) to their mpas_analysis names.
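The renaming step might look roughly like this sketch, with a plain dict standing in for the xarray dataset (the map contents and function body are illustrative, not the actual rename_variables(...) implementation):

```python
# Rough sketch of the renaming described above; a plain dict stands in for
# the xarray dataset, and the map entries are illustrative.
variable_map = {
    'mpasData': ['time_avg_activeTracers_temperature', 'avgTemperature'],
}

def rename_variables(data_vars, variable_map):
    """Rename MPAS dycore variable names to mpas_analysis names."""
    renamed = {}
    for name, value in data_vars.items():
        new_name = name
        for analysis_name, candidates in variable_map.items():
            # grouped (list/tuple) candidates, e.g. time pairs, are skipped
            simple = [c for c in candidates
                      if not isinstance(c, (list, tuple))]
            if name in simple:
                new_name = analysis_name
                break
        renamed[new_name] = value
    return renamed
```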
This merge adds two dictionaries that are used to map variable and stream names from various MPAS-O versions to the corresponding values in MPAS-Analysis.
Stream and variable maps for use in the ocean analysis are now imported as part of initializing the analysis. The maps are passed on to the OHC time-series analysis function. The OHC time-series analysis has been updated to use the maps to find the appropriate time-series stream and to map variable names to simpler names that will become the MPAS-Analysis standard names. These simpler names are then used in the remainder of the analysis. Also, ohc_timeseries.py has been made PEP8 compliant.
Also, make SST analysis script PEP8 compliant.
Also, make sea-ice scripts PEP8 compliant.
Force-pushed 267637e to 9b9434e.
This branch has been rebased yet again following #57.
CI seems to have disappeared or gotten stuck. A manual test passed without a problem.
CI seems to be hanging on cloning the repo. I am also having a lot of trouble updating to/from GitHub, so there must be an issue on their side. Hopefully it'll work okay again soon...
It has passed now. @xylar, assuming @milenaveneziani's testing is good, I think we are getting close to the point where we can merge this PR. I would say that any issues we have left are minor; would you concur?
Yes, @milenaveneziani, please merge as soon as the other tests have passed and you are comfortable with the approach, but only after you merge #48.
@milenaveneziani, could you report this as an issue? I will attempt to address it as soon as I can.
This merge adds dictionaries that can be used to map between the names of variables and streams used internally in MPAS-Analysis to the associated names used in various versions of MPAS dycores. This helps with cross-compatibility with various MPAS and ACME versions.
mpas_xarray has been modified to include the ability to find the MPAS dycore name of a variable in a data set given the MPAS-Analysis name and the associated map. mpas_xarray can also now rename variables in a dataset from their MPAS dycore names to those used by MPAS-Analysis using the same map.
The streams file reader has been augmented to include a function that informs the user if a given stream is present, allowing analysis scripts to find the correct name of a given stream using the streams mapping dictionary.