Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First Pass Zonal Average #82

Open
wants to merge 30 commits into
base: main
Choose a base branch
from
Open

Conversation

mgrover1
Copy link

@mgrover1 mgrover1 commented Mar 18, 2021

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mgrover1

Mostly looks good.

One issue is how to handle TLONG/TLAT vs ULONG/ULAT. It would be nice if there were standard attributes that told us that one of them are the corner points for the other. I don't know that this exists but it could be worth digging around the sgrid conventions to see if there's something we can use. (https://sgrid.github.io/sgrid/)

Second, with this approach, the lowest order operation is the regrid to uniformly spaced grid (dsregridded). Operations after that are real easy regridded.mean("latitude") and also let you do things other than the average (e.g. regridded.std("latitude")).

Should we instead have users do

uniform = pop_tools.to_uniform_grid(ds, dx=0.1, dy=0.1)
zonal_average = uniform.mean("latitude")

self.method_gen_grid = method_gen_grid

# If the user does not input a mask, use default mask
if not mask:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applying a mask when mask is None is counter-intuitive =)

Should the default be mask=True

).rename({'TLAT': 'lat', 'TLONG': 'lon'})

# Inlcude only points that will have surrounding corners
data_ds = data_ds.isel({'nlon': data_ds.nlon[1:], 'nlat': data_ds.nlat[1:]})
Copy link
Contributor

@dcherian dcherian Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work for datasets returned by to_xgcm_grid_and_dataset which renames to nlon_t, nlon_u, nlat_t, nlat_u.

I think the solution here is to use cf_xarray

data_ds = data_ds.cf.isel(X=slice(1,None), Y=slice(1, None))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a suggestion for adding a new X/Y coordinate? I noticed that the only cf axis within the POP grids is time

Coordinates:
- CF Axes: * T: ['time']
             X, Y, Z: n/a

- CF Coordinates:   longitude: ['TLONG', 'ULONG']
                    latitude: ['TLAT', 'ULAT', 'lat_aux_grid']
                  * vertical: ['z_t']
                  * time: ['time']

- Cell Measures:   area, volume: n/a

- Standard Names:   n/a

Data Variables:
- Cell Measures:   area, volume: n/a

- Standard Names:   n/a

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh yeah I forgot. We need to make nlat, nlon dimension coordinates for this to work. That allows us to assign attributes like axis: "X"

ds["nlon"] = ("nlon", np.arange(ds.sizes["nlon"]), {"axis": "X"})
ds["nlat"] = ("nlon", np.arange(ds.sizes["nlon"]), {"axis": "Y"})

and similarly for nlat_*, nlon_* in to_xgcm_grid_dataset. This too seems like a followup PR>

data_ds = data_ds.isel({'nlon': data_ds.nlon[1:], 'nlat': data_ds.nlat[1:]})

# Use ulat and ulong values as grid corners, rename variables to match xESMF syntax
grid_corners = grid_ds[['ULAT', 'ULONG']].rename(
Copy link
Contributor

@dcherian dcherian Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wrong for UVEL for which the centers are ULAT, ULONG and corners are TLAT, TLONG

EDIT: I see you've fixed this :) but a good solution would be to use UVEL.cf["longitude"] to get ULON and in theory there should be some standard attribute to lead you to TLONG from there. I do not know what that attribute is. Might have to look in to the sgrid conventions (https://sgrid.github.io/sgrid/).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternately this kind of inference requires knowledge of the underlying grid. so xgcm.Grid object would be useful input.

We can hardcode this for POP now and move on, but solving UVEL regridding vs TEMP regridding by interpreting some standard attribute or using the xgcm object allows us to robustly solve it for the POP & MOM & (insert-NCAR-model-here) regridding. SO some brainstorming would be useful...

else:
self.mask_labels = self.grid['region_name']

else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we never need no mask? I guess a global average across all basins is somewhat useless but would be required for the atmosphere.

pop_tools/zonal_average.py Outdated Show resolved Hide resolved
pop_tools/zonal_average.py Outdated Show resolved Hide resolved
and ('velocity' not in obj[var].long_name.lower())
):
ds_list.append(obj[var])
return xr.merge(ds_list).map(self._regrid_dataarray, keep_attrs=True, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead build a list of variables names and then ds[list_of_varnames].map(...)

for region in tqdm(np.unique(mask)):

if region != 0:
ds_list.append(data.where(mask == region).groupby('lat').mean())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why groupby? Just a simple .mean("lat").

instead of this loop, I would recommend taking a 2D mask variable and expanding out to 3D with a new region dimension (so that each slice along region has 1's only for the appropriate region) and then multiply. That way you're doing where on a small array rather than a giant one. It also avoids the concat down below

Copy link
Author

@mgrover1 mgrover1 Mar 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matt and were discussing, making use of the pop_tools.region_mask_3d tool would be helpful with this, replacing "0" with nan values...

# Read in the 3D mask
mask3d = pop_tools.region_mask_3d(grid_name, mask_name='default')

#  Apply the 3D mask
data_masked = mask3d.where(mask3d > 0) * data[list_relavant_variables]

where the dimensions of data_masked would be ('region', 'nlat', 'nlon', 'time', 'z_t')

if vertical_average:

# Run the vertical, weighted average
out = out.weighted(out['z_t'].fillna(0)).mean(dim=['z_t'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why weight by z_t and not DZT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be calculated by using the following?

DZT = out.z_t.diff(dim='z_t')

Or am I misunderstanding what DZT is?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out.z_t.diff(dim='z_t') will return the distance between layer centers, and for 60 levels it will only return an array of length 59:

>>> import pop_tools
>>> gx1v7_grid = pop_tools.get_grid("POP_gx1v7")
>>> gx1v7_grid.z_t.diff(dim='z_t')
<xarray.DataArray 'z_t' (z_t: 59)>
array([ 1000.     ,  1000.     ,  1000.     ,  1000.     ,  1000.     ,
        1000.     ,  1000.     ,  1000.     ,  1000.     ,  1000.     ,
        1000.     ,  1000.     ,  1000.     ,  1000.     ,  1000.     ,
        1009.8404 ,  1038.0646 ,  1081.22175,  1136.90105,  1205.11015,
        1286.69055,  1383.0544 ,  1496.13345,  1628.40275,  1782.946  ,
        1963.55735,  2174.8772 ,  2422.5496 ,  2713.41105,  3055.7061 ,
        3459.30485,  3935.90165,  4499.1272 ,  5164.489  ,  5948.97315,
        6870.06835,  7943.9187 ,  9182.2754 , 10588.062  , 12149.8325 ,
       13836.2715 , 15593.0264 , 17344.9219 , 19005.6846 , 20494.0957 ,
       21751.2334 , 22751.50195, 23502.97165, 24038.333  , 24401.99805,
       24638.8965 , 24787.6709 , 24878.15135, 24931.6328 , 24962.4424 ,
       24979.77735, 24989.31735, 24994.45895, 24997.17675])
>>>

You want the cell thickness - the difference between layer interfaces, which is z_w_bot - z_w. Those arrays should be the same length although the dimension names are different, but here's a kludgy example of at least returning a DataArray (the coordinates need to be cleaned up):

>>> gx1v7_grid.z_w_bot.swap_dims(z_w_bot="z_t") - gx1v7_grid.z_w.swap_dims(z_w="z_t")
<xarray.DataArray (z_t: 60)>
array([ 1000.    ,  1000.    ,  1000.    ,  1000.    ,  1000.    ,
        1000.    ,  1000.    ,  1000.    ,  1000.    ,  1000.    ,
        1000.    ,  1000.    ,  1000.    ,  1000.    ,  1000.    ,
        1000.    ,  1019.6808,  1056.4484,  1105.9951,  1167.807 ,
        1242.4133,  1330.9678,  1435.141 ,  1557.1259,  1699.6796,
        1866.2124,  2060.9023,  2288.8521,  2556.2471,  2870.575 ,
        3240.8372,  3677.7725,  4194.0308,  4804.2236,  5524.7544,
        6373.1919,  7366.9448,  8520.8926,  9843.6582, 11332.4658,
       12967.1992, 14705.3438, 16480.709 , 18209.1348, 19802.2344,
       21185.957 , 22316.5098, 23186.4941, 23819.4492, 24257.2168,
       24546.7793, 24731.0137, 24844.3281, 24911.9746, 24951.291 ,
       24973.5938, 24985.9609, 24992.6738, 24996.2441, 24998.1094])
Coordinates:
    z_w_bot  (z_t) float64 1e+03 2e+03 3e+03 4e+03 ... 5e+05 5.25e+05 5.5e+05
    z_w      (z_t) float64 0.0 1e+03 2e+03 3e+03 ... 4.75e+05 5e+05 5.25e+05
Dimensions without coordinates: z_t
>>>

I think out might already have a dz data array, though, which would be the best thing to use:

>>> gx1v7_grid["dz"]
<xarray.DataArray 'dz' (z_t: 60)>
array([ 1000.    ,  1000.    ,  1000.    ,  1000.    ,  1000.    ,
        1000.    ,  1000.    ,  1000.    ,  1000.    ,  1000.    ,
        1000.    ,  1000.    ,  1000.    ,  1000.    ,  1000.    ,
        1000.    ,  1019.6808,  1056.4484,  1105.9951,  1167.807 ,
        1242.4133,  1330.9678,  1435.141 ,  1557.1259,  1699.6796,
        1866.2124,  2060.9023,  2288.8521,  2556.2471,  2870.575 ,
        3240.8372,  3677.7725,  4194.0308,  4804.2236,  5524.7544,
        6373.1919,  7366.9448,  8520.8926,  9843.6582, 11332.4658,
       12967.1992, 14705.3438, 16480.709 , 18209.1348, 19802.2344,
       21185.957 , 22316.5098, 23186.4941, 23819.4492, 24257.2168,
       24546.7793, 24731.0137, 24844.3281, 24911.9746, 24951.291 ,
       24973.5938, 24985.9609, 24992.6738, 24996.2441, 24998.1094])
Coordinates:
  * z_t      (z_t) float64 500.0 1.5e+03 2.5e+03 ... 5.125e+05 5.375e+05
Attributes:
    units:      cm
    long_name:  thickness of layer k
>>>

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted this to utilize the dz dataarray within the grid

out = out.weighted(self.grid.dz).mean(dim='z_t')

Which seems to work well.

@mgrover1
Copy link
Author

One issue is how to handle TLONG/TLAT vs ULONG/ULAT. It would be nice if there were standard attributes that told us that one of them are the corner points for the other. I don't know that this exists but it could be worth digging around the sgrid conventions to see if there's something we can use. (https://sgrid.github.io/sgrid/)

I will look into that.. for now I was just using TLONG/TLAT and ULONG/ULAT since we read the grid directly from pop-tools, but moving toward this would definitely make this more extensible.

Second, with this approach, the lowest order operation is the regrid to uniformly spaced grid (dsregridded). Operations after that are real easy regridded.mean("latitude") and also let you do things other than the average (e.g. regridded.std("latitude")).

uniform = pop_tools.to_uniform_grid(ds, dx=0.1, dy=0.1)
zonal_average = uniform.mean("latitude")

I like the that suggested approach - the reason for the current implementation is to reproduce the previously existing ZA code, but following this would make more sense.

mgrover1 and others added 4 commits March 19, 2021 15:32
Co-authored-by: Anderson Banihirwe <[email protected]>
Co-authored-by: Anderson Banihirwe <[email protected]>
@dcherian
Copy link
Contributor

dcherian commented Mar 20, 2021

. for now I was just using TLONG/TLAT and ULONG/ULAT since we read the grid directly from pop-tools, but moving toward this would definitely make this more extensible.

👍 we could hardcode it for now, choose the bounds based on dataarray.attrs["coordinates"], to move quickly and adapt later


import numpy as np
import xarray as xr
import xesmf as xe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make xesmf a soft dependency i.e.

try:
	import xesmf as xe
except ImportError:
	message = ("Zonal averaging requires xesmf package.\n\n"
                      "Please conda install as follows:\n\n"
					  " conda install -c conda-forge xesmf>=0.4.0"
               )
    raise ImportError(msg)

??

@mgrover1
Copy link
Author

So... I went back and took a look at Xoak which is another way to approach this problem, allowing the user to index based on coordinates not within the default dimensions... below is the function I used to accomplish a zonal average, and the nice thing about this approach is inputting the output latitudes is easier than the previous approach using xESMF, where in this example, I use the lat_aux_grid .

def zonal_average(ds, out_lats):
    
    # Set the index to be TLAT and TLONG
    ds.xoak.set_index(['TLAT', 'TLONG'], 'sklearn_geo_balltree')
    
    # output lons
    lons = np.arange(-180, 180, 1)
    
    # Build a meshgrid with the lats and lons
    nlons, nlats = np.meshgrid(lons, lats)
    
    # Build an output dataset
    ds_data = xr.Dataset({
        'lat': (('y', 'x'), nlats),
        'lon': (('y', 'x'), nlons),
    })
    
    # Use xoak to select lats/lons
    out = ds.xoak.sel(TLAT=ds_data.lat, TLONG=ds_data.lon)
    
    # Reset the coordidinates
    out = out.reset_coords()
    
    # Add one-dimensional lats and lons
    out['lon'] = ('x', lons)
    out['lat'] = ('y', lats)
    
    # Swap the dimensions
    out = out.swap_dims({'x':'lon',
                         'y':'lat'})
    
    return out

za_temp = zonal_average(ds[['TEMP']], ds.lat_aux_grid.values).mean('lon')

Where ds is the input dataset

@mgrover1
Copy link
Author

mgrover1 commented Apr 1, 2021

Second, with this approach, the lowest order operation is the regrid to uniformly spaced grid (dsregridded). Operations after that are real easy regridded.mean("latitude") and also let you do things other than the average (e.g. regridded.std("latitude")).

Should we instead have users do

uniform = pop_tools.to_uniform_grid(ds, dx=0.1, dy=0.1)
zonal_average = uniform.mean("latitude")

Should we allow users to specify which pop-grid to use (ex. ''POP_gx1v7'')? Also, with this approach, would we automatically apply the region masking so the zonal average would include this? These are reasons why I am currently using the approach similar to xESMF where one sets up the regridder (using the associated pop-grid), then apply the regridder to some data.

@mgrover1
Copy link
Author

mgrover1 commented Apr 1, 2021

Any suggestions on how to speed up this calculation? Currently, the calculation a few minutes on the text files in pop-tools, regridding to 0.25 degree grid... this is the current zonal average section

def zonal_average(self, obj, vertical_average=False, **kwargs):

        data = self.regrid(obj, **kwargs)
        mask = self.regrid(self.mask, regrid_method='nearest_s2d', **kwargs)

        # Attach a name to the mask
        mask.name = self.mask.name

        # Replace zeros with nans and group into regions
        out = mask.where(mask > 0) * data

        # Check to see if a weighted vertical average is needed
        if vertical_average:

            # Calculate the vertical weighte average based on depth of the layer
            out = out.weighted(self.grid.dz).mean(dim='z_t')

        return out

I think the main slow-down may be that it is separating into 13 regions?

@dcherian
Copy link
Contributor

dcherian commented Apr 1, 2021

Should we allow users to specify which pop-grid to use (ex. ''POP_gx1v7'')?

It looks like you only need *LAT, *LONG, dz. Seems like we should just expect the input dataset to have all these variables.

Re: speed. What does snakeviz say? It looks like you're regenerating the destination grid for every variable. That could be avoided.

@mgrover1
Copy link
Author

mgrover1 commented Apr 8, 2021

After some discussion this week with @matt-long and @klindsay28 , I think it would be best to implement a destination grid that matches the input grid.. here is a function that is hard coded for now in terms of dealing with POP, I am not sure how to generalize this since the grid corner information is specific to POP.

Also, @klindsay28, how did you deal with the cyclic point in the previous implementation? Using conservative regridding, there are missing values near the origin of the grid. Using another interpolation method (ex. bilinear) would fix this, but in order to conserve areas/values, it would be good to stick with conservative regridding.

The resultant dataset matches the input for xESMF, which makes it easy to implement into the current development.

def flip_coords(ds):
    """
    Given a set of longitudes and latitudes, flips s-hemisphere into northern hemisphere
    """
    
    tlat = ds.TLAT.values
    ulat = ds.ULAT.values
    nlat = ds.nlat.values
    
    nlat_shem_t = nlat[np.where(tlat[:, 0] < 0)]
    nlat_shem_u = nlat[np.where(ulat[:, 0] < 0)]
    
    new_ds_t = ds.sel(nlat=nlat_shem_t)
    new_ds_u = ds.sel(nlat=nlat_shem_u)
    
    # Pull out tlats/tlons
    tlat = new_ds_t.TLAT.values[1:]
    tlon = new_ds_t.TLONG.values
    
    # Pull out ulats/ulons
    ulat = new_ds_u.ULAT.values
    ulon = new_ds_u.ULONG.values
    
    # Use the grid spacing from the farthest south
    grid_spacing = np.diff(tlat[:, 0])[0]
    
    out_tlats = np.append(np.append(tlat[:, 0], -tlat[:, 0][::-1]), np.arange(np.max(-tlat[:, 0])+grid_spacing, 90, grid_spacing))
    out_tlons = sorted(tlon[0, :])
    
    tlons, tlats = np.meshgrid(out_tlons, out_tlats)
    
    out_ulats = np.append(np.append(np.append(ulat[:, 0], 0), -ulat[:, 0][::-1]), np.arange(np.max(-ulat[:, 0])+grid_spacing, 90, grid_spacing))
    out_ulons = sorted(ulon[0, :])
    
    ulons, ulats = np.meshgrid(out_ulons, out_ulats)
    
    
    ds_out = xr.Dataset(coords=dict(
        lon=(["x"], out_tlons[1:]),
        lat=(["y"], out_tlats[1:]),
        lon_b=(["x_b"], out_ulons),
        lat_b=(["y_b"], out_ulats)))
    
    
    return ds_out

@mgrover1
Copy link
Author

I am struggling with finding the correct bounds for the pop-grid... here is a function I have that uses the grid corner function in pop-tools... would this be the right way to go about this?

def gen_corner_calc(ULAT, ULONG):
    """
    Generates corner information and creates single dataset with output
    """
    
    # Use the function in pop-tools to get the grid corner information
    corn_lat, corn_lon = pop_tools.grid._compute_corners(ULAT, ULONG)

    lon_shape, lat_shape = corn_lon[:, :, 0].shape
    out_shape = (lon_shape + 1, lat_shape + 1)
    
    # Generate numpy arrays to store destination lats/lons
    out_lons = np.zeros((out_shape))
    out_lats = np.zeros((out_shape))
    
    # Assign the northeast corner information
    out_lons[1:, 1:] = corn_lon[:, :, 0]
    out_lats[1:, 1:] = corn_lat[:, :, 0]
    
    # Assign the northwest corner information
    out_lons[:-1, 1:] = corn_lon[:, :, 1]
    out_lats[:-1, 1:] = corn_lat[:, :, 1]
    
    # Assign the southwest corner information
    out_lons[:-1, :-1] = corn_lon[:, :, 2]
    out_lats[:-1, :-1] = corn_lat[:, :, 2]
    
    # Assign the southeast corner information
    out_lons[1:, :-1] = corn_lon[:, :, 3]
    out_lats[1:, :-1] = corn_lat[:, :, 3]
    
    ds_out = xr.Dataset(coords=dict(lon_b=(["nlat_b", "nlon_b"], out_lons),
                                    lat_b=(["nlat_b", "nlon_b"], out_lats)))
    
    return ds_out

pop_tools/regrid.py Outdated Show resolved Hide resolved
pop_tools/regrid.py Outdated Show resolved Hide resolved
print(data, dst_grid)

# Convert the mask to a uniform grid, using nearest neighbor interpolation
mask_regrid = to_uniform_grid(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have this support DataArrays directly? Call .to_dataset() on the dataarray in to_uniform_grid

pop_tools/regrid.py Outdated Show resolved Hide resolved
pop_tools/regrid.py Outdated Show resolved Hide resolved
pop_tools/regrid.py Outdated Show resolved Hide resolved
else:
raise TypeError('Missing defined longitude axis bounds')

lons, lats = np.meshgrid(sorted(lon), lat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the sorting really required?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily - I can go ahead and remove it

for var in obj:
if 'nlat' in obj[var].dims and 'nlon' in obj[var].dims:

if obj[var].cf['latitude'].name == 'TLAT':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mrege this with the above?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So adjust that to this?

if 'nlat' in obj[var].dims and 'nlon' in obj[var].dims and obj[var].cf['latitude'].name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

print(np.unique(mask_regrid))

# Add a mask to the regridding
dst_grid['mask'] = (('y', 'x'), (mask_regrid.where(mask_regrid == 0, 1, 0)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does np.logical_not do what you want here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're inverting the mask here:

(mask_regrid.where(mask_regrid == 0, 1, 0))

I think np.logical_not(mask_regrid) could be equivalent and possibly faster?

Comment on lines 418 to 424
ds_list = [
data_regrid.where(mask_regrid == region)
.weighted(weights_regrid.where(mask_regrid == region).fillna(0))
.mean('x')
for region in tqdm(np.unique(mask_regrid))
if region != 0
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments here:

  1. This function should have one job: compute a zonal average. Why should it bring in these regions.
  2. Do you really need the where on the weights. Also .where(..., 0) will save an extra copy.
  3. This could be a lot faster if you made mask_regrid a 3D variable: (lat, lon, region) and then passed it in as a weight.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...

  1. From the user API perspective, I thought it would be helpful to be able to specify the grid name and handle the logic within here... where would that be better placed? Or what is the higher level api that we would want the user to specify the region information?
  2. I can remove the where on the weights
  3. That is true; this might also be a better way of dealing with the fractional areas of the masked grid cells...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way would be to expect users to pass in ds * ds.region_mask if region_mask was a 3D variable (region, nlat, nlon). Would this be equivalent?

Does the order of operations here: regrid, then multiply by region mask and mean, follow what's in Keith's FORTRAN program?

pop_tools/regrid.py Outdated Show resolved Hide resolved
elif method == 'lat_axis':
out_ds = xr.Dataset()

if lat_axis_bnds is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that lat_axis_bnds is required for this function to properly work. Instead of having it as a keyword argument, could we just make it a required argument and remove this check?

regrid_method = str
Regridding method for xESMF, default is conservative

**kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, these kwargs are meant to be passed to

def _regrid_dataset(da_in, dst_grid, regrid_method=None)

This function only has one keyword argument regrid_method. Could we just expose this argument in to_uniform_grid signature instead of using the catch-all kwargs?

pop_tools/regrid.py Outdated Show resolved Hide resolved
Comment on lines +127 to +135
if isinstance(ds, xr.DataArray):
grid_center_lat = ds.cf['latitude'].name
grid_center_lon = ds.cf['longitude'].name
ds = ds.to_dataset()

elif isinstance(ds, xr.Dataset):
var = list(ds.variables)[0]
grid_center_lat = ds[var].cf['latitude'].name
grid_center_lon = ds[var].cf['longitude'].name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using cf_xarray accessor here, but I don't see cf_xarray in the imports. Is this working as expected without the cf_xarray import?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also list(ds.variables)[0]?

I think the only sane way to do this is to look at loop over DataArrays and check "TLAT" in DataArray.attrs["coordinates"].So take an input dataset then split it in to two: those with coordinates TLAT, TLONG; and then those with ULAT, ULONG; process the two datasets separately then combine them to a single dataset and return that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcherian This logic is handled within the to_uniform_grid function...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgrover1, as Deepak pointed out, this line is going to result in errors when the "first" grabbed variable doesn't have spatial dimensions...

var = list(ds.variables)[0]

Could we make this more robust?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go ahead and remove the check to see if it is a dataset since it should be a data array, and this is a function that is meant to used internally... in to_uniform_grid, we check to see if nlat, nlon, and TLAT are in the dimensions, creating a list of variables. These variables are then passed into prep_for_xesmf function, iterating over those variables.

pop_tools/regrid.py Outdated Show resolved Hide resolved
pop_tools/regrid.py Outdated Show resolved Hide resolved
for var in obj:
if 'nlat' in obj[var].dims and 'nlon' in obj[var].dims:

if obj[var].cf['latitude'].name == 'TLAT':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Comment on lines +127 to +135
if isinstance(ds, xr.DataArray):
grid_center_lat = ds.cf['latitude'].name
grid_center_lon = ds.cf['longitude'].name
ds = ds.to_dataset()

elif isinstance(ds, xr.Dataset):
var = list(ds.variables)[0]
grid_center_lat = ds[var].cf['latitude'].name
grid_center_lon = ds[var].cf['longitude'].name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also list(ds.variables)[0]?

I think the only sane way to do this is to look at loop over DataArrays and check "TLAT" in DataArray.attrs["coordinates"].So take an input dataset then split it in to two: those with coordinates TLAT, TLONG; and then those with ULAT, ULONG; process the two datasets separately then combine them to a single dataset and return that.

@clyne
Copy link

clyne commented Aug 13, 2021

@mgrover1 is this function something that would be generally useful for geoscience applications, or is it specific to POP? Just curious. Thanks!

@mgrover1 mgrover1 requested a review from andersy005 August 16, 2021 19:28
@mgrover1
Copy link
Author

@mgrover1 is this function something that would be generally useful for geoscience applications, or is it specific to POP? Just curious. Thanks!

In its current state, it is fairly POP specific... but it should be relatively straight forward to make this more generic to the point where it could be applied to other use cases.

pop_tools/regrid.py Outdated Show resolved Hide resolved
Comment on lines +16 to +19
assert isinstance(out, xr.Dataset)
assert 'region' in out.dims
assert 'TEMP' in out.variables
assert 14 == len(out.region)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we produce some output with keith's za functionality, and then do some numerical comparison as part of the test?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning on holding off on this until we can figure out why it's next lining up exactly... It has something to do with the region masks, but I can't nail it down

pop_tools/regrid.py Outdated Show resolved Hide resolved
pop_tools/regrid.py Show resolved Hide resolved
pop_tools/regrid.py Outdated Show resolved Hide resolved
pop_tools/regrid.py Outdated Show resolved Hide resolved
pop_tools/regrid.py Outdated Show resolved Hide resolved
pop_tools/regrid.py Show resolved Hide resolved
pop_tools/regrid.py Outdated Show resolved Hide resolved
pop_tools/regrid.py Show resolved Hide resolved
pop_tools/regrid.py Outdated Show resolved Hide resolved
Parameters
----------
method: str
Method to use for generating the destination grid, options include 'regular_grid' or 'define_lat_aux'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's lat_aux_grid down below.

@dcherian
Copy link
Contributor

An ASP grad student is asking about regridding POP to uniform grid. Can we merge theto_uniform_grid bit and wait on the zonal_average bit?

@mgrover1
Copy link
Author

@dcherian that makes sense to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add function to compute zonal mean
6 participants