Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support in subset_bbox for unstructured grids such as station data #161

Open
huard opened this issue Apr 6, 2021 · 2 comments
Open

Support in subset_bbox for unstructured grids such as station data #161

huard opened this issue Apr 6, 2021 · 2 comments

Comments

@huard
Copy link
Contributor

huard commented Apr 6, 2021

Description

Point observations are usually formatted with a variable defined along region and time dimensions. Geographical coordinates (lat, lon) for stations are themselves defined along region. With clisops 0.6.3, this data model is wrongly interpreted as a curvilinear grid, and masks stations outside of the bounding box instead of removing them entirely from the region dimension.

@aulemahal
Copy link
Collaborator

aulemahal commented Apr 6, 2021

Second issue relating to unstructured data:
In this example, the 1D coord shared by lat and lon is location and is a list of city names. This makes subset_bbox fail:

from xclim.testing import open_dataset
from clisops.core.subset import subset_bbox

ds = open_dataset("ERA5/daily_surface_cancities_1990-1993.nc")
subset_bbox(ds, lat_bnds=[44, 46], lon_bnds=[-75, -60])

fails with

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-a0ff1095b7a5> in <module>
      1 ds2 = ds.copy()
      2 ds2['location'] = [1, 2, 3, 4, 5]
----> 3 subset_bbox(ds, lat_bnds=[44, 46], lon_bnds=[-75, -60])

~/.conda/envs/xclim/lib/python3.8/site-packages/clisops/core/subset.py in func_checker(*args, **kwargs)
    257                     kwargs[lon][kwargs[lon] < 0] -= 360
    258 
--> 259         return func(*args, **kwargs)
    260 
    261     return func_checker

~/.conda/envs/xclim/lib/python3.8/site-packages/clisops/core/subset.py in subset_bbox(da, lon_bnds, lat_bnds, start_date, end_date, first_level, last_level)
    861             try:
    862                 coords = da[d][ind[i]]
--> 863                 bnds = _check_desc_coords(
    864                     coord=da[d],
    865                     bounds=[coords.min().values, coords.max().values],

~/.conda/envs/xclim/lib/python3.8/site-packages/clisops/core/subset.py in _check_desc_coords(coord, bounds, dim)
    955 def _check_desc_coords(coord, bounds, dim):
    956     """If Dataset coordinates are descending reverse bounds."""
--> 957     if np.all(coord.diff(dim=dim) < 0) and len(coord) > 1:
    958         bounds = np.flip(bounds)
    959     return bounds

~/.conda/envs/xclim/lib/python3.8/site-packages/xarray/core/dataarray.py in diff(self, dim, n, label)
   3105         DataArray.differentiate
   3106         """
-> 3107         ds = self._to_temp_dataset().diff(n=n, dim=dim, label=label)
   3108         return self._from_temp_dataset(ds)
   3109 

~/.conda/envs/xclim/lib/python3.8/site-packages/xarray/core/dataset.py in diff(self, dim, n, label)
   5487             if dim in var.dims:
   5488                 if name in self.data_vars:
-> 5489                     variables[name] = var.isel(**kwargs_end) - var.isel(**kwargs_start)
   5490                 else:
   5491                     variables[name] = var.isel(**kwargs_new)

~/.conda/envs/xclim/lib/python3.8/site-packages/xarray/core/variable.py in func(self, other)
   2299             with np.errstate(all="ignore"):
   2300                 new_data = (
-> 2301                     f(self_data, other_data)
   2302                     if not reflexive
   2303                     else f(other_data, self_data)

TypeError: unsupported operand type(s) for -: 'str' and 'str'

@aulemahal
Copy link
Collaborator

I edited my comment above because the first part was no more relevant.

Here are details on the top issue:

MWE, same dataset as my comment above:

from xclim.testing import open_dataset
from clisops.core.subset import subset_bbox

ds = open_dataset("ERA5/daily_surface_cancities_1990-1993.nc")

# Trick to avoid error mentioned above.
ds2 = ds.copy() 
ds2['location'] = [1, 2, 3, 4, 5]

subset_bbox(ds2, lat_bnds=[45, 50], lon_bnds=[-125, -70])

Those lat and lon bounds are supposed to select locations 2 and 5. As expected, location 1 is dropped. However elements 3 and 4 are still in the output dataset, but with NaNs everywhere, masked instead of dropped.

I think this comes from the fact that the lat/lon bounds are translated to bounds on the shared dim (location), as would make sense with a curvilinear grid. May be a special case for unstructured grids would be nice (the case when lat and lon are 1D and share the same dimension).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants