
Add a convex hull masking function #237

Merged (10 commits) on Mar 17, 2020
10 changes: 9 additions & 1 deletion doc/api/index.rst
@@ -61,6 +61,15 @@ Coordinate Manipulation
inside
block_split

Masking
-------

.. autosummary::
:toctree: generated/

distance_mask
convexhull_mask

Utilities
---------

@@ -69,7 +78,6 @@ Utilities

test
maxabs
distance_mask
variance_to_weights
grid_to_table
median_distance
45 changes: 45 additions & 0 deletions examples/convex_hull_mask.py
@@ -0,0 +1,45 @@
"""
Mask grid points by convex hull
===============================

Sometimes, data points are unevenly distributed. In such cases, we might not
want to have interpolated grid points that are too far from any data point.
Function :func:`verde.convexhull_mask` allows us to set grid points that fall
outside of the convex hull of the data points to NaN or some other value.
"""
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import pyproj
import numpy as np
import verde as vd

# The Baja California bathymetry dataset has big gaps on land. We want to mask
# these gaps on a dummy grid that we'll generate over the region.
data = vd.datasets.fetch_baja_bathymetry()
region = vd.get_region((data.longitude, data.latitude))

# Generate the coordinates for a regular grid mask
spacing = 10 / 60
coordinates = vd.grid_coordinates(region, spacing=spacing)

# Generate a mask for points. The mask is True for points that are within the
# convex hull.
mask = vd.convexhull_mask((data.longitude, data.latitude), coordinates=coordinates)
print(mask)

# Create a dummy grid with ones that we can mask to show the results. Turn
# points that are outside of the convex hull into NaNs so they won't show up in
# our plot.
dummy_data = np.ones_like(coordinates[0])
dummy_data[~mask] = np.nan

# Make a plot of the masked data and the data locations.
crs = ccrs.PlateCarree()
plt.figure(figsize=(7, 6))
ax = plt.axes(projection=ccrs.Mercator())
ax.set_title("Only keep grid points that inside of the convex hull")
ax.plot(data.longitude, data.latitude, ".y", markersize=0.5, transform=crs)
ax.pcolormesh(*coordinates, dummy_data, transform=crs)
vd.datasets.setup_baja_bathymetry_map(ax, land=None)
plt.tight_layout()
plt.show()
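The masking can also be applied directly to an xarray grid through the grid argument instead of masking a plain array, which is the pattern used by the other examples touched in this PR. A minimal sketch, reusing data and coordinates from the example above (the variable names dummy_grid and masked_grid are only illustrative):

import xarray as xr

# Build the same dummy grid as an xarray.Dataset so that convexhull_mask can
# apply the mask for us via xarray.Dataset.where (points outside the convex
# hull become NaN).
dummy_grid = xr.Dataset(
    {"dummy": (["latitude", "longitude"], np.ones_like(coordinates[0]))},
    coords={"longitude": coordinates[0][0, :], "latitude": coordinates[1][:, 0]},
)
masked_grid = vd.convexhull_mask((data.longitude, data.latitude), grid=dummy_grid)
print(masked_grid)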
7 changes: 1 addition & 6 deletions examples/spline_weights.py
@@ -62,12 +62,7 @@
dims=["latitude", "longitude"],
data_names=["velocity"],
)
grid = vd.distance_mask(
(data.longitude, data.latitude),
maxdist=5 * spacing * 111e3,
grid=grid_full,
projection=projection,
)
grid = vd.convexhull_mask((data.longitude, data.latitude), grid=grid_full)

fig, axes = plt.subplots(
1, 2, figsize=(9, 7), subplot_kw=dict(projection=ccrs.Mercator())
6 changes: 2 additions & 4 deletions examples/vector_uncoupled.py
@@ -61,13 +61,11 @@
print("Cross-validation R^2 score: {:.2f}".format(score))

# Interpolate the wind speed onto a regular geographic grid and mask the data that are
# far from the observation points
# outside of the convex hull of the data points.
grid_full = chain.grid(
region, spacing=spacing, projection=projection, dims=["latitude", "longitude"]
)
grid = vd.distance_mask(
coordinates, maxdist=3 * spacing * 111e3, grid=grid_full, projection=projection
)
grid = vd.convexhull_mask(coordinates, grid=grid_full)

# Make maps of the original and gridded wind speed
plt.figure(figsize=(6, 6))
3 changes: 3 additions & 0 deletions tutorials/weights.py
@@ -246,6 +246,8 @@ def plot_data(coordinates, velocity, weights, title_data, title_weights):
dims=["latitude", "longitude"],
data_names=["velocity"],
)
# Avoid showing the interpolation too far away from the data points.
grid = vd.convexhull_mask(coordinates, grid=grid)

########################################################################################
# Calculate an unweighted spline as well for comparison.
@@ -262,6 +264,7 @@ def plot_data(coordinates, velocity, weights, title_data, title_weights):
dims=["latitude", "longitude"],
data_names=["velocity"],
)
grid_unweighted = vd.convexhull_mask(coordinates, grid=grid_unweighted)

########################################################################################
# Finally, plot the weighted and unweighted grids side by side.
2 changes: 1 addition & 1 deletion verde/__init__.py
@@ -13,7 +13,7 @@
project_region,
longitude_continuity,
)
from .mask import distance_mask
from .mask import distance_mask, convexhull_mask
from .utils import variance_to_weights, maxabs, grid_to_table
from .io import load_surfer
from .distances import median_distance
109 changes: 101 additions & 8 deletions verde/mask.py
@@ -3,6 +3,9 @@
"""
import numpy as np

# pylint doesn't pick up on this import for some reason
from scipy.spatial import Delaunay # pylint: disable=no-name-in-module

from .base import n_1d_arrays
from .utils import kdtree

@@ -93,14 +96,7 @@ def distance_mask(
[nan nan nan nan nan nan]]

"""
if coordinates is None and grid is None:
raise ValueError("Either coordinates or grid must be given.")
if coordinates is None:
dims = [grid[var].dims for var in grid.data_vars][0]
coordinates = np.meshgrid(grid.coords[dims[1]], grid.coords[dims[0]])
if len(set(i.shape for i in coordinates)) != 1:
raise ValueError("Coordinate arrays must have the same shape.")
shape = coordinates[0].shape
coordinates, shape = _get_grid_coordinates(coordinates, grid)
if projection is not None:
data_coordinates = projection(*n_1d_arrays(data_coordinates, 2))
coordinates = projection(*n_1d_arrays(coordinates, 2))
@@ -110,3 +106,100 @@
if grid is not None:
return grid.where(mask)
return mask


def convexhull_mask(
data_coordinates, coordinates=None, grid=None,
):
"""
Mask grid points that are outside the convex hull of the given data points.

Either *coordinates* or *grid* must be given:

* If *coordinates* is not None, produces an array that is False when a
point is outside the convex hull and True otherwise.
* If *grid* is not None, produces a mask and applies it to *grid* (an
:class:`xarray.Dataset`).

Parameters
----------
data_coordinates : tuple of arrays
Same as *coordinates* but for the data points.
coordinates : None or tuple of arrays
Arrays with the coordinates of each point that will be masked. Should
be in the following order: (easting, northing, ...). Only easting and
northing will be used, all subsequent coordinates will be ignored.
grid : None or :class:`xarray.Dataset`
2D grid with values to be masked. Will use the first two dimensions of
the grid as northing and easting coordinates, respectively. The mask
will be applied to *grid* using the :meth:`xarray.Dataset.where`
method.

Returns
-------
mask : array or :class:`xarray.Dataset`
If *coordinates* was given, then a boolean array with the same shape as
the elements of *coordinates*. If *grid* was given, then an
:class:`xarray.Dataset` with the mask applied to it.

Examples
--------

>>> from verde import grid_coordinates
>>> region = (0, 5, -10, -4)
>>> spacing = 1
>>> coords = grid_coordinates(region, spacing=spacing)
>>> data_coords = ((2, 3, 2, 3), (-9, -9, -6, -6))
>>> mask = convexhull_mask(data_coords, coordinates=coords)
>>> print(mask)
[[False False False False False False]
[False False True True False False]
[False False True True False False]
[False False True True False False]
[False False True True False False]
[False False False False False False]
[False False False False False False]]
>>> # Mask an xarray.Dataset directly
>>> import xarray as xr
>>> coords_dict = {"easting": coords[0][0, :], "northing": coords[1][:, 0]}
>>> data_vars = {"scalars": (["northing", "easting"], np.ones(mask.shape))}
>>> grid = xr.Dataset(data_vars, coords=coords_dict)
>>> masked = convexhull_mask(data_coords, grid=grid)
>>> print(masked.scalars.values)
[[nan nan nan nan nan nan]
[nan nan 1. 1. nan nan]
[nan nan 1. 1. nan nan]
[nan nan 1. 1. nan nan]
[nan nan 1. 1. nan nan]
[nan nan nan nan nan nan]
[nan nan nan nan nan nan]]

"""
coordinates, shape = _get_grid_coordinates(coordinates, grid)
n_coordinates = len(data_coordinates)
triangles = Delaunay(np.transpose(n_1d_arrays(data_coordinates, n_coordinates)))

Review comment (ahartikainen):

Does Delaunay still use the qhull library, which uses 32-bit floats (or something else weird) at some point of its processing pipeline? This forces one to use "normalized" coordinates, otherwise points are truncated to a "grid".

Author (leouieda):

That's a good question. From the docs, I would say "yes": https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.Delaunay.html#scipy.spatial.Delaunay but they don't say anything about that.

What do you mean by "normalized"? Scale them to [0, 1]? If you have any pointers that would be great :)

Author (leouieda):

I found this page but there is nothing on that: http://www.qhull.org/html/qh-impre.htm

Author (leouieda):

By the way, thanks for finding these things Ari. Always things I would never have thought to look for 👍

Review comment (ahartikainen):

Normalized --> small enough.

I once did stuff with a 1000 m x 1000 m area (points in random positions) with real coordinates (northern Sweden, some coordinate system with large values) and had this problem. It took some time to figure out why the results were funny.

The good thing is that this can be done only for the data going into the Delaunay, and then the original data can be used for the rest of the workflow.

Maybe some simple script with a large offset from the origin could tell if this is still a problem or not.

Author (leouieda):

Thanks for the pointers! I'll try to add a test for that to see if it's something we need to worry about.

Review comment (ahartikainen):

https://gist.github.com/ahartikainen/bf2c0eb2ed493040ff490e0c97435ade

Not so sure if this can also fail for the convex hull, but it probably can.

Author (leouieda):

Ah, that's severe actually. It probably will, since the process I'm using checks if points fall on any simplex. I'll add this and see if the gallery example changes much.

Author (leouieda):

I added the normalization code and support for projecting the coordinates prior to the masking so we can handle really large coordinates. This is the gallery example without normalization:

[figure: gallery example without normalization]

And this is after normalization:

[figure: gallery example with normalization]

So no difference in this case. I'll leave the normalization code in there since it doesn't seem to hurt the results and might prevent headaches in the future.

# Find the triangle that contains each grid point.
# -1 indicates that it's not in any triangle.
in_triangle = triangles.find_simplex(
np.transpose(n_1d_arrays(coordinates, n_coordinates))
)
mask = (in_triangle != -1).reshape(shape)
if grid is not None:
return grid.where(mask)
return mask


def _get_grid_coordinates(coordinates, grid):
"""
If coordinates is given, return them and their shape. Otherwise, get
coordinate arrays from the grid.
"""
if coordinates is None and grid is None:
raise ValueError("Either coordinates or grid must be given.")
if coordinates is None:
dims = [grid[var].dims for var in grid.data_vars][0]
coordinates = np.meshgrid(grid.coords[dims[1]], grid.coords[dims[0]])
Comment on lines +222 to +224:

Review comment (Member):

Choosing the northing and easting based on the order of dims could be a source of errors if the dimensions are not in the expected order.

Author (leouieda):

I thought of this after #229 but we're getting dims from the DataArray so it should be fine. The only thing that might cause problems is if the dims are different for the different vars. I could add a check for that and error if that's the case.

This is why adding these things back to xarray is hard. I don't think I could generalize this to an arbitrary Dataset.

Review comment (Member):

The problem may appear if the user is using a different order on the dims of the DataArray. I've seen that after applying some xarray functions, the order of dims is sometimes changed (sorry, I don't remember exactly which ones, but I found this issue that may be related).

I don't have a final solution for this, just wanted to raise the discussion.

Author (leouieda):

Right, I hadn't thought of that. It might be worth mentioning in the docs that we're assuming this order of dimensions then.
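One way the check mentioned above could look, as a sketch (the helper name is hypothetical and this is not part of the merged code): verify that every data variable shares the same dimension order before the coordinates are extracted.

def check_consistent_dims(grid):
    """Error out early if the data variables don't share the same dims order."""
    dims = [grid[var].dims for var in grid.data_vars]
    if len(set(dims)) != 1:
        raise ValueError(
            "All data variables must share the same dimensions. Got: {}".format(dims)
        )
    # Assumes the shared order is (northing, easting), as documented.
    return dims[0]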

if len(set(i.shape for i in coordinates)) != 1:
raise ValueError("Coordinate arrays must have the same shape.")
shape = coordinates[0].shape
return coordinates, shape