Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement integrate #2653

Merged
merged 9 commits into from
Jan 31, 2019
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,7 @@ Computation
Dataset.diff
Dataset.quantile
Dataset.differentiate
Dataset.integrate

**Aggregation**:
:py:attr:`~Dataset.all`
Expand Down Expand Up @@ -319,6 +320,7 @@ Computation
DataArray.dot
DataArray.quantile
DataArray.differentiate
DataArray.integrate

**Aggregation**:
:py:attr:`~DataArray.all`
Expand Down
14 changes: 12 additions & 2 deletions doc/computation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,8 @@ You can also use ``construct`` to compute a weighted rolling sum:
To avoid this, use ``skipna=False`` as the above example.


.. _compute.using_coordinates:

Computation using Coordinates
=============================

Expand All @@ -220,9 +222,17 @@ This method can be used also for multidimensional arrays,
coords={'x': [0.1, 0.11, 0.2, 0.3]})
a.differentiate('x')

:py:meth:`~xarray.DataArray.integrate` computes integration based on
trapezoidal rule using their coordinates,

.. ipython:: python

a.integrate('x')

.. note::
This method is limited to simple cartesian geometry. Differentiation along
multidimensional coordinate is not supported.
These methods are limited to simple cartesian geometry. Differentiation
and integration along multidimensional coordinate are not supported.


.. _compute.broadcasting:

Expand Down
13 changes: 10 additions & 3 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -32,14 +32,21 @@ Enhancements
as long as the array is not chunked along the resampling dimension.
By `Spencer Clark <https://github.com/spencerkclark>`_.

- :py:meth:`~xarray.DataArray.integrate` and
:py:meth:`~xarray.Dataset.integrate` are newly added.
See :ref:`_compute.using_coordinates` for the detail.
(:issue:`1332`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.


Bug fixes
~~~~~~~~~

- Interpolating via resample now internally specifies ``bounds_error=False``
as an argument to ``scipy.interpolate.interp1d``, allowing for interpolation
from higher frequencies to lower frequencies. Datapoints outside the bounds
of the original time coordinate are now filled with NaN (:issue:`2197`). By
`Spencer Clark <https://github.com/spencerkclark>`_.
`Spencer Clark <https://github.com/spencerkclark>`_.

.. _whats-new.0.11.2:

Expand Down Expand Up @@ -73,8 +80,8 @@ Breaking changes
- Minimum rasterio version increased from 0.36 to 1.0 (for ``open_rasterio``)
- Time bounds variables are now also decoded according to CF conventions
(:issue:`2565`). The previous behavior was to decode them only if they
had specific time attributes, now these attributes are copied
automatically from the corresponding time coordinate. This might
had specific time attributes, now these attributes are copied
automatically from the corresponding time coordinate. This might
brake downstream code that was relying on these variables to be
fujiisoup marked this conversation as resolved.
Show resolved Hide resolved
not decoded.
By `Fabien Maussion <https://github.com/fmaussion>`_.
Expand Down
52 changes: 52 additions & 0 deletions xarray/core/dataarray.py
Original file line number Diff line number Diff line change
Expand Up @@ -2427,6 +2427,58 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None):
coord, edge_order, datetime_unit)
return self._from_temp_dataset(ds)

def integrate(self, coord, datetime_unit=None):
""" integrate the array with the trapezoidal rule.

.. note::
This feature is limited to simple cartesian geometry, i.e. coord
must be one dimensional.

Parameters
----------
coord: str
The coordinate to be used to compute the gradient.
datetime_unit
Can be specify the unit if datetime coordinate is specified.One of
{'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns', 'ps', 'fs',
'as'}

Returns
-------
integrated: DataArray

See also
--------
numpy.trapz: corresponding numpy function

Examples
--------

>>> da = xr.DataArray(np.arange(12).reshape(4, 3), dims=['x', 'y'],
... coords={'x': [0, 0.1, 1.1, 1.2]})
>>> da
<xarray.DataArray (x: 4, y: 3)>
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Coordinates:
* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y
>>>
>>> da.differentiate('x')
fujiisoup marked this conversation as resolved.
Show resolved Hide resolved
<xarray.DataArray (x: 4, y: 3)>
array([[30. , 30. , 30. ],
[27.545455, 27.545455, 27.545455],
[27.545455, 27.545455, 27.545455],
[30. , 30. , 30. ]])
Coordinates:
* x (x) float64 0.0 0.1 1.1 1.2
Dimensions without coordinates: y
"""
ds = self._to_temp_dataset().integrate(coord, datetime_unit)
return self._from_temp_dataset(ds)


# priority most be higher than Variable to properly work with binary ufuncs
ops.inject_all_ops_and_reduce_methods(DataArray, priority=60)
64 changes: 64 additions & 0 deletions xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -3866,6 +3866,70 @@ def differentiate(self, coord, edge_order=1, datetime_unit=None):
variables[k] = v
return self._replace_vars_and_dims(variables)

def integrate(self, coord, datetime_unit=None):
fujiisoup marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should coord=None have the default behavior of integrating over all dimensions? Or would that be confusing in some way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think it would be a little confusing because the result may change depending on which coordinate is used for integrate, e.g. if the DataArray has a dimension without coordinate but another one-dimensional coordinate, it is not very clear which should be used.

It would be a little convenient for 1d arrays, but aswe disallow default argument for diff, I like to disallow default argument here too.

""" integrate the array with the trapezoidal rule.

.. note::
This feature is limited to simple cartesian geometry, i.e. coord
must be one dimensional.

Parameters
----------
coord: str
The coordinate to be used to compute the gradient.
datetime_unit
Can be specify the unit if datetime coordinate is specified.One of
fujiisoup marked this conversation as resolved.
Show resolved Hide resolved
{'Y', 'M', 'W', 'D', 'h', 'm', 's', 'ms', 'us', 'ns', 'ps', 'fs',
'as'}

Returns
-------
integrated: Dataset

See also
--------
DataArray.integrate
numpy.trapz: corresponding numpy function
"""
from .variable import Variable

if coord not in self.variables and coord not in self.dims:
raise ValueError('Coordinate {} does not exist.'.format(coord))

coord_var = self[coord].variable
if coord_var.ndim != 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is currently not possible due to xarray's data model, but it's a good idea to add this anyways given that we want to change this soon (e.g., see #2405).

I would recommend adjusting this to if coord_var.dims != (dim,), which is a little stricter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first thought that it would be nice if we could integrate even along non-dimensional (1d) coordinate (as interpolate_na, differential do), but it also sounds something too much.
How do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems reasonable to support

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, coord is a better argument rather than dim?
Or we use dim for argument but support integration along non-dimensional coordinate with a slight avoidance of correctness, as it is more consistent with other reduction methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, differentiate uses coord, so maybe integrate should too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, +1 for consistency with differentiate.

raise ValueError('Coordinate {} must be 1 dimensional but is {}'
' dimensional'.format(coord, coord_var.ndim))

dim = coord_var.dims[0]
if _contains_datetime_like_objects(coord_var):
if coord_var.dtype.kind in 'mM' and datetime_unit is None:
datetime_unit, _ = np.datetime_data(coord_var.dtype)
elif datetime_unit is None:
datetime_unit = 's' # Default to seconds for cftime objects
coord_var = datetime_to_numeric(
coord_var, datetime_unit=datetime_unit)

variables = OrderedDict()
coord_names = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coords_names should always be a set, since you pass it into _replace_vars_and_dims. Actually, this is a great example of a bug mypy would have caught! (see #2655)

for k, v in self.variables.items():
if k in self.coords:
if dim not in v.dims:
variables[k] = v
coord_names.append(k)
else:
if (k in self.data_vars and dim in v.dims):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for extra parenthesis here

if _contains_datetime_like_objects(v):
v = datetime_to_numeric(v, datetime_unit=datetime_unit)
integ = duck_array_ops.trapz(
v.data, coord_var, axis=v.get_axis_num(dim))
v_dims = list(v.dims)
v_dims.remove(dim)
variables[k] = Variable(v_dims, integ)
else:
variables[k] = v
return self._replace_vars_and_dims(variables, coord_names=coord_names)

@property
def real(self):
return self._unary_op(lambda x: x.real,
Expand Down
5 changes: 5 additions & 0 deletions xarray/core/duck_array_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,11 @@ def gradient(x, coord, axis, edge_order):
return npcompat.gradient(x, coord, axis=axis, edge_order=edge_order)


def trapz(x, coord, axis):
# np.trapz should also work for dask array
fujiisoup marked this conversation as resolved.
Show resolved Hide resolved
return np.trapz(x, coord, axis=axis)


masked_invalid = _dask_or_eager_func(
'masked_invalid', eager_module=np.ma,
dask_module=getattr(dask_array, 'ma', None))
Expand Down
69 changes: 69 additions & 0 deletions xarray/tests/test_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -4660,3 +4660,72 @@ def test_differentiate_cftime(dask):
# Test the differentiation of datetimes themselves
actual = da['time'].differentiate('time', edge_order=1, datetime_unit='D')
assert_allclose(actual, xr.ones_like(da['time']).astype(float))


@pytest.mark.parametrize('dask', [True, False])
def test_integrate(dask):
rs = np.random.RandomState(42)
coord = [0.2, 0.35, 0.4, 0.6, 0.7, 0.75, 0.76, 0.8]

da = xr.DataArray(rs.randn(8, 6), dims=['x', 'y'],
coords={'x': coord, 'x2': (('x', ), rs.randn(8)),
'z': 3, 'x2d': (('x', 'y'), rs.randn(8, 6))})
if dask and has_dask:
da = da.chunk({'x': 4})

ds = xr.Dataset({'var': da})

# along x
actual = da.integrate('x')
# coordinate that contains x should be dropped.
expected_x = xr.DataArray(
np.trapz(da, da['x'], axis=0), dims=['y'],
coords={k: v for k, v in da.coords.items() if 'x' not in v.dims})
assert_equal(expected_x, actual)
assert_equal(ds['var'].integrate('x'), ds.integrate('x')['var'])

# along y
actual = da.integrate('y')
expected_y = xr.DataArray(
np.trapz(da, da['y'], axis=1), dims=['x'],
coords={k: v for k, v in da.coords.items() if 'y' not in v.dims})
assert_equal(expected_y, actual)
assert_equal(actual, ds.integrate('y')['var'])
assert_equal(ds['var'].integrate('y'), ds.integrate('y')['var'])

with pytest.raises(ValueError):
da.integrate('x2d')


@pytest.mark.parametrize('dask', [True, False])
@pytest.mark.parametrize('which_datetime', ['np', 'cftime'])
def test_trapz_datetime(dask, which_datetime):
rs = np.random.RandomState(42)
if which_datetime == 'np':
coord = np.array(
['2004-07-13', '2006-01-13', '2010-08-13', '2010-09-13',
'2010-10-11', '2010-12-13', '2011-02-13', '2012-08-13'],
dtype='datetime64')
else:
if not has_cftime:
pytest.skip('Test requires cftime.')
coord = xr.cftime_range('2000', periods=8, freq='2D')

da = xr.DataArray(
rs.randn(8, 6),
coords={'time': coord, 'z': 3, 't2d': (('time', 'y'), rs.randn(8, 6))},
dims=['time', 'y'])

if dask and has_dask:
da = da.chunk({'time': 4})

actual = da.integrate('time', datetime_unit='D')
expected_data = np.trapz(
da, utils.datetime_to_numeric(da['time'], datetime_unit='D'), axis=0)
expected = xr.DataArray(
expected_data, dims=['y'],
coords={k: v for k, v in da.coords.items() if 'time' not in v.dims})
assert_equal(expected, actual)

actual2 = da.integrate('time', datetime_unit='h')
assert_allclose(actual, actual2 / 24.0)