Feature/weighted #2922

Merged
merged 60 commits into from
Mar 19, 2020
Changes from 53 commits
0f2da8e
weighted for DataArray
mathause Apr 26, 2019
5f64492
remove some commented code
mathause Apr 26, 2019
685e5c4
pep8 and faulty import tests
mathause Apr 26, 2019
c9d612d
add weighted sum, replace 0s in sum_of_wgt
mathause Apr 30, 2019
a20a4cf
weighted: overhaul tests
mathause Apr 30, 2019
26c24b6
weighted: pep8
mathause Apr 30, 2019
f3c6758
weighted: pep8 lines
mathause Apr 30, 2019
25c3c29
weighted update docs
mathause May 2, 2019
5d37d11
weighted: fix typo
mathause May 2, 2019
b1c572b
weighted: pep8
mathause May 8, 2019
d1d1f2c
undo changes to avoid merge conflict
mathause Oct 17, 2019
6be1414
Merge branch 'master' into feature/weighted
mathause Oct 17, 2019
059263c
add weighted to dataarray again
mathause Oct 17, 2019
8b1904b
remove super
mathause Oct 17, 2019
8cad145
overhaul core/weighted.py
mathause Oct 17, 2019
49d4e43
add DatasetWeighted class
mathause Oct 17, 2019
527256e
_maybe_get_all_dims return sorted tuple
mathause Oct 17, 2019
739568f
work on: test_weighted
mathause Oct 17, 2019
f01305d
black and flake8
mathause Oct 17, 2019
2e3880d
Apply suggestions from code review (docs)
mathause Oct 17, 2019
ae8d048
restructure interim
mathause Oct 18, 2019
dc7f605
restructure classes
mathause Oct 18, 2019
c646568
Merge branch 'master' into feature/weighted
mathause Dec 4, 2019
e2ad69e
update weighted.py
mathause Dec 4, 2019
bd4f048
black
mathause Dec 4, 2019
3c7695a
use map; add keep_attrs
mathause Dec 4, 2019
ef07edd
implement expected_weighted; update tests
mathause Dec 4, 2019
064b5a9
add whats new
mathause Dec 4, 2019
fec1a35
Merge branch 'master' into feature/weighted
mathause Dec 4, 2019
72c7942
undo changes to whats-new
mathause Dec 4, 2019
0e91411
F811: noqa where?
mathause Dec 4, 2019
1eb2913
api.rst
mathause Dec 5, 2019
118dfed
add to computation
mathause Dec 5, 2019
e08c921
small updates
mathause Dec 5, 2019
0fafe0b
add example to gallery
mathause Dec 5, 2019
a8d330d
typo
mathause Dec 5, 2019
ae0012f
another typo
mathause Dec 5, 2019
111259b
correct docstring in core/common.py
mathause Dec 5, 2019
5afc6f3
Merge branch 'master' into feature/weighted
mathause Jan 14, 2020
668b54b
typos
mathause Jan 14, 2020
d877022
adjust review
mathause Jan 14, 2020
ead681e
clean tests
mathause Jan 14, 2020
c4598ba
add test nonequal coords
mathause Jan 14, 2020
866fba5
comment on use of dot
mathause Jan 14, 2020
3cc00c1
fix erroneous merge
mathause Jan 14, 2020
8f34167
Merge branch 'master' into feature/weighted
mathause Jan 21, 2020
9f0a8cd
update tests
mathause Jan 21, 2020
98929f1
Merge branch 'master' into feature/weighted
mathause Mar 5, 2020
62c43e6
move example to notebook
mathause Mar 5, 2020
2e8aba2
move whats-new entry to 15.1
mathause Mar 5, 2020
d14f668
some doc updates
mathause Mar 5, 2020
7fa78ae
dot to own function
mathause Mar 5, 2020
3ebb9d4
simplify some tests
mathause Mar 5, 2020
f01d47a
Doc updates
dcherian Mar 17, 2020
4b184f6
very minor changes.
dcherian Mar 17, 2020
1e06adc
fix & add references
dcherian Mar 17, 2020
706579a
doc: return 0/NaN on 0 weights
mathause Mar 17, 2020
b2718db
Merge branch 'feature/weighted' of https://github.com/mathause/xarray…
mathause Mar 17, 2020
4c17108
Merge branch 'master' into feature/weighted
mathause Mar 17, 2020
8acc78e
Update xarray/core/common.py
dcherian Mar 18, 2020
2 changes: 2 additions & 0 deletions doc/api.rst
@@ -165,6 +165,7 @@ Computation
Dataset.groupby_bins
Dataset.rolling
Dataset.rolling_exp
Dataset.weighted
Dataset.coarsen
Dataset.resample
Dataset.diff
@@ -340,6 +341,7 @@ Computation
DataArray.groupby_bins
DataArray.rolling
DataArray.rolling_exp
DataArray.weighted
DataArray.coarsen
DataArray.dt
DataArray.resample
63 changes: 63 additions & 0 deletions doc/computation.rst
@@ -243,6 +243,69 @@ You can also use ``construct`` to compute a weighted rolling sum:

.. _comput.coarsen:

Weighted array reductions
=========================

``DataArray`` and ``Dataset`` objects include :py:meth:`~xarray.DataArray.weighted`
and :py:meth:`~xarray.Dataset.weighted` array reduction methods. They currently
support weighted ``sum`` and weighted ``mean``.

.. ipython:: python

    coords = dict(month=('month', [1, 2, 3]))

    prec = xr.DataArray([1.1, 1.0, 0.9], dims=('month', ), coords=coords)
    weights = xr.DataArray([31, 28, 31], dims=('month', ), coords=coords)

Create a weighted object:

.. ipython:: python

    weighted_prec = prec.weighted(weights)
    weighted_prec

Calculate the weighted sum:

.. ipython:: python

    weighted_prec.sum()

Calculate the weighted mean:

.. ipython:: python

    weighted_prec.mean(dim="month")

The weighted sum corresponds to:

.. ipython:: python

    weighted_sum = (prec * weights).sum()
    weighted_sum

and the weighted mean to:

.. ipython:: python

    weighted_mean = weighted_sum / weights.sum()
    weighted_mean
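
As a quick sanity check of the two identities above, the same arithmetic can be reproduced in plain Python without xarray. This is only an illustrative sketch: the helper names `weighted_sum` and `weighted_mean` are hypothetical and are not part of the xarray API.

```python
# Plain-Python check of the weighted sum and weighted mean (no xarray needed).
prec = [1.1, 1.0, 0.9]   # monthly precipitation, as in the example above
weights = [31, 28, 31]   # days per month, used as weights


def weighted_sum(values, weights):
    """Return the sum of values[i] * weights[i]."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Return the weighted sum divided by the sum of the weights."""
    return weighted_sum(values, weights) / sum(weights)


print(weighted_sum(prec, weights))   # close to 90.0, up to floating-point rounding
print(weighted_mean(prec, weights))  # close to 1.0, up to floating-point rounding
```

Because January and March carry the same weight and their values are symmetric around 1.0, the weighted mean comes out at roughly 1.0.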

However, the functions also take missing values in the data into account:

.. ipython:: python

    data = xr.DataArray([np.nan, 2, 4])
    weights = xr.DataArray([8, 1, 1])

    data.weighted(weights).mean()

Using ``(data * weights).sum() / weights.sum()`` would (incorrectly) result
in 0.6.
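
The difference between the two results can be illustrated in plain Python. The sketch below is one way to get NaN-aware behaviour, namely dropping the weight of every missing value from the denominator; it is not necessarily how this PR implements it. Both function names are hypothetical, and returning NaN when no valid weight remains is an assumption of this sketch.

```python
import math

data = [math.nan, 2.0, 4.0]
weights = [8, 1, 1]


def naive_weighted_mean(values, weights):
    """Skip NaN products in the numerator but keep ALL weights in the denominator."""
    numerator = sum(v * w for v, w in zip(values, weights) if not math.isnan(v))
    return numerator / sum(weights)


def nan_aware_weighted_mean(values, weights):
    """Also drop the weights that belong to missing values from the denominator."""
    valid = [(v, w) for v, w in zip(values, weights) if not math.isnan(v)]
    numerator = sum(v * w for v, w in valid)
    denominator = sum(w for _, w in valid)
    if denominator == 0:
        # Assumption of this sketch: no valid weight left means "undefined".
        return math.nan
    return numerator / denominator


print(naive_weighted_mean(data, weights))      # 0.6
print(nan_aware_weighted_mean(data, weights))  # 3.0
```

The naive version divides 6 by the full weight sum of 10, while the NaN-aware version divides 6 by the weight sum of the two valid entries, which is 2.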

.. note::
    ``weights`` must be a ``DataArray`` and cannot contain missing values.
    Missing values can be replaced manually by ``weights.fillna(0)``.

Coarsen large arrays
====================

1 change: 1 addition & 0 deletions doc/examples.rst
@@ -6,6 +6,7 @@ Examples

examples/weather-data
examples/monthly-means
examples/area_weighted_temperature
examples/multidimensional-coords
examples/visualization_gallery
examples/ROMS_ocean_model
163 changes: 163 additions & 0 deletions doc/examples/area_weighted_temperature.ipynb
@@ -0,0 +1,163 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Compare weighted and unweighted mean temperature\n",
"\n",
"\n",
"Author: [Mathias Hauser](https://github.com/mathause/)\n",
"\n",
"The data used for this example can be found in the [xarray-data](https://github.com/pydata/xarray-data) repository. You may need to change the path to `air_temperature` below.\n",
"\n",
"We use the air_temperature example dataset to calculate the area-weighted temperature over its domain. This dataset has a regular latitude/longitude grid, so the grid cell area decreases towards the pole. For this grid we can use the cosine of the latitude as a proxy for the grid cell area.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import cartopy.crs as ccrs\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"import xarray as xr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data\n",
"\n",
"Load the data, convert to Celsius, and resample to daily values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = xr.tutorial.load_dataset(\"air_temperature\")\n",
"\n",
"# to celsius\n",
"air = ds.air - 273.15\n",
"\n",
"# resample from 6-hourly to daily values\n",
"air = air.resample(time=\"D\").mean()\n",
"\n",
"air"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the first timestep:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"projection = ccrs.LambertConformal(central_longitude=-95, central_latitude=45)\n",
"\n",
"f, ax = plt.subplots(subplot_kw=dict(projection=projection))\n",
"\n",
"air.isel(time=0).plot(transform=ccrs.PlateCarree(), cbar_kwargs=dict(shrink=0.7))\n",
"ax.coastlines()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating weights\n",
"\n",
"For a rectangular grid the cosine of the latitude is proportional to the grid cell area."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"weights = np.cos(np.deg2rad(air.lat))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Weighted mean"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"air_weighted = air.weighted(weights).mean((\"lon\", \"lat\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plot: comparison with unweighted mean\n",
"\n",
"Note how the weighted mean temperature is higher than the unweighted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"air_weighted.plot(label=\"weighted\")\n",
"air.mean((\"lon\", \"lat\")).plot(label=\"unweighted\")\n",
"\n",
"plt.legend()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
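
The effect shown in the notebook above can be reproduced on a toy example in plain Python. This is a hedged sketch with made-up numbers: the latitudes and temperatures below are hypothetical and are not taken from the `air_temperature` dataset. Only the weighting formula mirrors the notebook cell `np.cos(np.deg2rad(air.lat))`.

```python
import math

# Hypothetical toy profile: one temperature value (deg C) per latitude band.
lats = [15.0, 45.0, 75.0]
temps = [25.0, 10.0, -15.0]

# Cosine-of-latitude weights, the same formula the notebook applies to air.lat.
weights = [math.cos(math.radians(lat)) for lat in lats]

# Unweighted mean treats every latitude band equally.
unweighted = sum(temps) / len(temps)

# Weighted mean gives less influence to the (smaller) high-latitude cells.
weighted = sum(t * w for t, w in zip(temps, weights)) / sum(weights)

print(weights)
print(unweighted, weighted)
```

Since the weights shrink towards the pole, the cold high-latitude band contributes less, so the weighted mean ends up warmer than the unweighted one. This matches the note in the notebook's final plot.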
3 changes: 3 additions & 0 deletions doc/whats-new.rst
@@ -25,6 +25,9 @@ Breaking changes
New Features
~~~~~~~~~~~~

- Weighted array reductions are now supported via the new :py:meth:`DataArray.weighted`
and :py:meth:`Dataset.weighted` methods. By `Mathias Hauser <https://github.com/mathause>`_
(:issue:`422`).
- Added support for :py:class:`pandas.DatetimeIndex`-style rounding of
``cftime.datetime`` objects directly via a :py:class:`CFTimeIndex` or via the
:py:class:`~core.accessor_dt.DatetimeAccessor`.
19 changes: 19 additions & 0 deletions xarray/core/common.py
@@ -748,6 +748,25 @@ def groupby_bins(
},
)

    def weighted(self, weights):
        """
        Weighted operations.

        Parameters
        ----------
        weights : DataArray
            An array of weights associated with the values in this Dataset.
            Each value in the data contributes to the reduction operation
            according to its associated weight.

        Note
        ----
        ``weights`` must be a ``DataArray`` and cannot contain missing values.
        Missing values can be replaced by ``weights.fillna(0)``.
        """

        return self._weighted_cls(self, weights)

    def rolling(
        self,
        dim: Mapping[Hashable, int] = None,
2 changes: 2 additions & 0 deletions xarray/core/dataarray.py
@@ -33,6 +33,7 @@
resample,
rolling,
utils,
weighted,
)
from .accessor_dt import CombinedDatetimelikeAccessor
from .accessor_str import StringAccessor
@@ -258,6 +259,7 @@ class DataArray(AbstractArray, DataWithCoords):
_rolling_cls = rolling.DataArrayRolling
_coarsen_cls = rolling.DataArrayCoarsen
_resample_cls = resample.DataArrayResample
_weighted_cls = weighted.DataArrayWeighted

dt = property(CombinedDatetimelikeAccessor)

2 changes: 2 additions & 0 deletions xarray/core/dataset.py
@@ -46,6 +46,7 @@
resample,
rolling,
utils,
weighted,
)
from .alignment import _broadcast_helper, _get_broadcast_dims_map_common_coords, align
from .common import (
@@ -457,6 +458,7 @@ class Dataset(Mapping, ImplementsDatasetReduce, DataWithCoords):
_rolling_cls = rolling.DatasetRolling
_coarsen_cls = rolling.DatasetCoarsen
_resample_cls = resample.DatasetResample
_weighted_cls = weighted.DatasetWeighted

def __init__(
self,
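
The three hunks above (`common.py`, `dataarray.py`, `dataset.py`) appear to follow the same class-attribute dispatch already used for `_rolling_cls`, `_coarsen_cls` and `_resample_cls`. The sketch below is a stripped-down illustration of that pattern, assuming `weighted` is defined once on a shared base class. The classes here are minimal stand-ins that only reuse the names from the diff; they are not the real xarray classes.

```python
class Weighted:
    """Hypothetical stand-in for the classes in xarray/core/weighted.py."""

    def __init__(self, obj, weights):
        self.obj = obj
        self.weights = weights


class DataArrayWeighted(Weighted):
    pass


class DatasetWeighted(Weighted):
    pass


class DataWithCoords:
    """Shared base class: defines `weighted` once for all subclasses."""

    _weighted_cls = None  # each concrete subclass sets this

    def weighted(self, weights):
        # Dispatch to whichever class the concrete subclass registered.
        return self._weighted_cls(self, weights)


class DataArray(DataWithCoords):
    _weighted_cls = DataArrayWeighted


class Dataset(DataWithCoords):
    _weighted_cls = DatasetWeighted


print(type(DataArray().weighted([1, 2])).__name__)  # DataArrayWeighted
print(type(Dataset().weighted([1, 2])).__name__)    # DatasetWeighted
```

With this layout the public method and its docstring live in one place, and each container type only has to name the implementation class it wants.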