I took the annual average of some data. In one example grid cell, the values were:
[ nan -0.02 -0.02 nan nan nan nan nan nan nan nan nan]
Since two months have the same value and data is missing for all other months, the weighted average should be -0.02. Instead, I get -0.0032786885; group_average seems to be incorrectly excluding the missing data.
What did you expect to happen?
I expected to get -0.02.
Minimal Complete Verifiable Example
Load some data for an example grid cell:
    import xcdat

    # open dataset
    fn = '/p/user_pub/climate_work/pochedley1/surface/gistemp1200_GHCNv4_ERSSTv5.nc'
    ds = xcdat.open_dataset(fn)

    # fix missing calendar
    time_encoding = ds.time.encoding
    time_encoding['calendar'] = 'standard'
    ds.time.encoding = time_encoding

    # select grid cell in subsetted dataset
    dss = ds.isel(lat=[11], lon=[0])
Note that the first year of data has only two months (of twelve) with data values:
print(dss.tempanomaly[0:12, 0, 0].values)
[ nan -0.02 -0.02 nan nan nan nan nan nan nan nan nan]
Taking the annual average I get an unexpected result:
* initial fix for #319
* Refactor `_group_average()`
- Preserve data variable attributes using `xr.set_options(keep_attrs=True)`
- Reuse `self._labeled_time` if it is already set in a previous call to `_group_data()`
- Update group average tests to check data variable test attr is preserved
Co-authored-by: Tom Vo <[email protected]>
Repository owner moved this from In Progress to Done in v0.3.2, Aug 25, 2022
What happened?
The annual average of `-0.0032786885` is not the right value. Since there are two values in the first year (and they are the same), the average should simply be `-0.02`.
I think I see what is happening. Each month is assigned some weight in the first year (proportional to the number of days in each month):
If we consider just the first year, then:
A simple weighted average is
WA = sum(T*W)/sum(W)
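As a rough check of this formula, here is a minimal plain-Python sketch (not the xcdat implementation). It assumes the twelve values are January to December 1880, a leap year, and that each month is weighted by its number of days. The variable names are illustrative only. Under those assumptions, skipping the missing values in the numerator while still dividing by the sum of all twelve weights reproduces the unexpected result:

```python
import calendar
import math

# Assumption: the twelve values cover January-December 1880 (a leap year).
year = 1880
values = [math.nan, -0.02, -0.02] + [math.nan] * 9

# Weight each month by its number of days, normalized so the weights sum to 1.
days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
weights = [d / sum(days) for d in days]

# sum(T*W) skips the missing months, but sum(W) still includes their weights.
numerator = sum(t * w for t, w in zip(values, weights) if not math.isnan(t))
naive_average = numerator / sum(weights)

print(round(naive_average, 10))  # -0.0032786885
```

Only 60 of the 366 days carry data, so the result is `-0.02 * 60 / 366`.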
The problem is that there should be no weight assigned if there is no data for a given month/index. The weights should be corrected to reflect this:
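One way to express that correction, again as a hedged plain-Python sketch rather than the actual fix in xcdat: set the weight to zero wherever the value is missing, then renormalize the remaining weights before averaging. The year (1880) and the variable names are assumptions made for illustration.

```python
import calendar
import math

# Assumption: the twelve values cover January-December 1880 (a leap year).
year = 1880
values = [math.nan, -0.02, -0.02] + [math.nan] * 9
days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]

# Give zero weight to months with no data, then renormalize to sum to 1.
masked_days = [0 if math.isnan(t) else d for t, d in zip(values, days)]
masked_weights = [d / sum(masked_days) for d in masked_days]

# Only months with non-zero weight contribute to the weighted average.
corrected_average = sum(
    t * w for t, w in zip(values, masked_weights) if w > 0
)

print(round(corrected_average, 10))  # -0.02
```

With the weights of the ten missing months removed, the two remaining months share all of the weight and the average equals their common value.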
This weighting matrix would yield `-0.02` (which is the correct answer).
Relevant log output
No response
Anything else we need to know?
I think this is probably handled correctly in `_average` (via xarray `.mean`), but `_group_average` is calling `.sum` instead of `.mean`.
Environment
main branch