Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: incorrect group averaging with missing data #319

Closed
pochedls opened this issue Aug 17, 2022 · 0 comments · Fixed by #320
Closed

[Bug]: incorrect group averaging with missing data #319

pochedls opened this issue Aug 17, 2022 · 0 comments · Fixed by #320
Assignees
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@pochedls
Copy link
Collaborator

pochedls commented Aug 17, 2022

What happened?

I took the annual average of some data. In one example grid cell, the values were:

[ nan -0.02 -0.02 nan nan nan nan nan nan nan nan nan]

Since two months have the same value and data is missing for all other months, the weighted average should be -0.02. Instead, I get -0.0032786885; group_average seems to be incorrectly excluding the missing data.

What did you expect to happen?

I expected to get -0.02.

Minimal Complete Verifiable Example

Load some data for an example grid cell:

import xcdat
# open dataset
fn = '/p/user_pub/climate_work/pochedley1/surface/gistemp1200_GHCNv4_ERSSTv5.nc' 
ds = xcdat.open_dataset(fn)
# fix missing calendar
time_encoding = ds.time.encoding
time_encoding['calendar'] = 'standard'
ds.time.encoding = time_encoding
# select grid cell in subsetted dataset
dss = ds.isel(lat=[11], lon=[0])

Note that first year of data has two months (of twelve) with data values

print(dss.tempanomaly[0:12, 0, 0].values)

[ nan -0.02 -0.02 nan nan nan nan nan nan nan nan nan]

Taking the annual average I get an unexpected result:


ds_yearly = dss.temporal.group_average('tempanomaly', freq='year', weighted=True)

print(ds_yearly.tempanomaly[0, 0, 0].values)

-0.0032786885

This is not the right value. Since there are two values in the first year (and they are the same), then the average should simply be -0.02.

I think I see what is happening. Each month is assigned some weight in the first year (proportional to the number of days in each month):

w = dss.temporal._get_weights()

If we consider just the first year, then:

w1 = w[0:12]
t1 = dss.tempanomaly[0:12, 0, 0]
print(w1.values, t1.values)

[0.08469945 0.07923497 0.08469945 0.08196721 0.08469945 0.08196721
0.08469945 0.08469945 0.08196721 0.08469945 0.08196721 0.08469945]

[ nan -0.02 -0.02 nan nan nan nan nan nan nan nan nan]

A simple weighted average is WA = sum(T*W)/sum(W)

print((np.sum(w1*t1)/np.sum(w1)).values)

-0.0032786884513057645

The problem is that there should be no weight assigned if there is no data for a given month/index. The weights should be corrected to reflect this:

[0.0 0.07923497 0.08469945 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0]

This weighting matrix would yield -0.02 (which is the correct answer).

Relevant log output

No response

Anything else we need to know?

I think this is probably handled correctly in _average (via xarray .mean), but _group_average is calling .sum instead of .mean.

Environment

main branch

@pochedls pochedls added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Aug 17, 2022
@tomvothecoder tomvothecoder moved this to Todo in v0.3.2 Aug 18, 2022
@tomvothecoder tomvothecoder moved this from Todo to In Progress in v0.3.2 Aug 18, 2022
@tomvothecoder tomvothecoder moved this from In Progress to Review In Progress in v0.3.2 Aug 18, 2022
@tomvothecoder tomvothecoder moved this from Review In Progress to In Progress in v0.3.2 Aug 18, 2022
pochedls added a commit that referenced this issue Aug 25, 2022
* initial fix for #319 
* Refactor `_group_average()`
- Preserve data variable attributes using `xr.set_options(keep_attrs=True)`
- Reuse `self._labeled_time` if it is already set in a previous call to `_group_data()`
- Update group average tests to check data variable test attr is preserved

Co-authored-by: Tom Vo <[email protected]>
Repository owner moved this from In Progress to Done in v0.3.2 Aug 25, 2022
@tomvothecoder tomvothecoder removed this from v0.3.2 Sep 8, 2022
@tomvothecoder tomvothecoder moved this to Todo in v0.3.2 Sep 8, 2022
@tomvothecoder tomvothecoder moved this from Todo to Done in v0.3.2 Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants