Suggestion: Add option for default_fillvals to open_dataset #2374

Open
MeraX opened this issue Aug 18, 2018 · 2 comments · May be fixed by #5680

Comments

@MeraX
Contributor

MeraX commented Aug 18, 2018

Hi,

May I suggest adding a default_fillvals option to xarray.open_dataset (and xarray.open_dataarray)?

My problem:

My netCDF files contain flagged values that are marked with the netCDF default fill value of 9.96...e+36. However, xarray (0.10.8) only masks variables that have an explicit fill_value set:

import netCDF4, xarray, numpy

nc = netCDF4.Dataset('test.nc', 'w', format='NETCDF4')
nc.createDimension('x', 3)

var1 = nc.createVariable('var1', 'f8', ('x',))
var2 = nc.createVariable('var2', 'f8', ('x',), fill_value=netCDF4.default_fillvals['f8'])

var1[:] = numpy.array([0., 1., netCDF4.default_fillvals['f8']])
var2[:] = numpy.array([0., 1., netCDF4.default_fillvals['f8']])
print('netCDF4 var1', nc.variables['var1'][:])
print('netCDF4 var2', nc.variables['var2'][:])
nc.close()

ds = xarray.open_dataset('test.nc')
print('xarray var1', ds.var1[:])
print('xarray var2', ds.var2[:])

The problem is that ds.var1 and ds.var2 are interpreted differently, although netCDF4 shows both as masked:

netCDF4 var1 [0.0 1.0 --]
netCDF4 var2 [0.0 1.0 --]
xarray var1 <xarray.DataArray 'var1' (x: 3)>
array([0.00000e+00, 1.00000e+00, 9.96921e+36])
Dimensions without coordinates: x
xarray var2 <xarray.DataArray 'var2' (x: 3)>
array([ 0.,  1., nan])
Dimensions without coordinates: x

I agree that masking data only if the fill_value attribute is set is a good default. But I think it would be useful to be able to pass default_fillvals to open_dataset, so that data relying on the implicit default fill values can be read as masked.

What do you think?
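Until such an option exists, the masking can be done by hand after loading. The following is only a minimal sketch with plain NumPy: the helper name `mask_default_fill` is hypothetical, and the constant is assumed to equal `netCDF4.default_fillvals['f8']` (in real code it is safer to read it from netCDF4 directly).

```python
import numpy as np

# Assumption: this matches netCDF4.default_fillvals['f8'] (NC_FILL_DOUBLE).
NC_FILL_DOUBLE = 9.969209968386869e36


def mask_default_fill(values, fill_value=NC_FILL_DOUBLE):
    """Return a float64 copy of `values` with `fill_value` entries set to NaN."""
    arr = np.asarray(values, dtype="f8")
    # Exact comparison is intended: fill values are written bit-for-bit.
    return np.where(arr == fill_value, np.nan, arr)


# Same data as var1 in the example above, which has no explicit fill value.
var1 = np.array([0.0, 1.0, NC_FILL_DOUBLE])
masked = mask_default_fill(var1)
print(masked)
```

On an opened dataset the same comparison can be written with `DataArray.where`, for example `ds.var1.where(ds.var1 != NC_FILL_DOUBLE)`, but this has to be repeated for every variable and dtype, which is what the proposed option would avoid.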

@gcaria
Contributor

gcaria commented Jul 17, 2021

This is still relevant. Should the decode_cf argument deal with it?

@dcherian
Contributor

Yes, I think it should go here:

def decode(self, variable, name=None):
    dims, data, attrs, encoding = unpack_for_decoding(variable)
    raw_fill_values = [
        pop_to(attrs, encoding, attr, name=name)
        for attr in ("missing_value", "_FillValue")
    ]
    if raw_fill_values:
        encoded_fill_values = {
            fv
            for option in raw_fill_values
            for fv in np.ravel(option)
            if not pd.isnull(fv)
        }
        if len(encoded_fill_values) > 1:
            warnings.warn(
                "variable {!r} has multiple fill values {}, "
                "decoding all values to NaN.".format(name, encoded_fill_values),
                SerializationWarning,
                stacklevel=3,
            )
        dtype, decoded_fill_value = dtypes.maybe_promote(data.dtype)
        if encoded_fill_values:
            transform = partial(
                _apply_mask,
                encoded_fill_values=encoded_fill_values,
                decoded_fill_value=decoded_fill_value,
                dtype=dtype,
            )
            data = lazy_elemwise_func(data, transform, dtype)
    return Variable(dims, data, attrs, encoding)

where it will be controlled by both decode_cf and mask_and_scale.
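To make the idea concrete, here is a rough standalone sketch of how a fallback to the default fill value could feed the set of encoded fill values. It is not xarray code: `collect_fill_values`, `apply_mask`, the `use_default_fillvals` flag and the `DEFAULT_FILLVALS` table are all hypothetical names, and the table is assumed to mirror a subset of `netCDF4.default_fillvals`.

```python
import numpy as np

# Assumption: mirrors a subset of netCDF4.default_fillvals (hypothetical table).
DEFAULT_FILLVALS = {"f8": 9.969209968386869e36, "i4": -2147483647}


def collect_fill_values(attrs, dtype, use_default_fillvals=False):
    """Gather fill values from attrs, optionally falling back to the dtype default."""
    raw = [attrs[key] for key in ("missing_value", "_FillValue") if key in attrs]
    # Flatten scalars and arrays alike; `fv == fv` drops NaN entries.
    fill_values = {
        fv for option in raw for fv in np.ravel(option).tolist() if fv == fv
    }
    if not fill_values and use_default_fillvals:
        dt = np.dtype(dtype)
        key = f"{dt.kind}{dt.itemsize}"  # e.g. float64 -> 'f8'
        if key in DEFAULT_FILLVALS:
            fill_values.add(DEFAULT_FILLVALS[key])
    return fill_values


def apply_mask(data, fill_values):
    """Replace every entry equal to one of `fill_values` with NaN."""
    data = np.asarray(data)
    if not fill_values:
        return data
    mask = np.isin(data, list(fill_values))
    return np.where(mask, np.nan, data.astype("f8"))


# A variable without _FillValue is only masked when the fallback is enabled.
raw = np.array([0.0, 1.0, DEFAULT_FILLVALS["f8"]])
off = apply_mask(raw, collect_fill_values({}, raw.dtype))
on = apply_mask(raw, collect_fill_values({}, raw.dtype, use_default_fillvals=True))
print(off, on)
```

In this sketch an explicit `_FillValue` or `missing_value` always takes precedence, so the current default behaviour is unchanged unless the flag is set.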
