Suggestion: Add option for default_fillvals to open_dataset #2374

MeraX · 2018-08-18T19:47:53Z

Hi,

May I suggest adding a default_fillvals option to xarray.open_dataset (and xarray.open_dataarray)?

My problem:

I have netCDF data in which flagged values are stored as the netCDF default fill value of 9.96...e+36. However, xarray (0.10.8) only masks variables that have an explicit fill_value set:

import netCDF4, xarray, numpy

nc = netCDF4.Dataset('test.nc', 'w', format='NETCDF4')
nc.createDimension('x', 3)

# var1 relies on the implicit default fill value; var2 sets it explicitly.
var1 = nc.createVariable('var1', 'f8', ('x',))
var2 = nc.createVariable('var2', 'f8', ('x',), fill_value=netCDF4.default_fillvals['f8'])

# Write the same data, including one default fill value, to both variables.
var1[:] = numpy.array([0., 1., netCDF4.default_fillvals['f8']])
var2[:] = numpy.array([0., 1., netCDF4.default_fillvals['f8']])
print('netCDF4 var1', nc.variables['var1'][:])
print('netCDF4 var2', nc.variables['var2'][:])
nc.close()

# Read the file back with xarray and compare how the two variables are decoded.
ds = xarray.open_dataset('test.nc')
print('xarray var1', ds.var1[:])
print('xarray var2', ds.var2[:])

The problem is that ds.var1 and ds.var2 are interpreted differently, although netCDF4 shows both as masked:

netCDF4 var1 [0.0 1.0 --]
netCDF4 var2 [0.0 1.0 --]
xarray var1 <xarray.DataArray 'var1' (x: 3)>
array([0.00000e+00, 1.00000e+00, 9.96921e+36])
Dimensions without coordinates: x
xarray var2 <xarray.DataArray 'var2' (x: 3)>
array([ 0.,  1., nan])
Dimensions without coordinates: x

I agree that masking data only if the _FillValue attribute is set is a good default. But I think it would be useful to be able to pass default fill values to open_dataset, so that data relying on the implicit default values can be read.
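Until such an option exists, the masking can be done by hand after loading. The following is only a minimal sketch of that workaround on plain NumPy arrays. The helper name mask_default_fill is hypothetical, and the DEFAULT_FILLVALS table is an assumption: it hard-codes the float entries of netCDF4.default_fillvals so the example runs without netCDF4 installed.

```python
import numpy as np

# Assumption: these values mirror netCDF4.default_fillvals['f4'] and ['f8'].
# They are hard-coded here only so the sketch has no netCDF4 dependency.
DEFAULT_FILLVALS = {"f4": 9.969209968386869e36, "f8": 9.969209968386869e36}


def mask_default_fill(values):
    """Replace the netCDF default fill value with NaN (hypothetical helper)."""
    values = np.asarray(values)
    # Build a key such as 'f8' from the dtype kind and item size.
    key = f"{values.dtype.kind}{values.dtype.itemsize}"
    fill = DEFAULT_FILLVALS.get(key)
    if fill is None:
        # No default known for this dtype: return the data unchanged.
        return values
    # Cast the fill value to the array dtype so float32 data compares exactly.
    fill = values.dtype.type(fill)
    return np.where(values == fill, np.nan, values)


masked = mask_default_fill(np.array([0.0, 1.0, 9.969209968386869e36]))
print(masked)
```

With xarray itself, the same idea can be written as ds['var1'].where(ds['var1'] != fill), which keeps values where the condition holds and sets the rest to NaN.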

What do you think?


gcaria · 2021-07-17T09:58:13Z

This is still relevant. Should the decode_cf argument deal with it?

dcherian · 2021-07-17T19:59:43Z

Yes, I think it should go here:

xarray/xarray/coding/variables.py

Lines 179 to 213 in bc92331

    
           def decode(self, variable, name=None): 
        
               dims, data, attrs, encoding = unpack_for_decoding(variable) 
        
               raw_fill_values = [ 
        
                   pop_to(attrs, encoding, attr, name=name) 
        
                   for attr in ("missing_value", "_FillValue") 
        
               ] 
        
               if raw_fill_values: 
        
                   encoded_fill_values = { 
        
                       fv 
        
                       for option in raw_fill_values 
        
                       for fv in np.ravel(option) 
        
                       if not pd.isnull(fv) 
        
                   } 
        
                   if len(encoded_fill_values) > 1: 
        
                       warnings.warn( 
        
                           "variable {!r} has multiple fill values {}, " 
        
                           "decoding all values to NaN.".format(name, encoded_fill_values), 
        
                           SerializationWarning, 
        
                           stacklevel=3, 
        
                       ) 
        
                   dtype, decoded_fill_value = dtypes.maybe_promote(data.dtype) 
        
                   if encoded_fill_values: 
        
                       transform = partial( 
        
                           _apply_mask, 
        
                           encoded_fill_values=encoded_fill_values, 
        
                           decoded_fill_value=decoded_fill_value, 
        
                           dtype=dtype, 
        
                       ) 
        
                       data = lazy_elemwise_func(data, transform, dtype) 
        
               return Variable(dims, data, attrs, encoding)

where it will be controlled by both decode_cf and mask_and_scale.
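To illustrate what such a change could look like, here is a rough standalone sketch of the masking step with a fallback to default fill values. It is not xarray's implementation: the function name decode_with_default_fill and the use_default_fillvals flag are hypothetical, the DEFAULT_FILLVALS table is assumed to mirror a subset of netCDF4.default_fillvals, and integer data is simply promoted to float64 rather than going through dtypes.maybe_promote.

```python
import numpy as np

# Assumption: a subset of netCDF4.default_fillvals, hard-coded for this sketch.
DEFAULT_FILLVALS = {
    "f4": 9.969209968386869e36,
    "f8": 9.969209968386869e36,
    "i2": -32767,
    "i4": -2147483647,
}


def decode_with_default_fill(data, attrs, use_default_fillvals=False):
    """Hypothetical stand-in for the masking step of the decode method above."""
    data = np.asarray(data)
    # Collect explicit fill values, as the quoted code does.
    raw = [attrs[k] for k in ("missing_value", "_FillValue") if k in attrs]
    # fv == fv is False only for NaN, so NaN fill values are skipped.
    fills = {fv for option in raw for fv in np.ravel(option) if fv == fv}

    # Fallback (the proposed behaviour): use the dtype's default fill value
    # when the variable carries no explicit one and the flag is enabled.
    if not fills and use_default_fillvals:
        key = f"{data.dtype.kind}{data.dtype.itemsize}"
        if key in DEFAULT_FILLVALS:
            fills = {data.dtype.type(DEFAULT_FILLVALS[key])}

    if not fills:
        return data

    # Simplification: integers are promoted to float64 so NaN can mark gaps.
    out = data.astype(np.float64 if data.dtype.kind in "iu" else data.dtype)
    out[np.isin(data, list(fills))] = np.nan
    return out


x = np.array([0.0, 1.0, 9.969209968386869e36])
print(decode_with_default_fill(x, {}))                             # unchanged
print(decode_with_default_fill(x, {}, use_default_fillvals=True))  # last value masked
```

Keeping the fallback behind an opt-in flag preserves the current default, where only variables with an explicit fill value are masked.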

dcherian added topic-CF conventions contrib-help-wanted labels Jul 12, 2019

gcaria linked a pull request Aug 6, 2021 that will close this issue

ENH: Add default fill values for decode_cf #5680

Open

4 tasks

kmuehlbauer mentioned this issue Apr 28, 2023

masked_array write/read differences between xarray and netCDF4 #2478

Closed

This was referenced Jul 26, 2023

default fillvalue not parsed by xarray, results in negative values in connectivity Deltares/xugrid#125

Closed

default fillvalues are not automatically parsed by xarray Deltares/dfm_tools#490

Open

kmuehlbauer mentioned this issue Sep 13, 2023

Unexpected type conversion in variables with _FillValue #6055

Closed
