Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expand_dims() modifies numpy.ndarray.flags to write only, upon manually reverting this flag back, attempting to set a single inner value using .loc will instead set all of the inner array values #2891

Closed
davehemming opened this issue Apr 13, 2019 · 6 comments · Fixed by #3114

Comments

@davehemming
Copy link

davehemming commented Apr 13, 2019

I am using the newly updated expand_dims API that was recently updated with this PR #2757. However the flag setting behaviour can also be observed using the old API syntax.

>>> expanded_da = xr.DataArray(np.random.rand(3,3), coords={'x': np.arange(3), 'y': np.arange(3)}, dims=('x', 'y')) # Create a 2D DataArray
>>> expanded_da
<xarray.DataArray (x: 3, y: 3)>
array([[0.148579, 0.463005, 0.224993],
       [0.633511, 0.056746, 0.28119 ],
       [0.390596, 0.298519, 0.286853]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2

>>> expanded_da.data.flags # Check current state of numpy flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

>>> expanded_da.loc[0, 0] = 2.22 # Set a single value before expanding
>>> expanded_da # It works, the single value is set
<xarray.DataArray (x: 3, y: 3)>
array([[2.22    , 0.463005, 0.224993],
       [0.633511, 0.056746, 0.28119 ],
       [0.390596, 0.298519, 0.286853]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2

>>> expanded_da = expanded_da.expand_dims({'z': 3}, -1) # Add a new dimension 'z'
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22    , 2.22    , 2.22    ],
        [0.463005, 0.463005, 0.463005],
        [0.224993, 0.224993, 0.224993]],

       [[0.633511, 0.633511, 0.633511],
        [0.056746, 0.056746, 0.056746],
        [0.28119 , 0.28119 , 0.28119 ]],

       [[0.390596, 0.390596, 0.390596],
        [0.298519, 0.298519, 0.298519],
        [0.286853, 0.286853, 0.286853]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
Dimensions without coordinates: z

>>> expanded_da['z'] = np.arange(3) # Add new coordinates to the new dimension 'z'
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22    , 2.22    , 2.22    ],
        [0.463005, 0.463005, 0.463005],
        [0.224993, 0.224993, 0.224993]],

       [[0.633511, 0.633511, 0.633511],
        [0.056746, 0.056746, 0.056746],
        [0.28119 , 0.28119 , 0.28119 ]],

       [[0.390596, 0.390596, 0.390596],
        [0.298519, 0.298519, 0.298519],
        [0.286853, 0.286853, 0.286853]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
  * z        (z) int64 0 1 2

>>> expanded_da.loc[0, 0, 0] = 9.99 # Attempt to set a single value, get 'read-only' error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/dataarray.py", line 113, in __setitem__
    self.data_array[pos_indexers] = value
  File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/dataarray.py", line 494, in __setitem__
    self.variable[key] = value
  File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/variable.py", line 714, in __setitem__
    indexable[index_tuple] = value
  File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/indexing.py", line 1174, in __setitem__
    array[key] = value
ValueError: assignment destination is read-only

>>> expanded_da.data.flags # Check flags on the DataArray, notice they have changed
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

>>> expanded_da.data.setflags(write = 1) # Make array writeable again
>>> expanded_da.data.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

>>> expanded_da.loc[0, 0, 0] # Check the value I want to overwrite
<xarray.DataArray ()>
array(2.22)
Coordinates:
    x        int64 0
    y        int64 0
    z        int64 0

>>> expanded_da.loc[0, 0, 0] = 9.99 # Attempt to overwrite single value, instead it overwrites all values in the array located at [0, 0]
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[9.99    , 9.99    , 9.99    ],
        [0.463005, 0.463005, 0.463005],
        [0.224993, 0.224993, 0.224993]],

       [[0.633511, 0.633511, 0.633511],
        [0.056746, 0.056746, 0.056746],
        [0.28119 , 0.28119 , 0.28119 ]],

       [[0.390596, 0.390596, 0.390596],
        [0.298519, 0.298519, 0.298519],
        [0.286853, 0.286853, 0.286853]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
  * z        (z) int64 0 1 2

Problem description

When applying the operation 'expand_dims({'z': 3}, -1)' on a DataArray the underlying Numpy array flags are changed. 'C_CONTIGUOUS' is set to False, and 'WRITEABLE' is set to False, and 'OWNDATA' is set to False. Upon changing 'WRITEABLE' back to True, when I try to set a single value in the DataArray using the '.loc' operator it will instead set all the values in that selected inner array.

I am new to Xarray so I can't be entirely sure if this expected behaviour. Regardless I would expect that adding a new dimension to the array would not make that array 'read-only'. I would also not expect the '.loc' method to work differently to how it would otherwise.

It's also not congruent with the Numpy 'expand_dims' operation. Because when I call the operation 'np.expand_dims(np_arr, axis=-1)' the 'C_CONTIGUOUS ' and 'WRITEABLE ' flags will not be modified.

Expected Output

Here is a similar flow of operations that demonstrates the behaviour I would expect from the DataArray after applying 'expand_dims':

>>> non_expanded_da = xr.DataArray(np.random.rand(3,3,3), coords={'x': np.arange(3), 'y': np.arange(3)}, dims=('x', 'y', 'z')) # Create the new DataArray to be in the same state as I would expect it to be in after applying the operation 'expand_dims({'z': 3}, -1)'
>>> non_expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[0.017221, 0.374267, 0.231979],
        [0.678884, 0.512903, 0.737573],
        [0.985872, 0.1373  , 0.4603  ]],

       [[0.764227, 0.825059, 0.847694],
        [0.482841, 0.708206, 0.486576],
        [0.726265, 0.860627, 0.435101]],

       [[0.117904, 0.40569 , 0.274288],
        [0.079321, 0.647562, 0.847459],
        [0.57494 , 0.578745, 0.125309]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
Dimensions without coordinates: z

>>> non_expanded_da.data.flags # Check flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

>>> non_expanded_da['z'] = np.arange(3) # Set coordinate for dimension 'z'
>>> non_expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[0.017221, 0.374267, 0.231979],
        [0.678884, 0.512903, 0.737573],
        [0.985872, 0.1373  , 0.4603  ]],

       [[0.764227, 0.825059, 0.847694],
        [0.482841, 0.708206, 0.486576],
        [0.726265, 0.860627, 0.435101]],

       [[0.117904, 0.40569 , 0.274288],
        [0.079321, 0.647562, 0.847459],
        [0.57494 , 0.578745, 0.125309]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
  * z        (z) int64 0 1 2

>>> non_expanded_da.loc[0, 0, 0] = 2.22 # Set value using .loc method
>>> non_expanded_da # The single value referenced is set which is what I expect to happen
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22    , 0.374267, 0.231979],
        [0.678884, 0.512903, 0.737573],
        [0.985872, 0.1373  , 0.4603  ]],

       [[0.764227, 0.825059, 0.847694],
        [0.482841, 0.708206, 0.486576],
        [0.726265, 0.860627, 0.435101]],

       [[0.117904, 0.40569 , 0.274288],
        [0.079321, 0.647562, 0.847459],
        [0.57494 , 0.578745, 0.125309]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
  * z        (z) int64 0 1 2

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.7 (default, Dec 29 2018, 12:05:36) [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: en_AU.UTF-8 libhdf5: 1.10.2 libnetcdf: 4.4.1.1

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.5.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.6.1.post1
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: None
setuptools: 39.0.1
pip: 10.0.1
conda: None
pytest: None
IPython: None
sphinx: None

@davehemming davehemming changed the title expand_dims() modifies numpy.ndarray.flags to write only, when overridden setting value using .loc sets whole array expand_dims() modifies numpy.ndarray.flags to write only, upon manually reverting this flag back, attempting to set a single inner value using .loc will instead set the whole inner array values Apr 13, 2019
@davehemming davehemming changed the title expand_dims() modifies numpy.ndarray.flags to write only, upon manually reverting this flag back, attempting to set a single inner value using .loc will instead set the whole inner array values expand_dims() modifies numpy.ndarray.flags to write only, upon manually reverting this flag back, attempting to set a single inner value using .loc will instead set all of the inner array values Apr 13, 2019
@shoyer
Copy link
Member

shoyer commented Apr 13, 2019

As you've noticed, these arrays are "read only" because otherwise indexing can modify more than the original values, e.g., consider:

original = xr.DataArray(np.zeros(3), dims='x')
result = original.expand_dims(y=2)
result.data.flags.writeable = True
result[0, 0] = 1
print(result)

Both "y" values were set to 1!

<xarray.DataArray (y: 2, x: 3)>
array([[1., 0., 0.],
       [1., 0., 0.]])
Dimensions without coordinates: y, x

The work around is to call .copy() on the array after calling expand_dims(), e.g.,

original = xr.DataArray(np.zeros(3), dims='x')
result = original.expand_dims(y=2).copy()
result[0, 0] = 1
print(result)

Now the correct result is printed:

<xarray.DataArray (y: 2, x: 3)>
array([[1., 0., 0.],
       [0., 0., 0.]])
Dimensions without coordinates: y, x

This is indeed intended behavior: by making the result read-only, we can expand dimensions without copying the original data, and without needing to worry about indexing modifying the wrong values.

That said, we could certainly improve the usability of this feature in xarray. Some options:

  • Mention the work-around of using .copy() in the error message xarray prints when an array is read-only.
  • Add a copy argument to expand_dims, so users can write copy=True if they want a writeable result.
  • Consider changing the behavior when a dimension is inserted with size 1 to make the result writeable -- in this case, individual elements of result can be modified (which will also modify the input array). But maybe it would be surprising to users if the result of expand_dims() is sometimes but not always writeable?

@davehemming
Copy link
Author

Thank you @shoyer for taking the time to educate me on this. I understand completely now.

I agree that solutions one and two would be helpful for future developers new to Xarray and the expand_dims operation when they eventually encounter this behaviour. I also agree that option three would be confusing, and had it of been implemented as such I would still have found myself to be asking a similar question about why it is like that.

Another option to consider which might be easier still would be to just update the expand_dims documentation to include a note about this behaviour and the copy solution.

Thanks again!

@shoyer
Copy link
Member

shoyer commented Apr 14, 2019

OK, we would definitely welcome a pull request to improve this error message and the documentation for expand_dims!

I lean slightly against adding the copy argument since it's just as easy to add .copy() afterwards (that's one less function argument).

@davehemming
Copy link
Author

Yes good point. I do also think that the expand_dims interface should really only be responsible for that one single operation. If you then want to make a copy then go ahead and use that separate method afterwards.

Ok great, if I get time later today I'll see if I can't pick that up; that is if someone hasn't already done so in the meantime.

@floogit
Copy link

floogit commented Apr 17, 2019

I am also affected in some code which used to work with earlier versions of xarray. In this case, I call ds.expand_dims('new_dim') on some dataset (not DataArray), e.g.:

ds = xr.Dataset({'testvar': (['x'], np.zeros(3))})
ds1 = ds.expand_dims('y').copy()
ds1.testvar.data.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

The .copy() workaround is not helping in this case, I am not sure how to fix this?

@floogit
Copy link

floogit commented Apr 17, 2019

I have just realized that .copy(deep=True) is a possible fix for datasets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants