Rules for propagating attrs and encoding #1614

jhamman · 2017-10-09T22:56:02Z

We need to come up with some clear rules for when and how xarray should propagate metadata (attrs/encoding). This has come up routinely (e.g. #25, #138, #442, #688, #828, #988, #1009, #1271, #1297, #1586) and we don't have a clear direction as to when to keep/drop metadata.

I'll take a first cut:

| operation | attrs | encoding | status |
| --- | --- | --- | --- |
| reduce | drop | drop | |
| arithmetic | drop | drop | implemented |
| copy | keep | keep | |
| concat | keep first | keep first | implemented |
| slice | keep | drop | |
| where | keep | keep | |

cc @shoyer (following up on #1586 (comment))
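One way to read the table in the first post is as a lookup from operation to a keep/drop policy for each kind of metadata. The following is a minimal sketch of that reading in plain Python, not xarray code: `PROPOSED_RULES`, `propagate`, and `_apply_policy` are hypothetical names, and treating "keep" on several inputs the same as "keep first" is an assumption made only for this illustration.

```python
# Hypothetical model of the proposed rules; none of these names are xarray API.
PROPOSED_RULES = {
    # operation: (attrs policy, encoding policy)
    "reduce": ("drop", "drop"),
    "arithmetic": ("drop", "drop"),
    "copy": ("keep", "keep"),
    "concat": ("keep first", "keep first"),
    "slice": ("keep", "drop"),
    "where": ("keep", "keep"),
}


def _apply_policy(policy, metadata_per_input):
    """Return the metadata dict that a result would carry under `policy`."""
    if policy == "drop":
        return {}
    if policy in ("keep", "keep first"):
        # Assumption: with several inputs, "keep" also takes the first one.
        return dict(metadata_per_input[0])
    raise ValueError(f"unknown policy: {policy!r}")


def propagate(operation, inputs):
    """Compute (attrs, encoding) for a result.

    `inputs` is a list of (attrs, encoding) pairs, one per input object.
    """
    attrs_policy, encoding_policy = PROPOSED_RULES[operation]
    attrs = _apply_policy(attrs_policy, [a for a, _ in inputs])
    encoding = _apply_policy(encoding_policy, [e for _, e in inputs])
    return attrs, encoding


temperature = ({"units": "K"}, {"dtype": "int16"})
print(propagate("slice", [temperature]))   # ({'units': 'K'}, {})
print(propagate("reduce", [temperature]))  # ({}, {})
```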

ethan-campbell · 2017-11-10T23:50:18Z

I'd also suggest that a global option of always_keep_attrs=True would be useful. While I understand the logic of dropping units during certain operations, it makes attributes unusable for storing other miscellaneous metadata, e.g. on data provenance. As a recent xarray convert, this behavior has been frustrating.

mraspaud · 2018-02-02T09:13:38Z

This issue is very relevant for me too. I would like to also propose that a user could provide a function that would know how to combine the attrs of different DataArrays.
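To make the suggestion concrete, here is a rough sketch of what a user-provided combining function might look like. It is plain Python operating on toy `(values, attrs)` pairs, not on DataArrays; `keep_agreeing`, `add`, and the `combine_attrs` parameter are hypothetical names chosen for this illustration and are not claimed to be part of any library.

```python
from typing import Any, Callable, Mapping, Sequence

# A combiner takes the attrs of every input and returns the attrs of the result.
AttrsCombiner = Callable[[Sequence[Mapping[str, Any]]], dict]


def keep_agreeing(attrs_list: Sequence[Mapping[str, Any]]) -> dict:
    """Keep only the keys that every input has with an equal value."""
    if not attrs_list:
        return {}
    first, *rest = attrs_list
    return {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }


def add(a, b, combine_attrs: AttrsCombiner = keep_agreeing):
    """Toy element-wise addition of two (values, attrs) pairs."""
    (a_values, a_attrs), (b_values, b_attrs) = a, b
    values = [x + y for x, y in zip(a_values, b_values)]
    return values, combine_attrs([a_attrs, b_attrs])


a = ([1, 2], {"units": "m", "source": "sensor A"})
b = ([10, 20], {"units": "m", "source": "sensor B"})
print(add(a, b))  # ([11, 22], {'units': 'm'})
```

Passing a different callable, for example one that always returns the first input's attrs, would change the policy without touching the operation itself.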

brey · 2018-02-02T15:54:41Z

I am also interested. I am in principle OK with the table from @jhamman. However, there could be an option to refer to the original attrs in order to provide provenance even on operations like reduce and arithmetic. The idea here is reproducibility and traceability. Maybe an 'origin' attribute?

shoyer · 2018-02-03T00:38:12Z

The challenge with a user-specified function is that there can potentially be weird conflicts if multiple libraries try to override it. Possibly it's worth it for the convenience, but subclasses allowing for explicit hooks (like numpy) is probably the cleanest solution.

SeanDS · 2018-06-18T07:28:32Z

Hi, this feature would be very relevant to the intended use case of a project I'd like to use xarray for. Is the behaviour discussed in the first post implemented anywhere, e.g. in the trunk, for me to play with?

SeanDS · 2018-06-18T07:32:08Z

Also - might I suggest you consider some kind of history tracker as part of the metadata propagation? Perhaps metadata could be saved from each step of a set of operations, so that there is a full paper trail of the operations that have been applied to the data. It could however get complicated to merge together objects with their own separate histories, especially if they ultimately descend from the same original data set.

This would be very relevant for scientific analyses.
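A very small sketch of the idea, assuming the trail is stored as a list under a `history` key in the attrs dict. Everything here is hypothetical (`with_history`, `merge_histories`, the file name `sst.nc`), and the merge is deliberately naive: it de-duplicates steps by their text, which only works when shared ancestry shows up as identical step strings.

```python
import copy


def with_history(attrs, step):
    """Return a copy of `attrs` with `step` appended to its 'history' list."""
    new_attrs = copy.deepcopy(attrs)  # never mutate the parent's metadata
    new_attrs.setdefault("history", []).append(step)
    return new_attrs


def merge_histories(*histories):
    """Merge several history lists, keeping the first occurrence of each step.

    Naive: steps are compared by text, so a common ancestor is only
    recognised if both branches recorded it with the same string.
    """
    merged = []
    for history in histories:
        for step in history:
            if step not in merged:
                merged.append(step)
    return merged


raw = with_history({"units": "K"}, "opened sst.nc")
early = with_history(raw, "selected 2010-2015")
late = with_history(raw, "selected 2016-2020")

print(raw["history"])  # ['opened sst.nc']
print(merge_histories(early["history"], late["history"]))
# ['opened sst.nc', 'selected 2010-2015', 'selected 2016-2020']
```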

shoyer · 2018-06-18T18:30:16Z

> Hi, this feature would be very relevant to the intended use case of a project I'd like to use xarray for. Is the behaviour discussed in the first post implemented anywhere, e.g. in the trunk, for me to play with?

are you referring to a different issue? the first post only summarizes some simple proposed rules.

shoyer · 2018-06-18T18:36:50Z

> Also - might I suggest you consider some kind of history tracker as part of the metadata propagation?

Certainly this would be out of scope for xarray itself, but this could perhaps be done with a library that wraps xarray's API. If I recall correctly, @pwolfram was also interested in this.

We did discuss customizable hooks for attribute handling in #988 but I'm no longer sure that is a good idea. These sorts of overloads are really hard to get right, as we've seen with NumPy's long history of different override protocols (the most recent example being __array_ufunc__).

max-sixty · 2018-06-18T19:46:39Z

> consider some kind of history tracker as part of the metadata propagation?

Data lineage is a big, hard, unsolved problem (~~for us~~ internally, above both naming things and cache invalidation :) )

To second @shoyer, I think it's big and difficult enough to be a separate library

SeanDS · 2018-06-18T20:23:41Z

> are you referring to a different issue? the first post only summarizes some simple proposed rules.

No, just the proposed feature to keep or delete metadata based on the various operations. Is this behaviour already part of the library, and this issue is just to clarify the intended behaviour, or is this a feature proposal?

shoyer · 2018-06-18T21:12:18Z

> No, just the proposed feature to keep or delete metadata based on the various operations. Is this behaviour already part of the library, and this issue is just to clarify the intended behaviour, or is this a feature proposal?

We already have most of this behavior (matching what @jhamman lists in the first comment), though it isn't clearly documented. It should just work if you use xarray methods/functions.

ethan-campbell · 2018-06-18T21:36:32Z

@shoyer, I assume you are referring to the keep_attrs option. Is there a way to persist attrs during arithmetic operations? I find myself writing a bunch of boilerplate to transfer the wealth of metadata included with most netCDF files.

I realize that adding a module-level or DataArray instance-specific maintain_attrs configuration flag (as discussed in #131, #988, #1271) could be problematic, but this strikes me as complexity worth adding. The current approach of dropping all metadata (not just units) seems heavy-handed and unintuitive for new/casual users. As you mentioned in #1271, better to have stale metadata than no metadata at all.

shoyer · 2018-06-18T21:41:46Z

I would be happy to add a global keep_attrs option to xarray.set_options(), which we could use for controlling arithmetic. I'm not planning on working on it personally, but I would be happy to review a PR.
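For readers unfamiliar with the pattern, a global option of this kind is usually a module-level default that a context manager can override temporarily, with a per-call argument taking precedence. The sketch below illustrates that pattern in plain Python. It mirrors the general shape of an options context manager but is not xarray's implementation: `OPTIONS`, `set_options_sketch`, and `add` are hypothetical names, and the toy arithmetic keeps the first operand's attrs purely for illustration.

```python
# Module-level defaults; the default here matches the "drop" rule for arithmetic.
OPTIONS = {"keep_attrs": False}


class set_options_sketch:
    """Temporarily override entries in OPTIONS (toy stand-in, not xarray)."""

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(OPTIONS)
        if unknown:
            raise ValueError(f"unknown option(s): {sorted(unknown)}")
        self._old = {key: OPTIONS[key] for key in kwargs}
        OPTIONS.update(kwargs)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Restore whatever was set before this block was entered.
        OPTIONS.update(self._old)


def add(a, b, keep_attrs=None):
    """Toy addition of (values, attrs) pairs honouring the global option."""
    if keep_attrs is None:  # an explicit argument wins over the global option
        keep_attrs = OPTIONS["keep_attrs"]
    (a_values, a_attrs), (b_values, _) = a, b
    values = [x + y for x, y in zip(a_values, b_values)]
    return values, (dict(a_attrs) if keep_attrs else {})


a = ([1, 2], {"units": "m"})
b = ([3, 4], {"units": "m"})

print(add(a, b))  # ([4, 6], {})
with set_options_sketch(keep_attrs=True):
    print(add(a, b))  # ([4, 6], {'units': 'm'})
print(add(a, b))  # ([4, 6], {})
```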

gerritholl · 2018-10-31T17:56:41Z

Another one to decide is xarray.zeros_like(...) and friends.

shoyer · 2018-11-03T21:18:11Z

> I would be happy to add a global keep_attrs option to xarray.set_options(), which we could use for controlling arithmetic. I'm not planning on working on it personally, but I would be happy to review a PR.

Note that this was implemented by @TomNicholas in #2482

shoyer mentioned this issue Jun 22, 2018

Indexing preserves outdated attrs which cause trouble downstream #2247

Closed

fmaussion mentioned this issue Jul 16, 2018

Add CRS/projection information to xarray objects #2288

Open

shoyer mentioned this issue Jul 16, 2018

Encoding not preserved when using where function #2291

Closed

shoyer mentioned this issue Nov 26, 2018

Confusing error message when attribute not equal during concat #2060

Closed

TomNicholas mentioned this issue Dec 24, 2018

save "encoding" when using open_mfdataset #2436

Open

TomNicholas mentioned this issue Feb 27, 2019

Improved default behavior when concatenating DataArrays #2777

Closed

3 tasks

klindsay28 mentioned this issue May 6, 2019

to_netcdf with decoded time can create file with inconsistent time:units and time_bounds:units #2921

Closed

klindsay28 mentioned this issue Oct 16, 2019

to_dataset_dict not setting time.encoding when decode_times=True intake/intake-esm#153

Closed

dcherian mentioned this issue Oct 28, 2019

Avoid setting the .data attribute NCAR/esmlab#156

Merged

andersy005 mentioned this issue Oct 28, 2019

Move general functionality upstream NCAR/esmlab#157

Open

dcherian mentioned this issue Dec 17, 2019

concat keeps attrs from first variable. #3637

Merged

4 tasks

shoyer mentioned this issue Mar 26, 2020

Keep attrs by default? (keep_attrs) #3891

Open

TomNicholas added the topic-metadata Relating to the handling of metadata (i.e. attrs and encoding) label Apr 5, 2020

weiji14 mentioned this issue Jun 28, 2020

More informative metadata when loading earth_relief grids GenericMappingTools/pygmt#494

Closed

5 tasks

andersy005 mentioned this issue Dec 15, 2020

How to retain metadata NCAR/geocat-f2py#27

Closed

weiji14 mentioned this issue Feb 12, 2021

GMTDataArrayAccessor doesn't work for temporary files that have been sliced GenericMappingTools/pygmt#524

Open

This was referenced Mar 25, 2021

Zarr chunking fixes #5065

Merged

Move encoding from xarray.Variable to duck arrays? #5082

Open

Write with Zarr pangeo-forge/pangeo-forge-recipes#86

Merged

keewis mentioned this issue Mar 3, 2022

propagation of encoding #6323

Open

jhamman mentioned this issue Dec 14, 2022

'open_mfdataset' zarr zip timestamp issue #7354

Open

4 tasks

seisman mentioned this issue Aug 16, 2023

load_earth_mask: Keep data's encoding to correctly infer data's registration and gtype information GenericMappingTools/pygmt#2632

Merged

7 tasks

TomNicholas mentioned this issue Apr 1, 2024

How to handle encoding zarr-developers/VirtualiZarr#68

Open

TomNicholas mentioned this issue May 7, 2024

Alternative to dropping attributes that vary between datasets pangeo-forge/pangeo-forge-recipes#743

Open

TomNicholas mentioned this issue Jun 20, 2024

Coordinate inheritance for xarray.DataTree #9077

Closed
