-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rules for propagating attrs and encoding #1614
Comments
I'd also suggest that a global option of always_keep_attrs=True would be useful. While I understand the logic of dropping units during certain operations, it makes attributes unusable for storing other miscellaneous metadata, e.g. on data provenance. As a recent xarray convert, this behavior has been frustrating. |
This issue is very relevant for me too. I would like to also propose that a user could provide a function that would know how to combine the |
I am also interested. In terms of the table from @jhamman I am in principle ok with. However, there could be an option to refer to the original attrs in order to provide provenance even on operations like reduce and arithmetic. The idea here is reproducibility and tractability. Maybe an 'origin' attribute? |
The challenge with a user-specified function is that there can potentially be weird conflicts if multiple libraries try to override it. Possibly it's worth it for the convenience, but subclasses allowing for explicit hooks (like numpy) is probably the cleanest solution. |
Hi, this feature would be very relevant to the intended use case of a project I'd like to use |
Also - might I suggest you consider some kind of history tracker as part of the metadata propagation? Perhaps metadata could be saved from each step of a set of operations, so that there is a full paper trail for the set of operations have been applied to the data. It could however get complicated to merge together objects with their own separate histories, especially if they ultimately descend from the same original data set. This would be very relevant for scientific analyses. |
are you referring to a different issue? the first post only summarizes some simple proposed rules. |
Certainly this would be out of scope for xarray itself, but this perhaps be done with a library that wraps xarray's API. If I recall correctly, @pwolfram was also interested in this. We did discuss customizable hooks for attribute handling in #988 but I'm no longer sure that is a good idea. These sort of overloads are really hard to get right, as we've seen with NumPy's long history of different override protocols (the most recent example being |
Data lineage is a big, hard, unsolved problem ( To second @shoyer, I think it's big and difficult enough to be a separate library |
No, just the proposed feature to keep or delete metadata based on the various operations. Is this behaviour already part of the library, and this issue is just to clarify the intended behaviour, or is this a feature proposal? |
We already have most of this behavior (matching what @jhamman lists in the first comment), though it isn't clearly documented. It should just work if you use xarray methods/functions. |
@shoyer, I assume you are referring to the I realize that adding a module-level or |
I would happy to add a global |
Another one to decide is |
Note that this was implemented by @TomNicholas in #2482 |
We need to come up with some clear rules for when and how xarray should propagate metadata (attrs/encoding). This has come up routinely (e.g. #25, #138, #442, #688, #828, #988, #1009, #1271, #1297, #1586) and we don't have a clear direction as to when to keep/drop metadata.
I'll take a first cut:
cc @shoyer (following up on #1586 (comment))
The text was updated successfully, but these errors were encountered: