-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dimensions: type as str | Iterable[Hashable]
?
#6142
Comments
Hashable
vs. tuplesstr | Iterable[Hashable]
?
Yes, this was my suggestion. |
Thanks for writing this out. I realize something clever about the proposal: Our very first line of code in the docs passes dims as a tuple This will still work as two dims, since having Does this mean that this would be confusion free? |
Yes, I think that is the gist of the proposal. I also think that it is quite elegant.
I think it would be relatively unambiguous but not necessarily entirely confusion free for the user ("Why is the type of a single dim more restricted than of several dimensions?"). Maybe this warrants an entry in our FAQ or somewhere? Also we need to go over our code and update our typing and make sure stuff like |
Sorry, I don't understand this - if I want to type hint |
I agree this is confusing, what is meant here is to use a tuple as the name for one dimension. An example: ds = xr.Dataset({"x": ([("a", "b")], [1])})
print(ds)
print(f"{ds.x.dims = }") <xarray.Dataset>
Dimensions: (('a', 'b'): 1)
Dimensions without coordinates: ('a', 'b')
Data variables:
x (('a', 'b')) int64 1
ds.x.dims = (('a', 'b'),) I found the issue again where we discussed that we want to keep |
In case you are missing a use case for tuples as dimension/variable names: We have a system with two detectors, lets call them "a" and "b". Now instead of calling the dimensions "x_a", "y_a", "x_b", "y_b" we call them ("x", "a") etc. |
I was just looking through the code a bit and I must say that the usage of dimension arguments is far from harmonized in the code....
At least internally most objects use The lazy shortcut to allow a single str is making this more complicated than one might think (but I have to admit, I use it all the time). |
I would have thought this is indeed the case. I think we're quite open to adjusting this so that it works, but it would probably needs some work, including defining common patterns — e.g. the title of this issue is a good approach
I also think we'd be up for reforming this! Generally this is the first arg and so only used positionally, but it still makes sense to have them consistent. |
Only remotely related to this issue, but would it help to type the dimension (or variable names etc) Mappings with covariant Hashable keys to prevent the issue that Something like: Hashable_co = TypeVar("Hashable_co", bound=Hashable, covariant=True)
Mapping[Hashable_co, Any] Same for lists etc. (Or use Sequences which are already covariant, as opposed to lists) I am not an expert on variance in types and don't know if this would break things, but at least in my tests mypy did not complain when using See https://mypy-play.net/?mypy=latest&python=3.10&gist=b2c27a5ca4aad9e8ce63cd74dc4032b0 |
I spent a non-trivial amount of time on this a year ago or so — my understanding is that But I'm also not sure we lose anything with e.g. https://mypy-play.net/?mypy=latest&python=3.10&gist=a3ea9f6fce5a678803bd7563a54cd9ca
Very cool! I didn't even know about this. |
Wow, totally forgot about this. Then you are right, Any as key is totally ok. Although if I am not mistaken, one could define a custom Mapping class that allows non-hashable keys, this is quite a niche use case and there are more urgent things to type, like this issue. |
I have encountered another problem that is related to this: So I tried to overload this: @overload
def argmin(
self,
dim: Hashable,
*,
axis: int | None = None,
keep_attrs: bool | None = None,
skipna: bool | None = None,
) -> DataArray:
...
@overload
def argmin(
self,
dim: Sequence[Hashable] | None = None,
axis: int | None = None,
keep_attrs: bool | None = None,
skipna: bool | None = None,
) -> dict[Hashable, DataArray]:
... but mypy complains:
I suspect that this is due to the fact that None is actually Hashable and therefore a valid dimension. Unfortunately, using this proposal of |
Ah, that's a good case @headtr1ck . So I don't have an immediate solution in mind unfortunately. I'm not sure having different return types is a great design in general, though I can see the logic for it in this case... |
See python/typing#256 as reference for this problem. |
Should we close? I think this is mostly done. I see it as a big success! Except for the overload case, which I don't think is actionable atm, is there anything remaining here? |
I think even the overload case is widely accepted to just type ignore it. The next step would be to become generic in :) We can close this. |
What happened?
We generally type dimensions as:
However, this is in conflict with passing a tuple of independent dimensions to a method - e.g.
da.mean(("x", "y"))
because a tuple is also hashable.Also mypy requires an
isinstance(dims, Hashable)
check when typing a function. We use anisinstance(dims, str)
check in many places to wrap a single dimension in a list. Changing this toisinstance(dims, Hashable)
will change the behavior for tuples.What did you expect to happen?
In the community call today we discussed to change this to
i.e. if a single dim is passed it has to be a string and wrapping it in a list is a convenience function. Special use cases with
Hashable
types should be wrapped in aIterable
by the user. This probably best reflects the current state of the repo (dims = [dims] if isinstance(dims, str) else dims
).The disadvantage could be that it is a bit more difficult to explain in the docstrings?
@shoyer - did I get this right from the discussion?
Other options
str
as dimension names.This could be too restrictive. @keewis mentioned that tuple dimension names are already used somwehere in the xarray repo. Also we discussed in another issue or PR (which I cannot find right know) that we want to keep allowing
Hashable
.This is too restrictive in the other direction and will probably lead to a lot of downstream troubles. Naming a single dimension with a tuple will be a very rare case, in contrast to passing several dimension names as a tuple.
dims
is a tuple and if there are any dimension names consisting of a tuple. Seems more complicated and potentially brittle for probably small gains (IMO).Minimal Complete Verifiable Example
No response
Relevant log output
No response
Anything else we need to know?
Hashable
are really allowed. E.g.dims
of aDataArray
are typed asxarray/xarray/core/dataarray.py
Line 380 in e056cac
but tuples are not actually allowed:
xr.concat
, which should probably setdim: Hashable
(and make sure it works).str
andtuple
? (Would be good for testing purposes).Environment
N/A
The text was updated successfully, but these errors were encountered: