-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow engine='zarr' and passing args for new xarray API #89
Conversation
self._ds = xr.open_dataset(self.urlpath, engine='zarr', | ||
backend_kwargs={"storage_options": self.storage_options}, | ||
**self.kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add in xr.open_mfdataset
in here, or do it in a separate PR?
self._ds = xr.open_dataset(self.urlpath, engine='zarr', | |
backend_kwargs={"storage_options": self.storage_options}, | |
**self.kwargs) | |
if "*" in url or isinstance(url, list): | |
_open_dataset = xr.open_mfdataset | |
else: | |
_open_dataset = xr.open_dataset | |
self._ds = _open_dataset(self.urlpath, engine='zarr', | |
backend_kwargs={"storage_options": self.storage_options}, | |
**self.kwargs) |
Probably need to include more code following the NetCDF driver at
intake-xarray/intake_xarray/netcdf.py
Lines 65 to 76 in bd43095
if "*" in url or isinstance(url, list): | |
_open_dataset = xr.open_mfdataset | |
if self.pattern: | |
kwargs.update(preprocess=self._add_path_to_ds) | |
if self.combine is not None: | |
if 'combine' in kwargs: | |
raise Exception("Setting 'combine' argument twice in the catalog is invalid") | |
kwargs.update(combine=self.combine) | |
if self.concat_dim is not None: | |
if 'concat_dim' in kwargs: | |
raise Exception("Setting 'concat_dim' argument twice in the catalog is invalid") | |
kwargs.update(concat_dim=self.concat_dim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, but let's make it a separate follow-on PR (since the zarr backend didn't support multiple datasets before anyway). That code looks like it doesn't have anything specific to zarr, so it should be reusable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good 👍 Edit: Actually, I just remembered that @Mikejmnez had a PR for multiple zarr at #80 already!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I do have something kind of working. @martindurant was working on doing some changes at the xarray level that would have changed the way multiple zarr files are passed through intake. I see that that pull request hasn't passed... I guess I am just waiting on the new implemented changes and to see how tests are going to be changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be close, that PR at pydata/xarray#4461 is approved, just needs to get merged
intake_xarray/tests/test_remote.py
Outdated
@@ -45,6 +47,7 @@ def test_list(data_server): | |||
|
|||
|
|||
def test_open_rasterio(data_server): | |||
pytest.importorskip("resterio") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be rasterio?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does seem likely... It would not have made a difference on CI, since rasterio is included in the env.
(reminder: this PR is no good until pydata/xarray#4461 is merged/reworked/whatever and released) |
Catching up on intake and xarray development this morning ;) If I'm following correctly, this PR now depends on pydata/xarray#4823, and it would close #70 possibly #94 as well |
Yes, that is the new PR. There is no hurry to get this in. |
I mean, no hurry on the intake-xarray PR (this one), but I really do want to get the xarray one done! |
Looks like pydata/xarray#4823 got merged and xarray v0.17.0 has been released. Anyone want to work on this? If not I'll try and get my head around what happened and move things forward. |
@weiji14 , you are welcome to. However, there i also a lot of conversation around how to pass URLs (or file objects, mappers) to xarray which depends on the backend eventually selected. intake-xarray is in a good position to make decisions and add functionality such as path-name interpreting, but in many cases (like zarr), it makes sense to defer to xrray's logic when it exists. |
No description provided.