Allow fsspec/zarr/mfdataset #4461
```diff
@@ -340,6 +340,7 @@ def open_dataset(
         ends with .gz, in which case the file is gunzipped and opened with
         scipy.io.netcdf (only netCDF3 supported). Byte-strings or file-like
         objects are opened by scipy.io.netcdf (netCDF3) or h5py (netCDF4/HDF).
+        Also supports arbitrary ``fsspec`` URLs, only for the "zarr" backend.
     group : str, optional
         Path to the netCDF4 group in the given file to open (only works for
         netCDF4 files).
```
```diff
@@ -400,7 +401,10 @@ def open_dataset(
     backend_kwargs: dict, optional
         A dictionary of keyword arguments to pass on to the backend. This
         may be useful when backend options would improve performance or
-        allow user control of dataset processing.
+        allow user control of dataset processing. When using an ``fsspec``
+        path for the filename, the key ``storage_options`` can be used
+        here to configure the backend storage instance. Alternatively, a
+        pre-configured file instance can be supplied with key ``fs``.
     use_cftime: bool, optional
         Only relevant if encoded dates come from a standard calendar
         (e.g. "gregorian", "proleptic_gregorian", "standard", or not
```
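To make the new keywords concrete, here is a rough sketch of the kind of call this docstring change describes; the bucket name, path, and credentials are placeholders and not taken from the PR.

```python
import fsspec
import xarray as xr

# Option 1: let fsspec build the filesystem from the URL, passing
# credentials and other settings via ``storage_options``.
ds = xr.open_dataset(
    "s3://example-bucket/data.zarr",  # placeholder URL
    engine="zarr",
    backend_kwargs={"storage_options": {"anon": True}},
)

# Option 2: hand over a pre-configured fsspec filesystem instance instead.
fs = fsspec.filesystem("s3", anon=True)
ds = xr.open_dataset(
    "example-bucket/data.zarr",  # path relative to ``fs``
    engine="zarr",
    backend_kwargs={"fs": fs},
)
```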
```diff
@@ -548,11 +552,14 @@ def maybe_decode_store(store, chunks):
         ds2._file_obj = ds._file_obj
         return ds2
 
-    filename_or_obj = _normalize_path(filename_or_obj)
+    fs = backend_kwargs.get("fs", None)
+    if fs is None:
+        filename_or_obj = _normalize_path(filename_or_obj)
 
     if isinstance(filename_or_obj, AbstractDataStore):
         store = filename_or_obj
     else:
         backend_kwargs = backend_kwargs.copy()
         if engine is None:
             engine = _autodetect_engine(filename_or_obj)
```
```diff
@@ -564,9 +571,15 @@ def maybe_decode_store(store, chunks):
 
         if engine == "zarr":
             backend_kwargs = backend_kwargs.copy()
+            backend_kwargs.pop("fs", None)
             overwrite_encoded_chunks = backend_kwargs.pop(
                 "overwrite_encoded_chunks", None
             )
             extra_kwargs["mode"] = "r"
             extra_kwargs["group"] = group
+            if fs is not None:
+                filename_or_obj = fs.get_mapper(filename_or_obj)
```
Comment on lines +580 to +581:

Rather than adding the […]

That works already, and will continue to work. However, the whole point of this PR is to allow working out those details in a single call to open_dataset, which turns out very convenient for encoding in an Intake catalog, for instance, or indeed for the open_mfdataset implementation in here.

Note that we are working on a refactor of the backend API that, among other things, aims at removing all knowledge of what backends can or can't do from […]. I would suggest to move the call to […]

Thanks for the heads up. I already did one slightly complex conflict resolve. It isn't totally clear, though, that the logic can be buried in the zarr engine, for two reasons: […] Happy to go whichever way is most convenient for the library.

We need to resolve this discussion in order to decide what to do about this PR. Any more thoughts from other devs? In my view, some of the fsspec logic introduced in the PR should eventually move to the generic […]
```diff
+                backend_kwargs.pop("storage_options", None)
 
         opener = _get_backend_cls(engine)
         store = opener(filename_or_obj, **extra_kwargs, **backend_kwargs)
```
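For reference, a minimal sketch of what an ``fs.get_mapper`` call like the one above returns, using fsspec's in-memory filesystem so no remote storage or credentials are assumed; the path is illustrative only.

```python
import fsspec

# get_mapper() wraps a path on any fsspec filesystem in a MutableMapping
# (an FSMap); a mapping like this is what the zarr backend ends up reading.
fs = fsspec.filesystem("memory")
mapper = fs.get_mapper("bucket/dataset.zarr")  # illustrative path
mapper["example-key"] = b"example-value"       # writes through to the filesystem
print(list(mapper))                            # should show the key just written
```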
```diff
@@ -786,7 +799,8 @@ def open_mfdataset(
         files to open. Paths can be given as strings or as pathlib Paths. If
         concatenation along more than one dimension is desired, then ``paths`` must be a
         nested list-of-lists (see ``combine_nested`` for details). (A string glob will
-        be expanded to a 1-dimensional list.)
+        be expanded to a 1-dimensional list.). When engine=="zarr", the path(s) can
+        be of any type understood by ``fsspec``.
     chunks : int or dict, optional
         Dictionary with keys given by dimension names and values given by chunk sizes.
         In general, these should divide the dimensions of each dataset. If int, chunk
```
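As a hedged illustration of what this docstring change enables (the bucket, glob pattern, and storage options are placeholders):

```python
import xarray as xr

# With engine="zarr", a remote glob pattern is expanded by fsspec and the
# matching stores are combined as usual.
ds = xr.open_mfdataset(
    "s3://example-bucket/collection/*.zarr",
    engine="zarr",
    combine="by_coords",
    backend_kwargs={"storage_options": {"anon": True}},
)
```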
```diff
@@ -907,13 +921,27 @@ def open_mfdataset(
     """
     if isinstance(paths, str):
         if is_remote_uri(paths):
-            raise ValueError(
-                "cannot do wild-card matching for paths that are remote URLs: "
-                "{!r}. Instead, supply paths as an explicit list of strings.".format(
-                    paths
-                )
-            )
-        paths = sorted(glob(paths))
+            if engine != "zarr":
+                raise ValueError(
+                    "cannot do wild-card matching for paths that are remote URLs: "
+                    "{!r}. Instead, supply paths as an explicit list of strings.".format(
+                        paths
+                    )
+                )
+            else:
+                import fsspec  # type: ignore
+
+                backend_kwargs = kwargs.get("backend_kwargs", {})
+                storage_options = backend_kwargs.get("storage_options", None)
+
+                fs, _, _ = fsspec.core.get_fs_token_paths(
+                    paths, storage_options=storage_options
+                )
+                paths = fs.expand_path(paths)
+                backend_kwargs["fs"] = fs
+                kwargs["backend_kwargs"] = backend_kwargs
+        else:
+            paths = sorted(glob(paths))
     else:
         paths = [str(p) if isinstance(p, Path) else p for p in paths]
```
bad merge? This makes the docs build fail with a SyntaxError

Hm, interesting. Correcting...
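Finally, a small self-contained sketch of the glob-expansion step used in the open_mfdataset hunk above, run against local temporary directories rather than a real remote store; the directory names are illustrative.

```python
import os
import tempfile

import fsspec

# Two empty ".zarr" directories stand in for remote stores.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "a.zarr"))
os.makedirs(os.path.join(tmp, "b.zarr"))

paths = f"file://{tmp}/*.zarr"
storage_options = None  # would normally come from backend_kwargs["storage_options"]

# Same calls as in the diff: resolve the filesystem, then expand the glob.
fs, _, _ = fsspec.core.get_fs_token_paths(paths, storage_options=storage_options)
expanded = fs.expand_path(paths)
print(expanded)  # the two *.zarr directories, in filesystem-native form
# Each entry, together with backend_kwargs={"fs": fs}, could then be passed
# on to open_dataset, which is what open_mfdataset does above.
```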