-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Support using opened netCDF4.Dataset (Fixes #1459) #1508
Conversation
xarray/backends/netCDF4_.py
Outdated
@@ -182,7 +182,10 @@ def _extract_nc4_variable_encoding(variable, raise_on_invalid=False, | |||
def _open_netcdf4_group(filename, mode, group=None, **kwargs): | |||
import netCDF4 as nc4 | |||
|
|||
ds = nc4.Dataset(filename, mode=mode, **kwargs) | |||
if isinstance(filename, nc4.Dataset): | |||
ds = filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One potential gotcha with reusing an existing netCDF4 dataset is that we disable automatic masking and scale on each variable (see below in this function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why that would be a problem. It's transmuting state, but IMO if you're handing xarray a Dataset
, you kinda want it to do it's thing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just make it very clear in the docs.
xarray/backends/netCDF4_.py
Outdated
if isinstance(filename, nc4.Dataset): | ||
ds = filename | ||
else: | ||
ds = nc4.Dataset(filename, mode=mode, **kwargs) | ||
|
||
with close_on_error(ds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to automatically close already open netCDF4 datasets if there's a problem loading a group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll see about that when I refactor.
xarray/backends/netCDF4_.py
Outdated
@@ -197,6 +200,9 @@ class NetCDF4DataStore(WritableCFDataStore, DataStorePickleMixin): | |||
"""Store for reading and writing data via the Python-NetCDF4 library. | |||
|
|||
This store supports NetCDF3, NetCDF4 and OpenDAP datasets. | |||
|
|||
`filename` can be an already opened netCDF4 ``Dataset``; this will not | |||
support pickling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
filename
can be an already opened netCDF4Dataset
orGroup
. In this case, the other netCDF4 specific arguments are ignored, and the DataStore object cannot be pickled.
xarray/backends/netCDF4_.py
Outdated
@@ -210,9 +216,9 @@ def __init__(self, filename, mode='r', format='NETCDF4', group=None, | |||
self.ds = opener() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, I think we could support pickling if passed all the right keyword arguments in addition to the existing netCDF4.Dataset
object, and we used the filename/group from the input argument as arguments to opener
. But I'm not entirely sure that's worth doing.
It's probably simpler for now to explicitly raise an error for cases where we would re-use an opener, namely when pickling and/or using autoclose=True
. Can you add that? Right now I would guess that using either of those options could result in somewhat opaque errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I can catch autoclose
, but am I trying to throw an error in __getstate__
and __setstate__
for pickle? Not sure where to go for that part of the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, netCDF4.Dataset
has a __reduce__
method that raises NotImplementedError, so maybe this isn't necessary
@@ -55,6 +55,12 @@ Enhancements | |||
(:issue:`576`). | |||
By `Stephan Hoyer <https://github.com/shoyer>`_. | |||
|
|||
- Support using an existing, opened netCDF4 ``Dataset`` with | |||
:py:class:`~xarray.backends.NetCDF4DataStore`. This permits creating an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about moving the current constructor logic into a classmethod NetCDF4DataStore.open(filename, mode, format, group, writer, clobber, diskless, persist, autoclose)
And adjusting the constructor: NetCDF4DataStore.__init__(self, netcdf4_dataset, opener=None, writer=None, autoclose=None)
.
Right now, I don't think anyone is using the NetCDF4DataStore
constructor directly -- there's literally no good reason for that. This also gives us a pattern we could use for other constructors (e.g., pydap) where passing in an existing object is desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to refactor like that--I just didn't think it was on the table.
I've taken a crack at refactoring the constructor to make that take the dataset instance and have an |
xarray/backends/netCDF4_.py
Outdated
self._opener = functools.partial(opener, mode=self._mode) | ||
super(NetCDF4DataStore, self).__init__(writer) | ||
|
||
ds = nc4.Dataset(filename, mode=mode, format=format, clobber=clobber, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just re-use opener
here instead? e.g., ds = opener()
like how it was in __init__
before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that was my attempt at not duplicating work needed in the opener, the constructor, and our new alternate constructor (open
):
- Open the filename and get a dataset
- Find the group
- Set the variables
If we drop the group (as discussed above) then all the constructor needs to do is turn off scale and mask. This would duplicate turning it off in opener--are you ok with the duplication of the "work" (not the code to do so)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we drop the group (as discussed above) then all the constructor needs to do is turn off scale and mask. This would duplicate turning it off in opener--are you ok with the duplication of the "work" (not the code to do so)?
Yes, this seems OK but not ideal. (In my ideal world, we would check to see if the netCDF4.Dataset has auto-mask-and-scale turned on, and raise an error in that case rather than silently converting it. But I don't think that's possible with netCDF4-Python's current API.)
xarray/backends/netCDF4_.py
Outdated
if format is None: | ||
format = 'NETCDF4' | ||
opener = functools.partial(_open_netcdf4_group, filename, mode=mode, | ||
opener = functools.partial(_open_netcdf4_group, filename, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is missing mode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking in common.py
and it seems like mode gets overridden and thus was not really needed:
xarray/xarray/backends/common.py
Lines 255 to 265 in 8e541de
def __getstate__(self): | |
state = self.__dict__.copy() | |
del state['ds'] | |
if self._mode == 'w': | |
# file has already been created, don't override when restoring | |
state['_mode'] = 'a' | |
return state | |
def __setstate__(self, state): | |
self.__dict__.update(state) | |
self.ds = self._opener(mode=self._mode) |
Happy to defer to your judgement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess given how much the test suite likes my refactor, I've messed something up though... 😁
xarray/backends/netCDF4_.py
Outdated
def __init__(self, filename, mode='r', format='NETCDF4', group=None, | ||
def __init__(self, netcdf4_dataset, mode='r', group=None, writer=None, | ||
opener=None, autoclose=False): | ||
self.ds = _get_netcdf4_group(netcdf4_dataset, group) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not pass in a netCDF4 group directly and drop the group
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So anyone using this part of the interface, if they want a specific group needs to pass it in rather than pass the name? I can live with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
Ok, so I think we're closer now. The tests at least pass on my machine now. 😁 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me, just a few minor changes.
self.is_remote = is_remote_uri(filename) | ||
self._filename = filename | ||
self._mode = 'a' if mode == 'w' else mode | ||
self._opener = functools.partial(opener, mode=self._mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic on these two lines with self._mode
seems a little redundant / strange, but I'm concerned that it might be important to avoid overwriting files when pickling a datastore or using autoclose. Can you restore it to __init__
? I would rather tackle this clean-up in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, although calling partial
on opener
is conditional on opener
existing since it's None
in the case of an existing Dataset
.
xarray/tests/test_backends.py
Outdated
@@ -762,6 +762,18 @@ def test_0dimensional_variable(self): | |||
expected = Dataset({'x': ((), 123)}) | |||
self.assertDatasetIdentical(expected, ds) | |||
|
|||
def test_read_open_dataset(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more clarity: test_already_open_dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Make the filename argument to NetCDF4DataStore polymorphic so that a Dataset can be passed in.
20ee8ca
to
0e79adc
Compare
I went ahead and rebased on master and squashed down the WIP commits. I suspect this is ready now. |
Looks like test failures are those mentioned in #1540 |
Thanks @dopplershift ! |
Make the filename argument to
NetCDF4DataStore
polymorphic so that aDataset
can be passed in.git diff upstream/master | flake8 --diff
whats-new.rst
for all changes andapi.rst
for new API#1459 discussed adding an alternate constructor (i.e. a class method) to
NetCDF4DataStore
to allow this, which would be my preferred approach rather than making afilename
polymorphic (viaisinstance
). Unfortunately, alternate constructors only work by taking one set of parameters (or setting defaults) and then passing them to the original constructor. Given that, there's no way to make an alternate constructor without also making the original constructor somehow aware of this functionality--or breaking backwards-compatibility. I'm open to suggestions to the contrary.