ENH: Support using opened netCDF4.Dataset (Fixes #1459) #1508

dopplershift · 2017-08-16T20:19:01Z

Make the filename argument to NetCDF4DataStore polymorphic so that a
Dataset can be passed in.

Closes #1459 (xarray.Dataset from existing netCDF4.Dataset)
Tests added / passed
Passes git diff upstream/master | flake8 --diff
Fully documented, including whats-new.rst for all changes and api.rst for new API

#1459 discussed adding an alternate constructor (i.e. a class method) to NetCDF4DataStore to allow this, which I would prefer to making the filename argument polymorphic (via isinstance). Unfortunately, an alternate constructor works only by taking one set of parameters (or setting defaults) and then passing them to the original constructor. Given that, there's no way to add an alternate constructor without either making the original constructor aware of this functionality or breaking backwards compatibility. I'm open to suggestions to the contrary.
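A minimal sketch of the polymorphic approach described above, runnable without netCDF4 installed. `FakeDataset` is a hypothetical stand-in for `netCDF4.Dataset`, and `open_dataset_or_path` is an illustrative name, not the function in the diff. It also returns an `owned` flag, since whether the store opened the dataset itself turns out to matter for closing it.

```python
class FakeDataset:
    """Hypothetical stand-in for netCDF4.Dataset (not the real class)."""

    def __init__(self, filename, mode="r"):
        self.filename = filename
        self.mode = mode


def open_dataset_or_path(filename, mode="r"):
    """Return (dataset, owned): reuse an open dataset, else open the path.

    `owned` records whether we opened the dataset ourselves.
    """
    if isinstance(filename, FakeDataset):
        # Already opened by the caller: reuse it as-is; `mode` is ignored.
        return filename, False
    # Otherwise treat the argument as a path and open it ourselves.
    return FakeDataset(filename, mode=mode), True


existing = FakeDataset("data.nc")
ds, owned = open_dataset_or_path(existing)      # reuses `existing`
new_ds, new_owned = open_dataset_or_path("other.nc", mode="a")
```

The drawback is the one noted above: the first parameter now carries two meanings, and every other argument is silently ignored when a dataset is passed.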

shoyer · 2017-08-18T17:38:40Z

xarray/backends/netCDF4_.py

@@ -182,7 +182,10 @@ def _extract_nc4_variable_encoding(variable, raise_on_invalid=False,
 def _open_netcdf4_group(filename, mode, group=None, **kwargs):
    import netCDF4 as nc4

-    ds = nc4.Dataset(filename, mode=mode, **kwargs)
+    if isinstance(filename, nc4.Dataset):
+        ds = filename


One potential gotcha with reusing an existing netCDF4 dataset is that we disable automatic masking and scale on each variable (see below in this function).

Not sure why that would be a problem. It's mutating state, but IMO if you're handing xarray a Dataset, you kinda want it to do its thing...

Let's just make it very clear in the docs.

shoyer · 2017-08-18T17:42:01Z

xarray/backends/netCDF4_.py

+    if isinstance(filename, nc4.Dataset):
+        ds = filename
+    else:
+        ds = nc4.Dataset(filename, mode=mode, **kwargs)

    with close_on_error(ds):


We don't want to automatically close already open netCDF4 datasets if there's a problem loading a group.

Ok, I'll see about that when I refactor.
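One way to honor this is to close on error only when the store opened the dataset itself. The diff uses xarray's existing `close_on_error(ds)`; the variant below, `close_on_error_if_owned`, and its `owned` flag are hypothetical, and `FakeDataset` stands in for `netCDF4.Dataset`.

```python
import contextlib


class FakeDataset:
    """Hypothetical stand-in for netCDF4.Dataset that records close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


@contextlib.contextmanager
def close_on_error_if_owned(ds, owned):
    """Close `ds` on error only if we opened it; always re-raise."""
    try:
        yield
    except Exception:
        if owned:
            # We opened it, so it is ours to clean up.
            ds.close()
        # A caller-owned dataset is left open for the caller to handle.
        raise
```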

shoyer · 2017-08-18T17:45:04Z

xarray/backends/netCDF4_.py

@@ -197,6 +200,9 @@ class NetCDF4DataStore(WritableCFDataStore, DataStorePickleMixin):
    """Store for reading and writing data via the Python-NetCDF4 library.

    This store supports NetCDF3, NetCDF4 and OpenDAP datasets.
+
+    `filename` can be an already opened netCDF4 ``Dataset``; this will not
+    support pickling.


Maybe:

``filename`` can be an already opened netCDF4 ``Dataset`` or ``Group``. In this case, the other netCDF4-specific arguments are ignored, and the DataStore object cannot be pickled.

shoyer · 2017-08-18T17:47:42Z

xarray/backends/netCDF4_.py

@@ -210,9 +216,9 @@ def __init__(self, filename, mode='r', format='NETCDF4', group=None,
        self.ds = opener()


In theory, I think we could support pickling if passed all the right keyword arguments in addition to the existing netCDF4.Dataset object, and we used the filename/group from the input argument as arguments to opener. But I'm not entirely sure that's worth doing.

It's probably simpler for now to explicitly raise an error for cases where we would re-use an opener, namely when pickling and/or using autoclose=True. Can you add that? Right now I would guess that using either of those options could result in somewhat opaque errors.

So I can catch autoclose, but am I supposed to raise an error in __getstate__ and __setstate__ for pickling? I'm not sure where to go for that part of the problem.

Actually, netCDF4.Dataset has a __reduce__ method that raises NotImplementedError, so maybe this isn't necessary.
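Shoyer's request to raise explicit errors for pickling and autoclose could look roughly like the sketch below; `StoreSketch` and its error messages are hypothetical, not xarray's code. One caveat worth checking: the `__getstate__` in xarray/backends/common.py (quoted in this thread) deletes `'ds'` from the state before pickling, so the Dataset's own `__reduce__` may never be invoked, and the failure would then surface only on unpickling, when `_opener` is `None`.

```python
class StoreSketch:
    """Hypothetical data store that may wrap a caller-owned dataset."""

    def __init__(self, ds, opener=None, autoclose=False):
        if autoclose and opener is None:
            # autoclose must re-open the file later, which needs an opener.
            raise ValueError("autoclose=True requires a dataset opened "
                             "from a filename, not an existing Dataset")
        self.ds = ds
        self._opener = opener  # None means the caller owns `ds`
        self._autoclose = autoclose

    def __getstate__(self):
        if self._opener is None:
            # Fail at pickling time with a clear message, instead of a
            # confusing TypeError when unpickling tries to call None.
            raise NotImplementedError(
                "cannot pickle a store wrapping an already-open Dataset")
        state = self.__dict__.copy()
        del state["ds"]  # re-created from the opener when unpickling
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.ds = self._opener()
```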

shoyer · 2017-08-18T17:56:25Z

doc/whats-new.rst

@@ -55,6 +55,12 @@ Enhancements
  (:issue:`576`).
  By `Stephan Hoyer <https://github.com/shoyer>`_.

+- Support using an existing, opened netCDF4 ``Dataset`` with
+  :py:class:`~xarray.backends.NetCDF4DataStore`. This permits creating an


What do you think about moving the current constructor logic into a classmethod NetCDF4DataStore.open(filename, mode, format, group, writer, clobber, diskless, persist, autoclose)?

And adjusting the constructor to NetCDF4DataStore.__init__(self, netcdf4_dataset, opener=None, writer=None, autoclose=None)?

Right now, I don't think anyone is using the NetCDF4DataStore constructor directly -- there's literally no good reason for that. This also gives us a pattern we could use for other constructors (e.g., pydap) where passing in an existing object is desirable.

I'm happy to refactor like that--I just didn't think it was on the table.
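The proposed split can be sketched as follows: `__init__` only ever receives an open dataset, and the `open` classmethod builds an opener, opens once, and delegates. This sidesteps the objection in the PR description, because the alternate constructor still routes through `__init__`, but `__init__` no longer needs to know about filenames. `DataStoreSketch` and `FakeDataset` are hypothetical, and the signature is trimmed from the proposal (no format, group, clobber, diskless, or persist).

```python
import functools


class FakeDataset:
    """Hypothetical stand-in for netCDF4.Dataset."""

    def __init__(self, filename, mode="r"):
        self.filename = filename
        self.mode = mode


class DataStoreSketch:
    """Hypothetical store: __init__ wraps an open dataset, open() opens one."""

    def __init__(self, dataset, opener=None, writer=None, autoclose=False):
        self.ds = dataset
        self._opener = opener  # None means the caller owns `dataset`
        self._writer = writer
        self._autoclose = autoclose

    @classmethod
    def open(cls, filename, mode="r", writer=None, autoclose=False):
        # Alternate constructor: build the opener, open once, then delegate
        # to __init__, which only sees an already-open dataset.
        opener = functools.partial(FakeDataset, filename, mode=mode)
        return cls(opener(), opener=opener, writer=writer,
                   autoclose=autoclose)


from_path = DataStoreSketch.open("data.nc", mode="a")
from_dataset = DataStoreSketch(FakeDataset("x.nc"))
```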

dopplershift · 2017-08-23T23:23:42Z

I've taken a crack at refactoring the constructor so that it takes the dataset instance, with an open class method that opens the dataset. Breaking everything apart took a little more effort (owing to opener), but I think what I have is OK. I'd appreciate feedback before I update whats-new.rst and rebase to clean up.

shoyer · 2017-08-23T23:50:48Z

xarray/backends/netCDF4_.py

-        self._opener = functools.partial(opener, mode=self._mode)
-        super(NetCDF4DataStore, self).__init__(writer)
+
+        ds = nc4.Dataset(filename, mode=mode, format=format, clobber=clobber,


Can you just re-use opener here instead? e.g., ds = opener() like how it was in __init__ before?

That was my attempt at not duplicating the work needed in the opener, the constructor, and our new alternate constructor (open):

1. Open the filename and get a dataset
2. Find the group
3. Set the variables

If we drop the group (as discussed above), then all the constructor needs to do is turn off scale and mask. This would duplicate turning it off in opener; are you OK with duplicating the "work" (not the code to do so)?


Yes, this seems OK but not ideal. (In my ideal world, we would check to see if the netCDF4.Dataset has auto-mask-and-scale turned on, and raise an error in that case rather than silently converting it. But I don't think that's possible with netCDF4-Python's current API.)
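Steps 2 and 3 of the list above can be sketched with stand-ins (step 1, the actual open, needs netCDF4 and is left out). netCDF4's `Variable` does provide `set_auto_maskandscale`; `FakeGroup`, `FakeVariable`, `find_group`, and `prepare_group` are hypothetical, and raising `OSError` for a missing group is a choice made for this sketch. Since disabling mask and scale is just setting a flag, doing it both in the opener and in the constructor should be harmless beyond the repeated work.

```python
class FakeGroup:
    """Hypothetical stand-in for a netCDF4 Dataset/Group node."""

    def __init__(self, variables=None, groups=None):
        self.variables = variables or {}
        self.groups = groups or {}


class FakeVariable:
    """Hypothetical stand-in recording the auto mask/scale flag."""

    def __init__(self):
        self.maskandscale = True

    def set_auto_maskandscale(self, flag):
        self.maskandscale = flag


def find_group(ds, group=None):
    """Step 2: walk a '/'-separated group path from the root dataset."""
    if group in (None, "", "/"):
        return ds
    node = ds
    for key in group.strip("/").split("/"):
        try:
            node = node.groups[key]
        except KeyError:
            raise OSError(f"group not found: {key!r} in {group!r}") from None
    return node


def prepare_group(ds, group=None):
    """Steps 2 and 3: find the group, then disable auto mask and scale."""
    node = find_group(ds, group)
    for var in node.variables.values():
        # xarray applies CF decoding itself, so netCDF4's must be off.
        var.set_auto_maskandscale(False)
    return node
```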

shoyer · 2017-08-23T23:51:13Z

xarray/backends/netCDF4_.py

        if format is None:
            format = 'NETCDF4'
-        opener = functools.partial(_open_netcdf4_group, filename, mode=mode,
+        opener = functools.partial(_open_netcdf4_group, filename,


I think this is missing mode.

I was looking in common.py and it seems like mode gets overridden and thus was not really needed:

xarray/xarray/backends/common.py, lines 255 to 265 in 8e541de:

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['ds']
        if self._mode == 'w':
            # file has already been created, don't override when restoring
            state['_mode'] = 'a'
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.ds = self._opener(mode=self._mode)

Happy to defer to your judgement.

I guess given how much the test suite likes my refactor, I've messed something up though... 😁
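The point about mode can be checked in isolation: `functools.partial` lets keyword arguments given at call time override keywords bound earlier, so the explicit `mode=self._mode` in `__setstate__` wins over whatever mode the opener was built with. The sketch mirrors the quoted `__getstate__`/`__setstate__`, with a hypothetical `fake_opener` that just reports its arguments. It only covers the unpickling path; any other call site that invokes `_opener()` without `mode` would still get the bound value.

```python
import functools


def fake_opener(filename, mode="r"):
    """Hypothetical opener that just reports how it was called."""
    return (filename, mode)


class PickleMixinSketch:
    """Mirrors the __getstate__/__setstate__ quoted from common.py."""

    def __init__(self, filename, mode):
        self._mode = mode
        self._opener = functools.partial(fake_opener, filename, mode=mode)
        self.ds = self._opener()

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["ds"]
        if self._mode == "w":
            # file has already been created, don't override when restoring
            state["_mode"] = "a"
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Call-time keywords override those bound by functools.partial.
        self.ds = self._opener(mode=self._mode)
```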

shoyer · 2017-08-23T23:53:00Z

xarray/backends/netCDF4_.py

-    def __init__(self, filename, mode='r', format='NETCDF4', group=None,
+    def __init__(self, netcdf4_dataset, mode='r', group=None, writer=None,
+                 opener=None, autoclose=False):
+        self.ds = _get_netcdf4_group(netcdf4_dataset, group)


Why not pass in a netCDF4 group directly and drop the group argument?

So anyone using this part of the interface who wants a specific group needs to pass in the group itself rather than its name? I can live with that.

Yes, exactly.

dopplershift · 2017-08-24T03:11:13Z

Ok, so I think we're closer now. The tests at least pass on my machine now. 😁

shoyer

Generally looks good to me, just a few minor changes.

shoyer · 2017-08-30T06:00:17Z

xarray/backends/netCDF4_.py

-        self.is_remote = is_remote_uri(filename)
-        self._filename = filename
-        self._mode = 'a' if mode == 'w' else mode
-        self._opener = functools.partial(opener, mode=self._mode)


This logic on these two lines with self._mode seems a little redundant / strange, but I'm concerned that it might be important to avoid overwriting files when pickling a datastore or using autoclose. Can you restore it to __init__? I would rather tackle this clean-up in another PR.

Done, although wrapping opener in functools.partial is now conditional on opener existing, since it's None in the case of an existing Dataset.
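The restored `__init__` logic, with the conditional wrapping described in the reply, might look like this sketch. `StoreInitSketch` and `fake_open` are hypothetical names; the `'w'` to `'a'` mapping and the `functools.partial(opener, mode=self._mode)` call follow the removed lines in the diff.

```python
import functools


def fake_open(filename, mode="r"):
    """Hypothetical stand-in for opening a file; returns its arguments."""
    return (filename, mode)


class StoreInitSketch:
    def __init__(self, ds, mode="r", opener=None):
        self.ds = ds
        # Never re-open in 'w' mode: that would clobber the file just written.
        self._mode = "a" if mode == "w" else mode
        # An existing Dataset comes with no opener, so only wrap one if given.
        self._opener = (functools.partial(opener, mode=self._mode)
                        if opener is not None else None)
```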

shoyer · 2017-08-30T06:00:38Z

xarray/tests/test_backends.py

@@ -762,6 +762,18 @@ def test_0dimensional_variable(self):
                expected = Dataset({'x': ((), 123)})
                self.assertDatasetIdentical(expected, ds)

+    def test_read_open_dataset(self):


For more clarity: test_already_open_dataset


dopplershift · 2017-08-30T20:37:50Z

I went ahead and rebased on master and squashed down the WIP commits. I suspect this is ready now.

dopplershift · 2017-08-31T16:01:18Z

Looks like the test failures are the ones mentioned in #1540.

shoyer · 2017-08-31T17:18:58Z

Thanks @dopplershift !

shoyer reviewed Aug 18, 2017


jhamman added topic-backends enhancement labels Aug 23, 2017

shoyer reviewed Aug 23, 2017


dopplershift mentioned this pull request Aug 28, 2017: v0.10 Release #1535 (closed, 13 tasks)

jhamman modified the milestone: 0.10 Aug 28, 2017

shoyer mentioned this pull request Aug 29, 2017: WIP: Zarr backend #1528 (merged, 4 tasks)

shoyer reviewed Aug 30, 2017


Commit 0e79adc: ENH: Support using opened netCDF4.Dataset (Fixes pydata#1459)

Make the filename argument to NetCDF4DataStore polymorphic so that a Dataset can be passed in.

dopplershift force-pushed the from-dataset branch from 20ee8ca to 0e79adc on August 30, 2017 20:36

shoyer merged commit b190501 into pydata:master Aug 31, 2017

dopplershift deleted the from-dataset branch August 31, 2017 22:24

shoyer mentioned this pull request Sep 5, 2017: Use xarray.open_dataset() for password-protected Opendap files #1068 (closed)

jhamman mentioned this pull request Sep 17, 2017: Allow rasterio_open to be used on in-memory rasterio objects? #1575 (closed)

shoyer mentioned this pull request Nov 29, 2017: Create xarray.Dataset from a netCDF4.Dataset object #1304 (closed)

fmaussion mentioned this pull request Apr 5, 2018: Fix AttributeError whith most recent xarray version fmaussion/salem#97 (merged, 2 tasks)

shoyer mentioned this pull request Jul 23, 2018: Saving to disk an in-memory file Unidata/netcdf4-python#807 (open)
