Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support opening password protected datasets over Opendap (Fixes #1068) #1570

Merged
merged 12 commits into from
Sep 15, 2017

Conversation

mrpgraae
Copy link
Contributor

Changed PydapDataStore to be initialized with a Pydap dataset (returned by pydap.client.open_url). Also added a open classmethod to PydapDataStore, for opening with a url. The open classmethod also takes an optional session object, allowing the api discussed in #1068:

pydap_ds = pydap.client.open_url(url, session=session)
store = xarray.backends.PydapDataStore(pydap_ds)
ds = xarray.open_dataset(store)

or

store = xarray.backends.PydapDataStore.open(url, session=session)
ds = xarray.open_dataset(store)

I tested this with NASA Earthdata as you can see in the added testcase, so i used a session object returned from pydap.cas.urs.setup_session() as that function is tailored for that. However, it just returns a Session object from the requests library, so maybe homemade requests sessions can work with other sites. I haven't been able to find other freely accessible but password-protected data repositories.

I thought about adding the other keyword parameters from pydap.client.open_url to xarray.backends.PydapDataStore.open but I was unsure whether that would be a good idea? Also, if we should do it, would it be okay to just pass **kwargs on to pydap.client.open_url?

Any constructive feedback and/or proposed changes are highly welcome of course. By the way, this is my first time contributing to an open source project like this, so please let me know if there is anything I should have done otherwise!

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thank you!

I thought about adding the other keyword parameters from pydap.client.open_url to xarray.backends.PydapDataStore.open but I was unsure whether that would be a good idea? Also, if we should do it, would it be okay to just pass **kwargs on to pydap.client.open_url?

I'm ambivalent on this one. I don't think there's a strong need given the ability to pass in pydap datasets directly into the constructor, but if there are a few other optional keyword arguments to complete the function signature, then sure, that would be fine. I would definitely avoid using **kwargs, because that precludes the ability for us to add other xarray specific options at some point in the future.

@@ -60,9 +60,14 @@ class PydapDataStore(AbstractDataStore):
This store provides an alternative way to access OpenDAP datasets that may
be useful if the netCDF4 library is not available.
"""
def __init__(self, url):
def __init__(self, ds):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please add a docstring that mentions the type of ds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't seem like you had any preferred docstring format, so i just used the Numpy style. Hope that's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do try to stick with numpy-style :)

def test_password(self):
from pydap.cas.urs import setup_session

url = ('https://disc2.gesdisc.eosdis.nasa.gov/opendap/TRMM_RT/'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a server that we can reasonably expect to remain online and working for the indefinite future? I am somewhat reluctant to hard-code URLs like this, given that they are likely to break in the future.

Integration tests are valuable, but we probably already have enough of an integration test with the other pydap network tests. The right way to do this test is probably to mock out pydap.open_url using mock.patch, to verify that the session argument is passed:
https://docs.python.org/3/library/unittest.mock-examples.html#mocking-classes

So if you're up for it, feel free to give mocking a try. But I actually feel pretty good about correctness for this change even without another explicit test, given that PydapDataStore.open is called internally by xarray and its logic for handling the session parameter is trivial. So I would also be OK even dropping this test entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree completely with regards to the other pydap tests being sufficient for integration testing, and also feel that hard-coding a URL like this is bad practice. However i wanted to show how/that i tested the implementation.

I'm not sure about the server staying online at that address indefinitely. Then there's also the problem of the credentials being publicly available here, so people may use it and change the password.

Good idea mocking the function! I also feel confident about the correctness of this, but i'd like to give mocking a try, since i don't really have any experience with that.

- Changed :py:class:`~xarray.backends.PydapDataStore` to take a Pydap dataset.
This permits opening Opendap datasets that require authentication, by
instantiating a Pydap dataset with a session object. Also added
:py:func:`xarray.backends.PydapDataStore.open` which takes a url and session
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be :py:meth:

@mrpgraae
Copy link
Contributor Author

mrpgraae commented Sep 12, 2017

Thank you for the fast review!

but if there are a few other optional keyword arguments to complete the function signature, then sure, that would be fine. I would definitely avoid using **kwargs, because that precludes the ability for us to add other xarray specific options at some point in the future.

One of the reasons i was unsure about adding the other keyword arguments, is that I'm not entirely sure if changing any of them from their default values will break the interface with xarray. But i guess you could argue that if users are changing these parameters, they probably know what they are doing.

Then there's also the timeout parameter, for which I think we will need to import some module within pydap just to get DEFAULT_TIMEOUT. That seems a bit awkward? We could also just leave that one out.

I definitely see your point with **kwargs, I had a feeling that would be a bad idea.

@shoyer
Copy link
Member

shoyer commented Sep 12, 2017

One of the reasons i was unsure about adding the other keyword arguments, is that I'm not entirely sure if changing any of them from their default values will break the interface with xarray.

Well, this is probably a good reason to leave out any that we feel unsure about.

Then there's also the timeout parameter, for which I think we will need to import some module within pydap just to get DEFAULT_TIMEOUT. That seems a bit awkward?

Yes, that would make me hesitant to use it, especially if it's not clear that DEFAULT_TIMEOUT is a supported API.

@mrpgraae
Copy link
Contributor Author

I fixed the documentation stuff and changed the test to one that uses a mock of pydap.client.open_url, like we talked about.

I decided against adding open_urls other keyword arguments to PydapDataStore.open, for the reasons that we discussed and since users can still instantiate their own Pydap dataset and pass that, if they really want to change those parameters.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me!

One last suggestion -- can you add a brief note about the ability to access password protected datasets in the opendap section of the docs?
http://xarray.pydata.org/en/stable/io.html#opendap

@@ -60,9 +60,14 @@ class PydapDataStore(AbstractDataStore):
This store provides an alternative way to access OpenDAP datasets that may
be useful if the netCDF4 library is not available.
"""
def __init__(self, url):
def __init__(self, ds):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do try to stick with numpy-style :)

@mrpgraae
Copy link
Contributor Author

This looks great to me!

One last suggestion -- can you add a brief note about the ability to access password protected datasets in the opendap section of the docs?
http://xarray.pydata.org/en/stable/io.html#opendap

Thanks! It's cool to be able to contribute to a library i use daily.

I added some documentation, please tell me what you think. For the second example, with the NASA URS, i wanted to use a URL that points to an actual NASA dataset, but i couldn't find one that wasn't extremely long. I ended up using that one, which is for a real NASA server, but a non-existant dataset. Hope that's not too confusing. Obviously, the username and password are fake as well :) so maybe it doesn't matter.

@shoyer shoyer merged commit 3fb5cbb into pydata:master Sep 15, 2017
@shoyer
Copy link
Member

shoyer commented Sep 15, 2017

Thanks @mrpgraae !

@shoyer
Copy link
Member

shoyer commented Sep 15, 2017

This will be in the forthcoming v0.10 release

@mrpgraae
Copy link
Contributor Author

Thank you too! Cool :)

@mrpgraae mrpgraae deleted the opendap-password branch September 15, 2017 16:19
@janjaapmeijer
Copy link

Is there a way to do this, but then for ftp-servers rather than for opendap? It would be awesome to retrieve data from marine.copernicus.eu and subset it with xarray before saving it to disk. Cheers Jan Jaap

@shoyer
Copy link
Member

shoyer commented Oct 11, 2018

I don't think we have any way to access data via FTP in xarray.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants