-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support opening password protected datasets over Opendap (Fixes #1068) #1570
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thank you!
I thought about adding the other keyword parameters from pydap.client.open_url to xarray.backends.PydapDataStore.open but I was unsure whether that would be a good idea? Also, if we should do it, would it be okay to just pass
**kwargs
on to pydap.client.open_url?
I'm ambivalent on this one. I don't think there's a strong need given the ability to pass in pydap datasets directly into the constructor, but if there are a few other optional keyword arguments to complete the function signature, then sure, that would be fine. I would definitely avoid using **kwargs
, because that precludes the ability for us to add other xarray specific options at some point in the future.
@@ -60,9 +60,14 @@ class PydapDataStore(AbstractDataStore): | |||
This store provides an alternative way to access OpenDAP datasets that may | |||
be useful if the netCDF4 library is not available. | |||
""" | |||
def __init__(self, url): | |||
def __init__(self, ds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please add a docstring that mentions the type of ds
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't seem like you had any preferred docstring format, so i just used the Numpy style. Hope that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do try to stick with numpy-style :)
xarray/tests/test_backends.py
Outdated
def test_password(self): | ||
from pydap.cas.urs import setup_session | ||
|
||
url = ('https://disc2.gesdisc.eosdis.nasa.gov/opendap/TRMM_RT/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a server that we can reasonably expect to remain online and working for the indefinite future? I am somewhat reluctant to hard-code URLs like this, given that they are likely to break in the future.
Integration tests are valuable, but we probably already have enough of an integration test with the other pydap network tests. The right way to do this test is probably to mock out pydap.open_url
using mock.patch
, to verify that the session
argument is passed:
https://docs.python.org/3/library/unittest.mock-examples.html#mocking-classes
So if you're up for it, feel free to give mocking a try. But I actually feel pretty good about correctness for this change even without another explicit test, given that PydapDataStore.open
is called internally by xarray and its logic for handling the session
parameter is trivial. So I would also be OK even dropping this test entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree completely with regards to the other pydap tests being sufficient for integration testing, and also feel that hard-coding a URL like this is bad practice. However i wanted to show how/that i tested the implementation.
I'm not sure about the server staying online at that address indefinitely. Then there's also the problem of the credentials being publicly available here, so people may use it and change the password.
Good idea mocking the function! I also feel confident about the correctness of this, but i'd like to give mocking a try, since i don't really have any experience with that.
doc/whats-new.rst
Outdated
- Changed :py:class:`~xarray.backends.PydapDataStore` to take a Pydap dataset. | ||
This permits opening Opendap datasets that require authentication, by | ||
instantiating a Pydap dataset with a session object. Also added | ||
:py:func:`xarray.backends.PydapDataStore.open` which takes a url and session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be :py:meth:
Thank you for the fast review!
One of the reasons i was unsure about adding the other keyword arguments, is that I'm not entirely sure if changing any of them from their default values will break the interface with xarray. But i guess you could argue that if users are changing these parameters, they probably know what they are doing. Then there's also the I definitely see your point with **kwargs, I had a feeling that would be a bad idea. |
Well, this is probably a good reason to leave out any that we feel unsure about.
Yes, that would make me hesitant to use it, especially if it's not clear that |
I fixed the documentation stuff and changed the test to one that uses a mock of I decided against adding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me!
One last suggestion -- can you add a brief note about the ability to access password protected datasets in the opendap section of the docs?
http://xarray.pydata.org/en/stable/io.html#opendap
@@ -60,9 +60,14 @@ class PydapDataStore(AbstractDataStore): | |||
This store provides an alternative way to access OpenDAP datasets that may | |||
be useful if the netCDF4 library is not available. | |||
""" | |||
def __init__(self, url): | |||
def __init__(self, ds): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do try to stick with numpy-style :)
Thanks! It's cool to be able to contribute to a library i use daily. I added some documentation, please tell me what you think. For the second example, with the NASA URS, i wanted to use a URL that points to an actual NASA dataset, but i couldn't find one that wasn't extremely long. I ended up using that one, which is for a real NASA server, but a non-existant dataset. Hope that's not too confusing. Obviously, the username and password are fake as well :) so maybe it doesn't matter. |
Thanks @mrpgraae ! |
This will be in the forthcoming v0.10 release |
Thank you too! Cool :) |
Is there a way to do this, but then for ftp-servers rather than for opendap? It would be awesome to retrieve data from marine.copernicus.eu and subset it with xarray before saving it to disk. Cheers Jan Jaap |
I don't think we have any way to access data via FTP in xarray. |
git diff upstream/master | flake8 --diff
whats-new.rst
for all changes andapi.rst
for new APIChanged
PydapDataStore
to be initialized with a Pydap dataset (returned bypydap.client.open_url
). Also added aopen
classmethod to PydapDataStore, for opening with a url. Theopen
classmethod also takes an optional session object, allowing the api discussed in #1068:or
I tested this with NASA Earthdata as you can see in the added testcase, so i used a session object returned from pydap.cas.urs.setup_session() as that function is tailored for that. However, it just returns a
Session
object from the requests library, so maybe homemade requests sessions can work with other sites. I haven't been able to find other freely accessible but password-protected data repositories.I thought about adding the other keyword parameters from
pydap.client.open_url
toxarray.backends.PydapDataStore.open
but I was unsure whether that would be a good idea? Also, if we should do it, would it be okay to just pass**kwargs
on topydap.client.open_url
?Any constructive feedback and/or proposed changes are highly welcome of course. By the way, this is my first time contributing to an open source project like this, so please let me know if there is anything I should have done otherwise!