-
-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hsa support #2122
Hsa support #2122
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2122 +/- ##
==========================================
+ Coverage 62.98% 63.17% +0.18%
==========================================
Files 131 132 +1
Lines 17059 17187 +128
==========================================
+ Hits 10745 10858 +113
- Misses 6314 6329 +15
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code mostly looks fine, with a few small corrections here and there.
However, the documentation is targeted too much at a technical audience - not at astronomers. There need to be examples of how to find observation IDs based on a position on the sky or some other astronomical criteria. Can that be achieved through this interface?
Is there any way, through the HSA, to access large program data, like the HiGal data?
astroquery/hsa/core.py
Outdated
verbose=False, | ||
cache=True, **kwargs): | ||
""" | ||
Download observation from Herschel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you say more here - what does it mean to 'download observations'? Are we downloading FITS images? Raw timestream data? Level X processed files? All of the above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if this function was going to be useful or just confusing.
You can achieve the same result using the download_data
function with the retrieval_type='OBSERVATION'
(which is the default value). Maybe we could delete this function completely.
Answering the question, it retrieves all the products associated with an observation (mostly FITS files).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be useful, but it's necessary to explain what "all the products associated with an observation" is likely to mean. Up to you, though.
Thank you @keflavich for your review. The way to get observation IDs with this interface is through the TAP ( I can provide an example of the entire process but I don't know where to put it. Should I add another section in the As far as, I know you cannot access HiGAL data from the HSA. I don't have much experience with that so I might be mistaken. |
yes! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took me a while to get to the review.
This looks pretty good internally - I think the code is fine.
The user-facing documentation could use some improvement, though, and we would like an astroquery-compatible API - so, for the "query a region on the sky" TAP query, we want a query_region
function.
astroquery/hsa/core.py
Outdated
-The auxiliary directory: contains all Herschel non-science spacecraft data | ||
-The calibarion directory: contains the uplink and downlink calibration products | ||
-<obs_id> directory: contains the science data distributed in sub-directories called level0/0.5/1/2/2.5/3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check how this renders in the docs build; I think you might need different formatting for this to look nice. But I don't remember my .rst well enough to be sure - it might be OK too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
docs/hsa/hsa.rst
Outdated
Herschel Science Archive (`astroquery.hsa`) | ||
******************************************* | ||
|
||
Herschel was the fourth cornerstone in ESA's Horizon 2000 science programme, designed to observe the 'cool' universe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a link to the Herschel mission page here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
docs/hsa/hsa.rst
Outdated
------------------------------- | ||
2. Getting Observation Product | ||
------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
------------------------------- | |
2. Getting Observation Product | |
------------------------------- | |
------------------------------- | |
2. Getting Observation Products | |
------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
docs/hsa/hsa.rst
Outdated
This will download the product of the observation '1342195355' with the instrument 'PACS' and | ||
it will store them in a tar called '1342195355.tar'. The parameters available are detailed in the API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be helpful to add a few words about what a user can expect to find in an observation tarball, or at least, a link to where one can find that information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem, I've added a link.
docs/hsa/hsa.rst
Outdated
This will download the product of the observation '1342195355' with the instrument 'PACS' and | ||
it will store them in a tar called '1342195355.tar'. The parameters available are detailed in the API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs more explanation of how "Observation Products" are different from "data". Is it just the interface that's different, or are the products different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few words explaining this.
docs/hsa/hsa.rst
Outdated
'http://archives.esac.esa.int/hsa/whsa/' | ||
|
||
------------------------------- | ||
3. Getting Herschel Postcard |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section also needs description - what is a postcard? Is it a "cutout" or "quicklook"? What does that mean? Is it usable for science? Again, this could be answered with a link to the right part of the archive, but it would be helpful to have descriptive text here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a brief description.
docs/hsa/hsa.rst
Outdated
------------------------------------------ | ||
|
||
This function provides access to the Herschel Science Archive database using the Table Access Protocol (TAP) and via the Astronomical Data | ||
Query Language (ADQL). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we link to the description of ADQL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
docs/hsa/hsa.rst
Outdated
|
||
>>> from astroquery.hsa import HSA | ||
>>> | ||
>>> HSA.query_hsa_tap("select top 10 observation_id from hsa.v_active_observation where contains(point('ICRS', hsa.v_active_observation.ra, hsa.v_active_observation.dec),circle('ICRS', 100.2417,9.895, 1.1))=1", output_format='csv', output_file='results.cvs') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a wrapper function implemented to do this, e.g.:
def query_observations(coordinate, radius, n_obs=10):
...
to match the general astroquery user API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On it 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a mixture of comments, some are of things that need to be fixed, others are more advisory or nitpicky.
CHANGES.rst
Outdated
hsa | ||
^^^ | ||
|
||
- New module to access ESA Herschel mission. [#] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs the PR number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
astroquery/hsa/__init__.py
Outdated
@@ -0,0 +1,29 @@ | |||
# Licensed under a 3-clause BSD style license - see LICENSE.rst | |||
""" | |||
HSA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this module under the esa
namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
astroquery/hsa/core.py
Outdated
filename=None, | ||
verbose=False, | ||
cache=True, | ||
**kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please list in the signature all the kwargs you added in the docstring. I also wonder what else could be hidden under **kwargs
? If it's not a huge list I think it's better to list them all.
Also, minor preference, I think it's nicer not to list everything in a new line, but I'm ok with if it stays as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the parameters in the docstring to the signature. I wouldn't add the rest of them because IMO there are a lot see
astroquery/hsa/core.py
Outdated
filename = os.path.splitext(filename)[0] | ||
|
||
if retrieval_type is None: | ||
retrieval_type = "OBSERVATION" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to define the defaults in the signature, I don't see any practical use here for setting it to None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I've changed it.
astroquery/hsa/core.py
Outdated
link += "".join("&{0}={1}".format(key, val) | ||
for key, val in kwargs.items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can be in one line, it's easier to read, and having ~100char long lines is totally fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
from .test_hsa import TestHSA | ||
|
||
|
||
class TestHSARemote(TestHSA): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to subclass here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not needed anymore since we're not using the dummyTapHandler
anymore.
'observation_id': obs_id, | ||
'instrument_name': "PACS"} | ||
expected_res = obs_id + ".tar" | ||
hsa = HSAClass(self.get_dummy_tap_handler()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't exactly understand why you need the dummy here and below, these remote test should ideally test using direct connection to the archive, just as a user would do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I think I copy-pasted from the other tests and didn't change it. It's fixed now.
res = hsa.download_data(**parameters) | ||
assert res == expected_res | ||
assert os.path.isfile(res) | ||
os.remove(res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a temp_dir type pytest fixture, as these files are not cleaned up in case there is a test failure. There are examples for this approach is a few other modules, e.g in esasky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using tempfile.TemporaryDirectory()
now and I've added a download_dir
parameter to all the functions.
astroquery/hsa/core.py
Outdated
else: | ||
self._tap = tap_handler | ||
|
||
def download_data(self, *, retrieval_type=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
users may want to be able to set the download directory (e.g download_dir
in esasky), but still let the filenames figured out automatically. Would be nice if this method and a few below would support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added download_dir
@@ -0,0 +1,117 @@ | |||
# Licensed under a 3-clause BSD style license - see LICENSE.rst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These remote tests are very slow (I have let them run for 30mins, 2 failed one success, and cancelled the remaining), I'm not exactly sure about the reasons. If possible please use obs_id
s that belongs to smaller files, etc.
While these tests are not running in CI for pull requests, we run them weekly from CI cron, and also run them manually to check for regression, it's really important that they would run in a reasonably quick timescale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the observations that were being downloaded to smaller ones and I've also changed the query so it only downloads one product_level
. Now the tests take ~10 seconds.
Oh, and also, please rebase both to resolve the minor conflicts with the changelog and the docs index page, and also to pick up the latest CI matrix. Thanks! |
Co-authored-by: Adam Ginsburg <[email protected]>
Co-authored-by: Adam Ginsburg <[email protected]>
Co-authored-by: Adam Ginsburg <[email protected]>
@lvalerom - I've pushed commits to fix the docs build as well as the remote data doctests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only comment that would be great to be addressed in this PR is about the skipping of remote tests when the retries fail. I like that retry is built in, but I feel it would be better to not skip those tests, but I can be convinced otherwise if you would comment on the assumptions that went into the current setup.
The rest of my comments are nitpicks, that don't really need to be cleanup now, or point beyond this PR.
|
||
return filename | ||
|
||
def query_hsa_tap(self, query, *, output_file=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This points outside the scope of this PR, but I this it would make sense API wise it would be better to rename all these query_xyz_tap
methods to query_tap
. What do you guys, @lvalerom @jespinosaar, think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense. I would change it in another PR along with the other missions though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should definitely be done in a separate PR, as the other modules need to be renamed using proper deprecations.
if method == self._invokedMethod: | ||
return | ||
else: | ||
raise ValueError("Method '{}' is not invoked. (Invoked method \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: no linebreaks are needed in these types of strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong but I think these ValueError
were not properly formatted.
ValueError("Method '{}' is not invoked. (Invoked method \
is '{}'.)").format(method, self._invokedMethod)
# Should be
ValueError("Method '{}' is not invoked. (Invoked method "
"is '{}'.)".format(method, self._invokedMethod))
# Notice the parenthesis
I've noticed that this is happening in other missions. I can open a PR to fix this if you would like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a PR would be great. Also, these can be now changed to use f-strings, that way there are fewer parens, and a bit more easily legible message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed the formatting in this module. I'll open a different PR to address the other modules.
astroquery/esa/hsa/core.py
Outdated
from pathlib import Path | ||
|
||
from astropy import units as u | ||
from ...utils import commons |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for relative import here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
astroquery/esa/hsa/core.py
Outdated
timeout = conf.TIMEOUT | ||
|
||
def __init__(self, tap_handler=None): | ||
super(HSAClass, self).__init__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: we only support recent enough pythons, so this can be simplified to super().__init__()
. A ton of the modules hasn't been cleaned up, so no need to do it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
@@ -0,0 +1,14 @@ | |||
# Licensed under a 3-clause BSD style license - see LICENSE.rst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no data files for this module, so no need for this file either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
hsa = HSAClass() | ||
res = self.access_archive_with_retries(hsa.download_data, parameters) | ||
if res is None: | ||
pytest.skip("Archive broke the connection {} times, unable to test".format(self.retries)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, and everywhere below: I wonder whether it would be more useful to let this case break the test? Otherwise it would be difficult to pick up if there is a genuine issue (or would you always expect a non None return?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I was testing these last time I noticed that the connection with the archive was very unstable and, at least for large files, a lot times the connection was broken by the archive. The reason why I decided to skip the tests if the archive is unresponsive is to avoid flakiness in the tests. We always expect a non None
return from these functions (the filename
) so it should be ok to check for a None
value to determine if the connection was broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any smaller files maybe to work with?
Also, if this stays as is, maybe xfail
is more appropriate than skip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I agree,
xfail
suits better here. I've changed it. - Since the changes to make these tests faster I haven't noticed any connection problems. I would still keep the tests with
xfail
just in case (I don't think it hurts to have it 🙂)
I was going through @keflavich comments to see whether all got addressed and the only remaining one is about the lack of On the other hand, I see that Other than getting this clarified, the PR is ready for merge. Test coverage is up at 85% when the remote tests are run, so ignore the codecov complaint above. |
OK, reviewing my comments, I did indeed suggest I think I suggeset we add
it should do the default TAP thing of:
i.e., |
suggestion added in lvalerom#1 |
OK. So, In that case I would suggest we go ahead merge this and you open a PR against |
Thanks @lvalerom! I'll open issues as reminders for the ideas during the review that were pointing beyond this PR. |
Added new astroquery submodule called:
astroquery.esa.hsa
This module allows access to the ESA Herschel mission.