-
-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix molecule parsing issue for CDMS #2385
Conversation
Hello @keflavich! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-05-06 00:54:31 UTC |
I can't run tests locally because of an astropy bug that I can't correct because I'm on a plane and |
The key changes here are to the start locations of rows. I've added several tests and extensively checked the CDMS database; this fixed-format is universal, it just contained more than I had realized originally. The correction suggested in #2375 (comment) is not correct; the |
Codecov Report
@@ Coverage Diff @@
## main #2385 +/- ##
==========================================
- Coverage 63.31% 63.00% -0.31%
==========================================
Files 133 133
Lines 17271 17269 -2
==========================================
- Hits 10935 10881 -54
- Misses 6336 6388 +52
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
This is ready for review. I squashed the dozens of whitespace commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test need to be fixed, otherwise it looks good.
@@ -303,12 +331,66 @@ def tryfloat(x): | |||
CDMS = CDMSClass() | |||
|
|||
|
|||
def parse_letternumber(st): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add _
as this is not suppose to end up in the public API (and indeed it's not added to all)
|
||
return out | ||
|
||
|
||
def build_lookup(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, what's the reason of this being factored out from the main class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember. 😬
""" | ||
|
||
response = MockResponseSpec('HC7S.data') | ||
tbl = CDMS._parse_result(response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you ever supposed to call _parse_result
directly? If not the test should be refactored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. I see your point. I was copying from an old/previous test, but it would be better to test the user-facing API here, I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, yes. Then the previous test was failing, too, I suppose 😅
assert parse_letternumber("ZZ") == 3535 | ||
|
||
|
||
def test_hc7s(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails when run on it's own, it relies on other tests prior to it run successfully, a side effect of calling the private method directly.
astroquery/linelists/cdms/tests/test_cdms.py F [100%]
======================================================= FAILURES =======================================================
______________________________________________________ test_hc7s _______________________________________________________
def test_hc7s():
"""
Test for a very complicated molecule
CDMS.query_lines_async(100*u.GHz, 100.755608*u.GHz, molecule='HC7S', parse_name_locally=True)
"""
response = MockResponseSpec('HC7S.data')
> tbl = CDMS._parse_result(response)
astroquery/linelists/cdms/tests/test_cdms.py:97:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <astroquery.linelists.cdms.core.CDMSClass object at 0x135f68df0>
response = <astroquery.linelists.cdms.tests.test_cdms.MockResponseSpec object at 0x14d075ee0>, verbose = False
def _parse_result(self, response, verbose=False):
"""
Parse a response into an `~astropy.table.Table`
The catalog data files are composed of fixed-width card images, with
one card image per spectral line. The format of each card image is
similar to the JPL version:
FREQ, ERR, LGINT, DR, ELO, GUP, TAG, QNFMT, QN', QN"
(F13.4,F8.4, F8.4, I2,F10.4, I3, I7, I4, 6I2, 6I2)
but the formats are somewhat different and are encoded below.
The first several entries are the same, but more detail is appended at
the end of the line
FREQ: Frequency of the line in MHz.
ERR: Estimated or experimental error of FREQ in MHz.
LGINT: Base 10 logarithm of the integrated intensity in units of nm^2 MHz at
300 K.
DR: Degrees of freedom in the rotational partition function (0 for atoms,
2 for linear molecules, and 3 for nonlinear molecules).
ELO: Lower state energy in cm^{-1} relative to the ground state.
GUP: Upper state degeneracy.
MOLWT: The first half of the molecular weight tag, which is the mass in atomic
mass units (Daltons).
TAG: Species tag or molecular identifier. This only includes the
last 3 digits of the CDMS tag
A negative value of MOLWT flags that the line frequency has been
measured in the laboratory. We record this boolean in the 'Lab'
column. ERR is the reported experimental error.
QNFMT: Identifies the format of the quantum numbers
Ju/Ku/vu and Jl/Kl/vl are the upper/lower QNs
F: the hyperfine lines
name: molecule name
The full detailed description is here:
https://cdms.astro.uni-koeln.de/classic/predictions/description.html#description
"""
if 'Zero lines were found' in response.text:
raise EmptyResponseError(f"Response was empty; message was '{response.text}'.")
soup = BeautifulSoup(response.text, 'html.parser')
text = soup.find('pre').text
starts = {'FREQ': 0,
'ERR': 14,
'LGINT': 25,
'DR': 36,
'ELO': 38,
'GUP': 48,
'MOLWT': 51,
'TAG': 54,
'QNFMT': 58,
'Ju': 61,
'Ku': 63,
'vu': 65,
'F1u': 67,
'F2u': 69,
'F3u': 71,
'Jl': 73,
'Kl': 75,
'vl': 77,
'F1l': 79,
'F2l': 81,
'F3l': 83,
'name': 89}
result = ascii.read(text, header_start=None, data_start=0,
comment=r'THIS|^\s{12,14}\d{4,6}.*',
names=list(starts.keys()),
col_starts=list(starts.values()),
format='fixed_width', fast_reader=False)
result['FREQ'].unit = u.MHz
result['ERR'].unit = u.MHz
result['Lab'] = result['MOLWT'] < 0
result['MOLWT'] = np.abs(result['MOLWT'])
result['MOLWT'].unit = u.Da
for suf in 'ul':
for qn in ('J', 'v', 'K', 'F1', 'F2', 'F3'):
qnind = qn+suf
if result[qnind].dtype != int:
intcol = np.array(list(map(parse_letternumber, result[qnind])),
dtype=int)
result[qnind] = intcol
# if there is a crash at this step, something went wrong with the query
# and the _last_query_temperature was not set. This shouldn't ever
# happen, but, well, I anticipate it will.
> if self._last_query_temperature == 0:
E AttributeError: 'CDMSClass' object has no attribute '_last_query_temperature'
astroquery/linelists/cdms/core.py:269: AttributeError
=============================================== short test summary info ================================================
FAILED astroquery/linelists/cdms/tests/test_cdms.py::test_hc7s - AttributeError: 'CDMSClass' object has no attribute ...
================================================== 1 failed in 0.93s ===================================================
Does this now fully address #2375? If yes, please link them together making sure the merge will close the issue. |
The failing/weird mocks were inherited from jplspec, which has tests that skip the actual query steps and just test the mock responses. So we should refactor that too. |
bugfix; test is WIP
whitespace cleanup pep8 cleanup pep8 cleanup escaped character fix. The previous commit fixed the outstanding pep8 issue but the fix was not recognized fix import yay tests pass more cleanup of the tags more fixes fix local tests
fix mistake remote tests pass now fix docs code style fixes (whitespace) rst whitespace text (I don't think my code highlighter works right for rst files...) restore an unintentionally-deleted line (but this probably won't fix the failure b/c this "response" is still not defined... we need this example to use the previous as context...)
confusingly.... describe how string matching is done. Also do some minor performance cleanup
Yeah, I suppose we missed a few things here and there during reviews 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extremely minor nitpicks that I'll fix in a commit along with the whitespace, so we can merge this.
|
||
|
||
@pytest.mark.remote_data | ||
def test_2375(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use names (and PR titles) that are descriptive. No one will remember what issue 2375 was when this test starts failing :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_complicated_molecule_parsing, then? (but link #2375 in the docstring?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap, I went with test_molecule_with_parens
, and agree, keeping the reference to the issue number is super useful
CHANGES.rst
Outdated
@@ -24,6 +24,11 @@ jplsbdb | |||
- Fix a bug for jplsdbd query when the returned physical quantity contains | |||
a unit with exponential. [#2377] | |||
|
|||
linelists/cdms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we are checking for it, but the syntax here follows the namespace rather than the directory structure (separated with .
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I didn't realize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is something we blindly inherited from core astropy, but I don't think we actually check for it anywhere. So at this point, it's just "historical reasons" 🤷♀️
Codecov is confused and messed up due to the totally unrelated nrao disabling, so I just go ahead and merge this now. |
WIP: this fixes only half the problem. Tests are needed.
Problem is described in #2375: there are molecules with
()
's in the names that break the regex. The current fix escapes those automatically.There's a second problem: the fixed-width parser fails for some subset of molecules. Todo: