-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
link FITS_rec instances to hdu extensions on save #178
link FITS_rec instances to hdu extensions on save #178
Conversation
9d30b03
to
847aba1
Compare
847aba1
to
469aebe
Compare
I'm seeing a few failures in the regression tests:
@hbushouse @tapastro are these expected failures? An example error is:
|
469aebe
to
1edc19b
Compare
The regtest errors are expected only in the sense that they've been happening for a while now, but we can't figure out why. But regardless, they have nothing to do with this PR, so you're good. |
I think this should address JP-3277 but it probably makes sense to test this PR on one of the failures mentioned on the ticket. However, I don't know how to do that :) I'm happy to learn (if that makes sense) but would need to know how to find the data and run the failing step. |
I can give it a try. I'll install this PR version of stdatamodels and test it out. But if you want to try, feel free to install jwst in your environment, and you may be able to follow the repo Wiki for regression testing locally and use the tso regression test data to verify? |
1edc19b
to
fd89bbb
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #178 +/- ##
==========================================
+ Coverage 63.90% 63.95% +0.04%
==========================================
Files 101 101
Lines 5555 5565 +10
==========================================
+ Hits 3550 3559 +9
- Misses 2005 2006 +1
☔ View full report in Codecov by Sentry. |
I ran the Looking at the data during test with and without the changes in this PR... Without this PR, jw01091002001_03101_00001-seg001_nis_short_x1dints.fits, contains duplicated data (30 EXTRACT1D and 1 INT_TIMES extensions that are not referenced from the ASDF extension which contains 31 embedded binary blocks). With this PR the same files contains the same EXTRACT1D and INT_TIMES extensions however these are linked in the ASDF extension (which contains no binary blocks). |
DataModel converts recarrays to FITS_rec on assignment these were not correctly linked to FITS extensions on save resulting in duplication of FITS_rec data on save. This adds a test for the duplication and some book keeping to allow linking the converted FITS_recs and extensions on save. Astropy creates a 'view' of the FITS_rec when assigned to a new FITS extension which makes a simple 'is' check complicated (so the strategy used for ndarray and NDArrayType could not be used). Instead, the links between FITS_rec and extensions are tracked (via the FitsContext) which allows creation of the links without having re-find the hdu created from the FITS_rec. (The existing strategy is left in place for ndarray to maintain compatibility with existing ASDF-in-FITS behavior which does not require the same schema structure as the DataModels).
fd89bbb
to
eb516d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@tapastro Have you had a chance to do independent testing of this fix? |
I ran a single spec2 association which used to produce a 440MB x1dints file, where the asdf extension was ~223 MB. This code produced a 250MB file with a 32MB asdf extension. Seems like it did a good job! I don't understand how to check if any of the arrays are linked or reproduced in the extension, though. |
Thanks for checking! That definitely looks promising. There's no 'official' way to check the links. What I've been doing is similar to what is in the test added with this PR: stdatamodels/tests/test_fits.py Lines 667 to 677 in fbd97cc
In brief, when stdatamodels links an item in the ASDF tree to a fits extension it changes the data 'source' stored in the ASDF tree to a string like So a normal array will appear something like this in the ASDF yaml. lookup_table: !core/ndarray-1.0.0
source: 0
datatype: float32
byteorder: big
shape: [2048, 2048] Whereas a linked array will appear something like: dq: !core/ndarray-1.0.0
source: fits:DQ,1
datatype: uint32
byteorder: big
shape: [10, 256, 2048] (note that the above 2 examples were both taken from the same '*_nis_short_calints.fits' file so a file can have both linked and non-linked arrays/tables, in this case the non-linked arrays are related to the wcs). Let me know if more details would be helpful. |
Ah, in that case it's not quite going right - I see 6618 unlinked arrays, and doing a findall on "source:" shows there are 9928 total arrays in the file. I can move this to somewhere public for you to take a look, if you want. |
DataModel converts recarrays to FITS_rec on assignment these were not correctly linked to FITS extensions on save resulting in duplication of FITS_rec data on save.
This adds a test for the duplication and some book keeping to allow linking the converted FITS_recs and extensions on save.
Astropy creates a 'view' of the FITS_rec when assigned to a new FITS extension which makes a simple 'is' check complicated (so the strategy used for ndarray and NDArrayType could not be used). Instead, the links between FITS_rec and extensions are tracked (via the FitsContext) which allows creation of the links without having re-find the hdu created from the FITS_rec. (The existing strategy is left in place for ndarray to maintain compatibility with existing ASDF-in-FITS behavior which does not require the same schema structure as the DataModels).
Resolves JP-3277
Regression tests running: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/776/pipeline/202/
Docs failures are unrelated (see: #180)
Checklist
CHANGES.rst
(either inBug Fixes
orChanges to API
)