-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling of NDArrayType during block to fits hdu matching #1234
Fix handling of NDArrayType during block to fits hdu matching #1234
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -463,6 +463,29 @@ def test_array_view_compatible_dtype(tmp_path): | |||
af.write_to(file_path) | |||
|
|||
|
|||
def test_hdu_link_independence(tmp_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a nice description/documentation string like you did for the test_resave_breaks_hdulist_tree_array_link
test? This isn't necessary, but the one in the other test was very helpful for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion! I added a docstring to the test. Let me know if it looks like what you had in mind.
NDArrayType fails checks to isinstance(arr, ndarray) as it is not a subclass of ndarray. This prevented util.get_array_base from finding the correct base array which contributed to failures to link hdu items to arrays within the tree (leading to duplication of data). The EmbeddedBlockManager in fits_embed also skipped checking for links between NDArrayType and hdu items which also broke the link between hdu items and arrays within the tree. This PR removes the skip. fixes issue asdf-format#1232
8e01f66
to
b812f42
Compare
Is this fix available yet in an ASDF release or only on github master? |
@hbushouse I'd note that this fixes only one of two problems that arose. The particular case that triggered the failure has many useless cutouts due to confused source leading to large cutouts. This fix prevents those cases from crashing the pipeline, but the results are still mostly useless, and for those cases where the segmentation is reasonable, the duplication of the data, while wasting space, doesn't cause any pipeline failures. For the pipeline to generate more meaningful results means fixing the segmentation issue (even if only rejecting any that require y sizes more than some threshold). |
@hbushouse The fix for the data duplication was released in 2.14 (here are the release notes), we did a patch release 2.14.1 (the current newest released version) to deal with astropy CI issues. For jwst this PR bumped the asdf version to 2.14.1 and included a test for the data duplication: spacetelescope/jwst#7358 |
Excellent! It was the update to required jwst dependencies that I was most concerned about. |
Let me know if I can help :) |
Fixes #1232
When an asdf tree containing arrays is read from a fits file, the arrays are created as NDArrayType objects (instead of ndarray). If this tree was resaved to a fits file the NDArrayType was mishandled during matching arrays to hdu items which resulted in duplicate storage of array data in hdu items and internal blocks.
This PR adds a few tests for this issue and fixes the handling of NDArrayType to allow for accurate matching between hdu items and arrays to avoid data duplication.