Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of NDArrayType during block to fits hdu matching #1234

Merged

Conversation

braingram
Copy link
Contributor

Fixes #1232

When an asdf tree containing arrays is read from a fits file, the arrays are created as NDArrayType objects (instead of ndarray). If this tree was resaved to a fits file the NDArrayType was mishandled during matching arrays to hdu items which resulted in duplicate storage of array data in hdu items and internal blocks.

This PR adds a few tests for this issue and fixes the handling of NDArrayType to allow for accurate matching between hdu items and arrays to avoid data duplication.

Copy link
Contributor

@perrygreenfield perrygreenfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -463,6 +463,29 @@ def test_array_view_compatible_dtype(tmp_path):
af.write_to(file_path)


def test_hdu_link_independence(tmp_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a nice description/documentation string like you did for the test_resave_breaks_hdulist_tree_array_link test? This isn't necessary, but the one in the other test was very helpful for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! I added a docstring to the test. Let me know if it looks like what you had in mind.

https://github.com/braingram/asdf/blob/bug/hdu_array_link_breaking/asdf/tests/test_fits_embed.py#L467-L476

@braingram braingram marked this pull request as ready for review November 21, 2022 20:24
NDArrayType fails checks to isinstance(arr, ndarray) as it is not a
subclass of ndarray. This prevented util.get_array_base from finding
the correct base array which contributed to failures to link hdu
items to arrays within the tree (leading to duplication of data).

The EmbeddedBlockManager in fits_embed also skipped checking for
links between NDArrayType and hdu items which also broke the link
between hdu items and arrays within the tree. This PR removes the
skip.

fixes issue asdf-format#1232
@braingram braingram force-pushed the bug/hdu_array_link_breaking branch from 8e01f66 to b812f42 Compare November 22, 2022 14:26
@WilliamJamieson WilliamJamieson merged commit 0a6bfcb into asdf-format:master Nov 22, 2022
@braingram braingram deleted the bug/hdu_array_link_breaking branch November 22, 2022 19:40
@hbushouse
Copy link

Is this fix available yet in an ASDF release or only on github master?

@perrygreenfield
Copy link
Contributor

@hbushouse I'd note that this fixes only one of two problems that arose. The particular case that triggered the failure has many useless cutouts due to confused source leading to large cutouts. This fix prevents those cases from crashing the pipeline, but the results are still mostly useless, and for those cases where the segmentation is reasonable, the duplication of the data, while wasting space, doesn't cause any pipeline failures. For the pipeline to generate more meaningful results means fixing the segmentation issue (even if only rejecting any that require y sizes more than some threshold).

@braingram
Copy link
Contributor Author

@hbushouse The fix for the data duplication was released in 2.14 (here are the release notes), we did a patch release 2.14.1 (the current newest released version) to deal with astropy CI issues.

For jwst this PR bumped the asdf version to 2.14.1 and included a test for the data duplication: spacetelescope/jwst#7358

@hbushouse
Copy link

For jwst this PR bumped the asdf version to 2.14.1 and included a test for the data duplication: spacetelescope/jwst#7358

Excellent! It was the update to required jwst dependencies that I was most concerned about.

@braingram
Copy link
Contributor Author

Let me know if I can help :)
There was also a similar PR to stdatamodels (adding a test and bumping the asdf version): spacetelescope/stdatamodels#105 However, I do not believe a new version of that was released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unnecessary duplicate arrays when using asdf embedded in FITS files
4 participants