Safely convert fitsrec in tree before serializing with asdf #205

braingram · 2023-09-05T16:04:25Z

Prior to asdf 3.0, FITS_rec (a subclass of ndarray) was serialized like an ndarray. In asdf 3.0, this will result in a warning:

asdf.exceptions.AsdfConversionWarning: A ndarray subclass (<class 'astropy.io.fits.fitsrec.FITS_rec'>) was converted as a ndarray. This behavior will be removed from a future version of ASDF. See https://asdf.readthedocs.io/en/latest/asdf/config.html#convert-unknown-ndarray-subclasses

This is done out of an abundance of caution, because oddities can occur when a FITS_rec is treated like an ndarray. See below for an example.

There are many places where stdatamodels passes a FITS_rec to asdf in a way that is unaware of these oddities. See this run log for a full list of the warnings that will appear when asdf 3.0 is released, or when testing stdatamodels against the current asdf main: https://github.com/asdf-format/asdf/actions/runs/6088051343/job/16588221134

This PR converts all FITS_rec instances to ndarray instances prior to asdf serialization/validation. The conversion is done using a new method _fits_rec_to_array which corrects columns containing unsigned integers.
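The review diff shows only the first line of _fits_rec_to_array, so the following is a minimal sketch of one way such a conversion could work, not the code merged in this PR. The helper name `fits_rec_to_plain_array` is hypothetical, and the sketch assumes every column is scalar (one value per row). It builds the output dtype from the dtype each column reports on item access, which for a FITS_rec is expected to be the converted (for example unsigned) dtype rather than the raw stored one, and then copies column by column. It is exercised here with a plain structured array so that it runs with numpy alone.

```python
import numpy as np


def fits_rec_to_plain_array(rec):
    """Copy a record array into a plain ndarray, column by column.

    Hypothetical helper for illustration only (not the PR's implementation).
    Assumes scalar columns.
    """
    names = rec.dtype.names
    # Build the output dtype from the per-column dtypes seen through item
    # access, instead of reusing the dtype of the underlying storage.
    out_dtype = np.dtype([(name, rec[name].dtype) for name in names])
    out = np.empty(rec.shape, dtype=out_dtype)
    for name in names:
        # Copy values, so the result shares no memory with the input.
        out[name] = rec[name]
    return out


# Stand-in input: a plain structured array with one unsigned column.
table = np.zeros(6, dtype=[("a", "uint32"), ("b", "float64")])
table["a"] = np.arange(6)

plain = fits_rec_to_plain_array(table)
print(type(plain), plain.dtype)
```

Copying per column trades some memory for safety: the result never aliases the buffer of the original object, so any offset handling done by the source object on access is baked into the copied values.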

Example of where treating `FITS_rec` like a `ndarray` can be problematic

from astropy.io import fits
import numpy as np

arr = np.zeros([6], dtype=[('a', 'uint32')])
arr['a'] = np.arange(6)

h = fits.HDUList()
h.append(fits.PrimaryHDU())
h.append(fits.BinTableHDU())
h[-1].data = arr
h.writeto('foo.fits', overwrite=True)

with fits.open('foo.fits') as ff:
    print(ff[-1].data)
    print(ff[-1].data.view(np.ndarray))

This produces invalid data in the view created from the FITS_rec:

[(          0,) (          1,) (          2,) (          3,)
 (          4,) (          5,)]
[(-2147483648,) (-2147483647,) (-2147483646,) (-2147483645,)
 (-2147483644,) (-2147483643,)]

Checklist

added entry in CHANGES.rst (either in Bug Fixes or Changes to API)
updated relevant tests
updated relevant documentation
updated relevant milestone(s)
added relevant label(s)

braingram · 2023-09-05T18:28:05Z

Regression tests passed with no errors: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/901/pipeline/202

codecov · 2023-09-05T18:32:41Z

Codecov Report

Patch coverage is 100.00% of modified lines.

Files Changed	Coverage
src/stdatamodels/fits_support.py	`100.00%`
src/stdatamodels/model_base.py	`100.00%`
src/stdatamodels/util.py	`100.00%`
src/stdatamodels/validate.py	`100.00%`

braingram · 2023-09-05T18:35:59Z

src/stdatamodels/fits_support.py

@@ -775,7 +775,8 @@ def callback(node, json_id):
            data = hdulist[pair].data
            return data
        return node
-    af.tree = treeutil.walk_and_modify(af.tree, callback)
+    # don't assign to af.tree to avoid an extra validation
+    af._tree = treeutil.walk_and_modify(af.tree, callback)


Changing this assignment has 2 benefits.

1. (Required for this PR.) The modified tree returned from walk_and_modify contains FITS_rec instances: this function, _map_hdulist_to_arrays, replaces the arrays referenced in the tree with the data from the HDUs, which might come back as FITS_rec instances. If af.tree were assigned, asdf would attempt to validate the assigned tree, which would result in warnings (in asdf 3.0) if the tree contained FITS_rec instances.

2. The assignment to af.tree results in validation of the tree. This is unnecessary because this function (_map_hdulist_to_arrays) is called right after a call to asdf.open (see from_fits_asdf in the same file), which by default validates the tree on read. Assigning to af._tree avoids this duplicate validation.
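The difference between the two assignments can be illustrated with a toy class. `ToyFile` below is a hypothetical stand-in, not asdf's AsdfFile; it only mimics the pattern described above, where a public `tree` property validates on assignment while the underlying `_tree` attribute does not.

```python
class ToyFile:
    """Hypothetical stand-in for a file object whose `tree` setter validates."""

    def __init__(self):
        self._tree = {}
        self.validation_count = 0

    def _validate(self, tree):
        # Count calls so the effect of each assignment style is visible.
        self.validation_count += 1
        if not isinstance(tree, dict):
            raise TypeError("tree must be a dict")

    @property
    def tree(self):
        return self._tree

    @tree.setter
    def tree(self, value):
        self._validate(value)
        self._tree = value


f = ToyFile()
f.tree = {"data": [1, 2, 3]}   # public setter: runs validation
f._tree = {"data": [4, 5, 6]}  # private attribute: skips validation
print(f.validation_count)
```

Writing to the private attribute relies on an implementation detail of the owning class, so it is only reasonable when, as in this PR, the tree is known to have been validated already or will be validated later.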

braingram · 2023-09-05T18:36:54Z

src/stdatamodels/model_base.py

+        # don't open_asdf(tree) as this will cause a second validation of the tree
+        # instead open an empty tree, then assign to the hidden '_tree'
+        asdffile = self.open_asdf(None, **kwargs)
+        asdffile._tree = tree


The assignment to _tree instead of tree is for the same reasons as described in the code comment and in https://github.com/spacetelescope/stdatamodels/pull/205/files#r1316260255

eslavich

Just something I'm curious about, PR looks good

eslavich · 2023-09-10T18:44:19Z

src/stdatamodels/util.py

+
+
+def _fits_rec_to_array(fits_rec):
+    bad_columns = [n for n in fits_rec.dtype.fields if np.issubdtype(fits_rec[n].dtype, np.unsignedinteger)]


In rebuild_fits_rec_dtype there is a somewhat different definition of a bad column:

table_dtype = dtype[field_name]
field_dtype = fits_rec.field(field_name).dtype
if np.issubdtype(table_dtype, np.signedinteger) and np.issubdtype(field_dtype, np.unsignedinteger):

Is that first check unnecessary, since unsigned integers in FITS are always "pseudo unsigned"?

I think so. To put it another way, the lack of unsigned integer support in FITS means that all unsigned integers in the fitsrec (so field_dtype is unsigned) will be stored as a signed integer with an offset (so table_dtype will be signed).

I'm happy to remove the first part of the check in rebuild_fits_rec_dtype if it would be preferable to have that change in this PR. Alternatively I can open a follow-up PR (or just a tracking issue if the optimization is of lower priority).

I opened an issue to track optimizing the comparison.

#206

That way this PR can rely on the previously run regtests and a follow-up PR (after the next jwst release) can look into optimizing the comparison.

hbushouse · 2023-09-11T18:49:16Z

CI failure is unrelated.

braingram changed the title ~~strip fitsrec from tree before serializing with asdf~~ TEST: strip fitsrec from tree before serializing with asdf Sep 5, 2023

braingram changed the title ~~TEST: strip fitsrec from tree before serializing with asdf~~ TEST: safely convert fitsrec in tree before serializing with asdf Sep 5, 2023

braingram commented Sep 5, 2023

braingram changed the title ~~TEST: safely convert fitsrec in tree before serializing with asdf~~ Safely convert fitsrec in tree before serializing with asdf Sep 5, 2023

braingram marked this pull request as ready for review September 5, 2023 18:38

braingram requested a review from a team as a code owner September 5, 2023 18:38

braingram mentioned this pull request Sep 5, 2023

Move ndarray conversion to a Converter asdf-format/asdf#1537

Merged

eslavich reviewed Sep 10, 2023

braingram force-pushed the asdf_no_fitsrec branch from b05cda1 to cc9b569 Compare September 11, 2023 14:57

braingram added 2 commits September 11, 2023 11:27

strip fitsrec from tree before serializing with asdf

5081e20

update changelog

751a9f3

braingram force-pushed the asdf_no_fitsrec branch from cc9b569 to 751a9f3 Compare September 11, 2023 15:27

hbushouse added the FITS support label Sep 11, 2023

Merge branch 'master' into asdf_no_fitsrec

5f83bad

hbushouse approved these changes Sep 11, 2023

braingram mentioned this pull request Sep 11, 2023

Redundant condition in rebuild_fits_rec_dtype #206

Closed

hbushouse merged commit 91c7d8a into spacetelescope:master Sep 11, 2023

braingram deleted the asdf_no_fitsrec branch September 11, 2023 18:49
