Safely convert fitsrec in tree before serializing with asdf #205

Merged · 3 commits · Sep 11, 2023
3 changes: 3 additions & 0 deletions CHANGES.rst
@@ -24,6 +24,9 @@ Other
- Change format of the MirMrsPtCorrModel to use a 1d reference table
instead of 2d FITS image extensions [#196]

- Convert ``FITS_rec`` instances to arrays before serializing or
validating with asdf [#205]


1.8.0 (2023-08-24)
==================
3 changes: 2 additions & 1 deletion src/stdatamodels/fits_support.py
@@ -775,7 +775,8 @@ def callback(node, json_id):
data = hdulist[pair].data
return data
return node
af.tree = treeutil.walk_and_modify(af.tree, callback)
# don't assign to af.tree to avoid an extra validation
af._tree = treeutil.walk_and_modify(af.tree, callback)
Collaborator Author:
Changing this assignment has two benefits.

  1. (Required for this PR.) The modified tree returned from walk_and_modify contains FITS_rec instances: this function (_map_hdulist_to_arrays) is responsible for replacing the arrays referenced in the tree with the data from the HDUs, which may come back as FITS_rec instances. If af.tree were assigned, asdf would attempt to validate the assigned tree, which would produce warnings (in asdf 3.0) for any FITS_rec instances it contained.
  2. Assigning to af.tree triggers validation of the tree. This is unnecessary because this function (_map_hdulist_to_arrays) is called right after asdf.open (see from_fits_asdf in the same file), which by default validates the tree on read. Assigning to af._tree avoids this duplicate validation.
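For illustration, a minimal sketch of the difference between the two assignments (it relies on asdf's private _tree attribute, exactly as the change above does; the tree contents and callback are placeholders):

    import asdf
    from asdf import treeutil

    af = asdf.AsdfFile({"data": [1, 2, 3]})

    def callback(node):
        # stand-in for the real callback, which swaps placeholder arrays
        # for the corresponding HDU data (possibly FITS_rec instances)
        return node

    # Public setter: asdf re-validates the assigned tree, which would emit
    # warnings (asdf 3.0) if it contained FITS_rec instances.
    af.tree = treeutil.walk_and_modify(af.tree, callback)

    # Private attribute: swaps in the modified tree without a second
    # validation pass; in the PR, asdf.open() has already validated the
    # tree on read.
    af._tree = treeutil.walk_and_modify(af.tree, callback)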



def from_fits_hdu(hdu, schema, cast_arrays=True):
11 changes: 8 additions & 3 deletions src/stdatamodels/model_base.py
Expand Up @@ -25,7 +25,7 @@
from . import properties
from . import schema as mschema
from . import validate
from .util import get_envar_as_boolean, remove_none_from_tree
from .util import convert_fitsrec_to_array_in_tree, get_envar_as_boolean, remove_none_from_tree

from .history import HistoryList

@@ -585,10 +585,15 @@ def to_asdf(self, init, *args, **kwargs):
`~asdf.AsdfFile.write_to`.
"""
self.on_save(init)
self.validate()
asdffile = self.open_asdf(self._instance, **kwargs)
self.validate() # required to trigger ValidationWarning
tree = convert_fitsrec_to_array_in_tree(self._instance)
# don't open_asdf(tree) as this will cause a second validation of the tree
# instead open an empty tree, then assign to the hidden '_tree'
asdffile = self.open_asdf(None, **kwargs)
asdffile._tree = tree
Collaborator Author:
The assignment to af._tree instead of af.tree is made for the same reasons described in the comment above and in https://github.com/spacetelescope/stdatamodels/pull/205/files#r1316260255

asdffile.write_to(init, *args, **kwargs)


@classmethod
def from_fits(cls, init, schema=None, **kwargs):
"""
20 changes: 20 additions & 0 deletions src/stdatamodels/util.py
@@ -318,6 +318,15 @@ def _remove_none(node):
return treeutil.walk_and_modify(tree, _remove_none)


def convert_fitsrec_to_array_in_tree(tree):
def _convert_fitsrec(node):
if isinstance(node, fits.FITS_rec):
return _fits_rec_to_array(node)
else:
return node
return treeutil.walk_and_modify(tree, _convert_fitsrec)


def rebuild_fits_rec_dtype(fits_rec):
dtype = fits_rec.dtype
new_dtype = []
@@ -329,3 +338,14 @@ def rebuild_fits_rec_dtype(fits_rec):
else:
new_dtype.append((field_name, table_dtype))
return np.dtype((np.record, new_dtype))


def _fits_rec_to_array(fits_rec):
bad_columns = [n for n in fits_rec.dtype.fields if np.issubdtype(fits_rec[n].dtype, np.unsignedinteger)]
Collaborator:
In rebuild_fits_rec_dtype there is a somewhat different definition of a bad column:

        table_dtype = dtype[field_name]
        field_dtype = fits_rec.field(field_name).dtype
        if np.issubdtype(table_dtype, np.signedinteger) and np.issubdtype(field_dtype, np.unsignedinteger):

Is that first check unnecessary, since unsigned integers in FITS are always "pseudo unsigned"?

Collaborator Author:

I think so. To put it another way: because FITS has no unsigned integer support, every unsigned integer in the FITS_rec (so field_dtype is unsigned) is stored as a signed integer with an offset (so table_dtype is signed).

I'm happy to remove the first part of the check in rebuild_fits_rec_dtype if it would be preferable to have that change in this PR. Alternatively, I can open a follow-up PR (or just a tracking issue if the optimization is lower priority).
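For illustration, a short sketch of the "pseudo unsigned" convention (the astropy round trip below is assumed behavior and is not part of this diff):

    import io
    import numpy as np
    from astropy.io import fits

    # FITS has no native unsigned integers, so an unsigned 32-bit column is
    # written as signed 32-bit ('J') with a BZERO offset of 2**31.
    col = fits.Column(
        name="counts",
        format="J",
        bzero=2**31,
        array=np.array([0, 1, 2**32 - 1], dtype=np.uint32),
    )
    hdu = fits.BinTableHDU.from_columns([col])

    buf = io.BytesIO()
    hdu.writeto(buf)
    buf.seek(0)

    with fits.open(buf) as hdul:
        rec = hdul[1].data  # a FITS_rec
        # the storage dtype in the record array is signed (table_dtype) ...
        print(np.issubdtype(rec.dtype["counts"], np.signedinteger))          # True
        # ... while field access applies the offset and comes back unsigned
        # (field_dtype), which is what the new code checks for.
        print(np.issubdtype(rec.field("counts").dtype, np.unsignedinteger))  # True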

Collaborator Author:

I opened an issue to track optimizing the comparison: #206

That way this PR can rely on the previously run regtests, and a follow-up PR (after the next jwst release) can look into optimizing the comparison.

if not len(bad_columns):
return fits_rec.view(np.ndarray)
new_dtype = rebuild_fits_rec_dtype(fits_rec)
arr = np.asarray(fits_rec, new_dtype).copy()
for name in bad_columns:
arr[name] = fits_rec[name]
return arr
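A usage sketch of the new helper (the example table and tree below are illustrative, not taken from the codebase):

    import numpy as np
    from astropy.io import fits
    from stdatamodels.util import convert_fitsrec_to_array_in_tree

    hdu = fits.BinTableHDU.from_columns(
        [fits.Column(name="flux", format="D", array=np.arange(3.0))]
    )
    tree = {"meta": {"name": "example"}, "table": hdu.data}  # FITS_rec leaf

    converted = convert_fitsrec_to_array_in_tree(tree)
    print(type(tree["table"]).__name__)       # FITS_rec (original tree untouched)
    print(type(converted["table"]).__name__)  # ndarray, safe to serialize/validate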
3 changes: 2 additions & 1 deletion src/stdatamodels/validate.py
@@ -11,7 +11,7 @@
from asdf.util import HashableDict
import numpy as np

from .util import remove_none_from_tree
from .util import convert_fitsrec_to_array_in_tree, remove_none_from_tree


class ValidationWarning(Warning):
@@ -144,6 +144,7 @@ def _check_value(value, schema, ctx):
# converting to tagged tree, so that we don't have to descend unnecessarily
# into nodes for custom types.
value = remove_none_from_tree(value)
value = convert_fitsrec_to_array_in_tree(value)
value = yamlutil.custom_tree_to_tagged_tree(value, ctx._asdf)

if ctx._validate_arrays: