-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Safely convert fitsrec in tree before serializing with asdf #205
Conversation
Regression tests passed with no errors: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/901/pipeline/202 |
Codecov ReportPatch coverage is
📢 Thoughts on this report? Let us know!. |
@@ -775,7 +775,8 @@ def callback(node, json_id): | |||
data = hdulist[pair].data | |||
return data | |||
return node | |||
af.tree = treeutil.walk_and_modify(af.tree, callback) | |||
# don't assign to af.tree to avoid an extra validation | |||
af._tree = treeutil.walk_and_modify(af.tree, callback) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this assignment has 2 benefits.
- (required for this PR) the modified tree returned from
walk_and_modify
containsFITS_rec
instances (this function,_map_hdulist_to_arrays
is responsible for replacing the arrays referenced in the tree with the data from the hdus which might come back asFITS_rec
instances). Ifaf.tree
was assigned, asdf would attempt to validate the assigned tree which would result in warnings (in asdf 3.0) if the tree containedFITS_rec
instances - The assignment to
af.tree
results in validation of the tree. This is unnecessary as this function (_map_hdulist_to_arrays
) is called right after a call toasdf.open
(see from_fits_asdf in the same file) which by default will validate the tree on read. Assigning toaf._tree
avoids this duplicate validation.
# don't open_asdf(tree) as this will cause a second validation of the tree | ||
# instead open an empty tree, then assign to the hidden '_tree' | ||
asdffile = self.open_asdf(None, **kwargs) | ||
asdffile._tree = tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assignment to af._tree
instead of af.tree
is for the same reasons as described in the comment and in https://github.com/spacetelescope/stdatamodels/pull/205/files#r1316260255
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just something I'm curious about, PR looks good
|
||
|
||
def _fits_rec_to_array(fits_rec): | ||
bad_columns = [n for n in fits_rec.dtype.fields if np.issubdtype(fits_rec[n].dtype, np.unsignedinteger)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In rebuild_fits_rec_dtype
there is a somewhat different definition of a bad column:
table_dtype = dtype[field_name]
field_dtype = fits_rec.field(field_name).dtype
if np.issubdtype(table_dtype, np.signedinteger) and np.issubdtype(field_dtype, np.unsignedinteger):
Is that first check unnecessary? Since unsigned integers in FITS are always "pseudo unsigned"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. To put it another way, the lack of unsigned integer support in FITS means that all unsigned integers in the fitsrec (so field_dtype
is unsigned) will be stored as a signed integer with an offset (so table_dtype
will be signed).
I'm happy to remove the first part of the check in rebuild_fits_rec_dtype
if it would be preferable to have that change in this PR. Alternatively I can open a follow-up PR (or just a tracking issue if the optimization is of lower priority).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened an issue to track optimizing the comparison.
That way this PR can rely on the previously run regtests and a follow-up PR (after the next jwst release) can look into optimizing the comparison.
b05cda1
to
cc9b569
Compare
cc9b569
to
751a9f3
Compare
CI failure is unrelated. |
Prior to asdf 3.0,
FITS_rec
(a subclass of ndarray) was serialized like an ndarray. In asdf 3.0, this will result in a warning:This is done out of abundance of caution for oddities that can occur when treating
FITS_rec
like anndarray
. See below for an example.There are many places where stdatamodels passes a
FITS_rec
to asdf in a way that is unaware of these oddities (see this run log for a full list of warnings that will appear when asdf 3.0 is released or if testing stdatamodels against current asdf main: https://github.com/asdf-format/asdf/actions/runs/6088051343/job/16588221134)This PR converts all
FITS_rec
instances tondarray
instances prior to asdf serialization/validation. The conversion is done using a new method_fits_rec_to_array
which corrects columns containing unsigned integers.Example of where treating
FITS_rec
like andarray
can be problematicProduces invalid data in the view created from the
FITS_rec
.Checklist
CHANGES.rst
(either inBug Fixes
orChanges to API
)