convert numpy scalars to python types before yaml encoding #1605
Merged
Will this fully preserve floating point precision?
Alternatively, you can just set the `numpy` representation version you want using `np.set_printoptions`. Adding something like this snippet in the correct place (such as `conftest.py` to fix unit tests) should also fix the issue. For example, adding this to `romancal`'s top level `conftest.py` fixes the errors resulting from the proposed `numpy` 2.0 changes.
See astropy/astropy#15096
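A minimal sketch of such a `conftest.py` addition, assuming numpy 2.0's `legacy="1.25"` print option (the approach discussed in the linked astropy issue). This is an illustration rather than the exact snippet referred to above, and the version check is an assumption meant to leave older numpy releases untouched:

```python
import numpy as np

# Assumption: only numpy >= 2.0 changed the scalar repr (for example
# "np.float64(1.5)"), so older releases are left as they are.
if int(np.__version__.split(".")[0]) >= 2:
    # Restore the numpy 1.25-style scalar repr (for example "1.5").
    np.set_printoptions(legacy="1.25")

scalar_repr = repr(np.float64(1.5))
print(scalar_repr)
```

Note that this changes a global print setting, so it only affects how scalars are turned into text, not how they are stored.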
Note that astropy/astropy#15065, in particular its `astropy.io.yaml` changes, suggests a much clearer way to do this than my suggestion above.
Thanks for taking a look and for sharing the links to how astropy is handling this change. astropy/astropy#15065 looks interesting (as it appears numpy calls `str` prior to `repr`, where it adds the dtype portion of the repr). However, due to the way pyyaml deserializes floats, I'm not sure we want to continue using `str` or `repr` on the numpy scalars (more on that below).
The short answer to the precision question is that this PR should have no negative impact (and in some cases a positive impact) on roundtripping floating point scalars.
The long answer is more complicated.
There has been some recent discussion on scalar handling and ASDF (see #1519). As discussed, the YAML standard is not overly prescriptive on float precision https://yaml.org/type/float.html. ASDF does not further define float handling and we should document that users should use arrays stored in ASDF blocks for accurate roundtripping and control of precision. I'm pinging @perrygreenfield @eslavich and @nden as they can hopefully fill in details from the discussion that I've forgotten and/or failed to note.
The simplest case is float128 (and anything else with more than 64 bits). asdf currently deserializes floating point scalars in the tree as python native `float`. This means that there is already, and continues to be, a loss of precision for floats with more than 64 bits (on systems that support them) when these values are written to the tree and not the ASDF blocks.
For 64 bit floats both numpy and python reprs (by default) select the shortest string that will roundtrip (see https://docs.python.org/3/tutorial/floatingpoint.html). So converting a `np.float64` to `float` prior to serialization has no impact on precision.
For floats with less than 64 bits the situation is the messiest and is where the changes in this PR will have a small impact on what is written to and roundtripped through an ASDF file. The difference comes from the numpy repr choosing the shortest string that roundtrips with the precision of the datatype (whereas asdf will always convert the value to 64 bits on read). It's probably helpful to use an example.
Prior to this PR, `np.float16(3.143)` did not roundtrip.
This can be boiled down to numpy repr choosing '3.143' to represent the value, which when loaded as a 64 bit float produces a slightly different value (3.143 rather than the 3.142578125 held by the 16 bit float).
However, as expected, casting the read value back to 16 bit allows the comparison to pass:
To summarize, prior to this PR, what was written for a <64 bit numpy floating point scalar was the shortest string that would round trip if the precision of the original float was used. However because pyyaml (and asdf) read the floats as 64 bits, the read values can fail a 64 bit comparison.
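The pre-PR behavior summarized above can be sketched as follows. This is an illustration, not the snippet originally posted: it assumes `str` of the numpy scalar as a stand-in for the text that was written and python's `float` as a stand-in for how pyyaml reads it back.

```python
import numpy as np

original = np.float16(3.143)

# Stand-in for what was written before the PR: the shortest string that
# identifies the value at 16 bit precision.
written = str(original)

# Stand-in for the read side: the text is parsed as a 64 bit python float.
read = float(written)

# Convert the original explicitly so the comparison happens at 64 bit
# precision regardless of numpy's scalar promotion rules.
matches_at_64_bits = read == float(original)

# Casting the read value back to 16 bits recovers the original value.
matches_at_16_bits = np.float16(read) == original

print(written)             # 3.143
print(matches_at_64_bits)  # False
print(matches_at_16_bits)  # True
```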
With this PR the values are converted to a float prior to conversion to a string for serialization. Using the above example, with this PR `np.float16(3.143)` roundtrips: the value read back compares equal to the value that was written.
Casting the value back to 16 bits also works without issue:
However it should be noted that to achieve this, the string written to the file is different due to the conversion to 64 bits prior to the call to repr (which will now select the shortest string that reconstructs the 64 instead of 16 bit float).
So for this example with this PR asdf will write `3.142578125` instead of `3.143` to the file.
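A minimal sketch of the behavior with this PR, again as an illustration rather than asdf's actual code path: it assumes the scalar is converted with `float` before python's `repr` produces the text, and that reading parses that text as a 64 bit float.

```python
import numpy as np

original = np.float16(3.143)

# Convert to a python float (64 bits) before producing the text.
as_float = float(original)
written = repr(as_float)

# Reading parses the text as a 64 bit float again.
read = float(written)

# The 64 bit comparison now passes, and so does the 16 bit one.
matches_at_64_bits = read == as_float
matches_at_16_bits = np.float16(read) == original

print(written)             # 3.142578125
print(matches_at_64_bits)  # True
print(matches_at_16_bits)  # True
```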
Thanks for the explanation. As I stated before, I wanted to make sure ASDF can "roundtrip" scalars correctly.
Now that light has been shined on the issue of numpy scalars, I think ASDF should work towards serializing numpy scalars differently than the built in python scalars. NEP 41 is working towards having more complicated `dtypes` (which would apply to scalars too). One of the main motivations for this effort is to encode units into the `dtype` itself (for performance). Looking towards this indicates that ASDF might want to start considering how to encode the `dtype` for scalars.