convert numpy scalars to python types before yaml encoding #1605

braingram · 2023-07-26T16:11:37Z

NEP 51 changes the repr of numpy scalars.

This change conflicts with represent_float/represent_int (from pyyaml, which use repr). With numpy 2.0 (which implements NEP 51), scalars are being encoded as, for example, 'np.float64(3.14)' instead of '3.14'. To work around this change, convert the values to builtin Python types before passing them to pyyaml for encoding.
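
As a minimal sketch of the problem and the workaround (not the actual asdf or pyyaml code): the `Nep51Float` class below is a hypothetical stand-in that mimics the NEP 51 style repr so the example runs without numpy, and `represent_with_repr` is an illustrative helper that assumes, as described above, that the representer builds its output from `repr`.

```python
class Nep51Float(float):
    """Hypothetical stand-in for a numpy 2.0 float64 scalar (not a numpy type)."""

    def __repr__(self):
        # NEP 51 style repr: the type name is included in the string
        return f"np.float64({float.__repr__(self)})"


def represent_with_repr(value):
    # Illustrative helper: a repr-based encoder, as the description says pyyaml uses
    return repr(value)


def represent_as_builtin(value):
    # The workaround: convert to a builtin float first, then encode
    return represent_with_repr(float(value))


scalar = Nep51Float(3.14)
before = represent_with_repr(scalar)   # type name leaks into the encoded text
after = represent_as_builtin(scalar)   # plain number, as expected in YAML
print(before)
print(after)
```

The conversion happens before the encoder sees the value, so the encoder itself does not need to know anything about numpy.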

This PR also disables pyyaml devdeps testing, as pyyaml is incompatible with Cython 3.0 and its failure to install stops devdeps testing.

braingram · 2023-07-26T16:55:24Z

jwst devdeps failure is because of an unreleased fix in stdatamodels:
spacetelescope/stdatamodels#184
and could also be fixed by:
#1594

WilliamJamieson · 2023-07-26T22:31:29Z

asdf/yamlutil.py

 for scalar_type in util.iter_subclasses(np.floating):
-    AsdfDumper.add_representer(scalar_type, AsdfDumper.represent_float)
+    AsdfDumper.add_representer(scalar_type, lambda dumper, data: dumper.represent_float(float(data)))


Will this fully preserve floating point precision?

Alternately you can just set the numpy representation version you want using np.set_printoptions

Adding something like this snippet in the correct place (such as conftest.py, to fix unit tests) should also fix the issue:

    from asdf.util import minversion
    import numpy as np

    # legacy="1.25" restores the pre-NEP 51 repr, so it only applies to numpy 2.0 and later
    if minversion(np, "2.0.dev"):
        np.set_printoptions(legacy="1.25")

For example, adding this to romancal's top level conftest.py fixes the errors resulting from the proposed numpy 2.0 changes.

See astropy/astropy#15096

Note that astropy/astropy#15065, in particular its astropy.io.yaml changes, suggests a much clearer way to do this than my suggestion above.

Thanks for taking a look and for sharing the links to how astropy is handling this change. astropy/astropy#15065 looks interesting (as it appears numpy calls str prior to repr where it adds the dtype portion of repr). However due to the way pyyaml deserializes floats I'm not sure we want to continue using str or repr on the numpy scalars (more on that below).

The short answer to the precision question is that this PR should have no negative impact (and in some cases a positive impact) on roundtripping floating point scalars.

The long answer is more complicated.

There has been some recent discussion on scalar handling and ASDF (see #1519). As discussed, the YAML standard is not overly prescriptive on float precision https://yaml.org/type/float.html. ASDF does not further define float handling and we should document that users should use arrays stored in ASDF blocks for accurate roundtripping and control of precision. I'm pinging @perrygreenfield @eslavich and @nden as they can hopefully fill in details from the discussion that I've forgotten and/or failed to note.

The simplest case is float128 (and anything else with more than 64 bits). asdf currently deserializes floating point scalars in the tree as Python native floats. This means that, on systems that support floats with more than 64 bits, precision is already lost (and will continue to be lost) when these values are written to the tree rather than to ASDF blocks.
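
To illustrate the kind of loss described here without depending on platform support for float128, the sketch below uses `fractions.Fraction` as a stand-in for a value carrying more precision than a 64 bit float can hold (an assumption made for illustration; it is not how asdf or numpy store extended-precision values). A Python float has a 53 bit significand, so the extra bits are dropped on conversion.

```python
from fractions import Fraction

# Exactly 1 + 2**-60: needs 61 significant bits, more than a 64 bit float's 53
exact = Fraction(1) + Fraction(1, 2**60)

# Converting to a Python float rounds to the nearest representable value
as_float = float(exact)

print(as_float == 1.0)              # the 2**-60 term did not survive
print(Fraction(as_float) == exact)  # so the value no longer matches the original
```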

For 64 bit floats both numpy and python reprs (by default) select the shortest string that will roundtrip (see https://docs.python.org/3/tutorial/floatingpoint.html). So converting a np.float64 to float prior to serialization has no impact on precision.
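
A quick check of the 64 bit claim using only builtin floats (numpy is left out so the sketch has no dependencies): `repr` yields the shortest string that parses back to the identical value, so a string written this way loses nothing when read back as a 64 bit float.

```python
import math

values = [0.1, 3.14, 1 / 3, math.pi, 1e-300, 2.0**52 + 1]

for x in values:
    text = repr(x)           # shortest string that round-trips
    assert float(text) == x  # parses back to exactly the same 64 bit value

# The shortest round-tripping string is not the exact stored value:
print(repr(0.1))      # 0.1
print(f"{0.1:.20f}")  # more digits of the stored binary value
```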

For floats with less than 64 bits the situation is the messiest and is where the changes in this PR will have a small impact on what is written to and roundtripped through an ASDF file. The difference comes from the numpy repr choosing the shortest string that roundtrips with the precision of the datatype (whereas asdf will always convert the value to 64 bits on read). It's probably helpful to use an example.
Prior to this PR, np.float16(3.143) did not roundtrip.

    >>> v = np.float16(3.143)
    >>> asdf.testing.helpers.roundtrip_object(v) == v
    False

This boils down to the numpy repr choosing '3.143' to represent the value, which, when loaded as a 64 bit float, produces a slightly different value (even though its repr is also '3.143').

    >>> repr(v)
    '3.143'
    >>> fv = float(repr(v))  # repr also 3.143
    >>> fv == v
    False
    >>> abs(fv - v)
    0.00042187499999979394

However, as expected, casting the read value back to 16 bit allows the comparison to pass:

    >>> np.float16(fv) == v
    True

To summarize, prior to this PR, what was written for a <64 bit numpy floating point scalar was the shortest string that would round trip if the precision of the original float was used. However because pyyaml (and asdf) read the floats as 64 bits, the read values can fail a 64 bit comparison.
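
The pre-PR behaviour can be reproduced without numpy. In the sketch below, `to_float16` and `shortest_repr` are hypothetical helpers: the first uses the `struct` module's `'e'` (IEEE 754 binary16) format to round a value to 16 bit precision, and the second does a simple search over `%g` precisions, which only approximates how numpy picks its repr digits.

```python
import struct


def to_float16(x):
    """Round x to the nearest IEEE 754 binary16 value (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def shortest_repr(x, cast):
    """Shortest decimal string s such that cast(float(s)) == x (simple search)."""
    for digits in range(1, 18):
        s = f"{x:.{digits}g}"
        if cast(float(s)) == x:
            return s
    return repr(x)


v = to_float16(3.143)                    # the value a float16 actually stores
written = shortest_repr(v, to_float16)   # string that round-trips at 16 bits
read = float(written)                    # what a 64 bit reader gets back

print(written)                # 3.143
print(read == v)              # False: fails a 64 bit comparison
print(to_float16(read) == v)  # True: passes once cast back to 16 bits
```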

With this PR the values are converted to a float prior to conversion to a string for serialization. Using the above example, with this PR

    >>> v = np.float16(3.143)
    >>> asdf.testing.helpers.roundtrip_object(v) == v
    np.True_  # different repr due to NEP51

Casting the value back to 16 bits also works without issue:

    >>> np.float16(asdf.testing.helpers.roundtrip_object(v)) == v
    np.True_

However it should be noted that to achieve this, the string written to the file is different due to the conversion to 64 bits prior to the call to repr (which will now select the shortest string that reconstructs the 64 instead of 16 bit float).

    >>> v = np.float16(3.143)
    >>> v
    np.float16(3.143)
    >>> float(v)
    3.142578125

So for this example with this PR asdf will write 3.142578125 instead of 3.143 to the file.
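
A dependency-free sketch of the post-PR behaviour, again using the `struct` module's `'e'` format as a stand-in for `np.float16` (the `to_float16` helper is hypothetical): once the value is widened to a builtin float, `repr` picks the shortest string that round-trips at 64 bits, so the value read back compares equal with or without a cast back to 16 bits.

```python
import struct


def to_float16(x):
    """Round x to the nearest IEEE 754 binary16 value (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


v = to_float16(3.143)  # stands in for float(np.float16(3.143))
written = repr(v)      # what would be written after conversion to a builtin float
read = float(written)  # what a 64 bit reader gets back

print(written)
assert read == v              # passes a 64 bit comparison
assert to_float16(read) == v  # and still matches after casting back to 16 bits
```

The cost is a longer string in the file; the benefit is that the comparison no longer depends on the reader knowing the original precision.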

Thanks for the explanation. As I stated before, I wanted to make sure ASDF can "roundtrip" scalars correctly.

Now that light has been shined on the issue of numpy scalars, I think ASDF should work towards serializing numpy scalars differently than the built in python scalars. NEP 41 is working towards having more complicated dtypes (which would apply to scalars too). One of the main motivations for this effort is to encode units into the dtype itself (for performance). Looking towards this indicates that ASDF might want to start considering how to encode the dtype for scalars.

braingram · 2023-07-27T17:26:02Z

3.9 devdeps failure is because the scipy nightly for 3.9 failed to build due to an anaconda server 500:
https://github.com/scipy/scipy/actions/runs/5682181723/job/15400012240

convert numpy scalars to python types before yaml encoding (cherry picked from commit b43b2c5)

braingram added Downstream CI development No backport required labels Jul 26, 2023

github-actions bot modified the milestone: 3.0.0 Jul 26, 2023

braingram mentioned this pull request Jul 26, 2023

use scipy directly instead of astropy modeling for residual fringe fit spacetelescope/jwst#7764

Merged

7 tasks

braingram force-pushed the nep51 branch from f01281d to ccc1d31 Compare July 26, 2023 16:29

braingram marked this pull request as ready for review July 26, 2023 16:30

braingram requested a review from a team as a code owner July 26, 2023 16:30

WilliamJamieson reviewed Jul 26, 2023

WilliamJamieson approved these changes Jul 27, 2023

braingram mentioned this pull request Jul 27, 2023

ASDF handling of numpy scalar values (and scalar precision) #1519

Open

braingram force-pushed the nep51 branch from ccc1d31 to 5cce445 Compare July 27, 2023 16:49

braingram merged commit b43b2c5 into asdf-format:main Jul 27, 2023

braingram deleted the nep51 branch July 27, 2023 17:26

braingram mentioned this pull request Jul 27, 2023

Re-add pyyaml to devdeps testing when it gets cython 3 compatibility #1607

Closed

WilliamJamieson mentioned this pull request Jul 27, 2023

Remove numpy 2.0 hack spacetelescope/romancal#797

Merged

5 tasks

braingram added a commit to braingram/asdf that referenced this pull request Aug 1, 2023

Merge pull request asdf-format#1605 from braingram/nep51

4d4c3d7

convert numpy scalars to python types before yaml encoding (cherry picked from commit b43b2c5)

braingram mentioned this pull request Aug 1, 2023

Update 2.15.x branch #1609

Merged

braingram added a commit to braingram/asdf that referenced this pull request Aug 1, 2023

Merge pull request asdf-format#1605 from braingram/nep51

bf53354

convert numpy scalars to python types before yaml encoding (cherry picked from commit b43b2c5)
