use deepdiff instead of asdf.commands.diff for output and truth file comparisons #868
Conversation
🎉
Force-pushed from 541cd48 to 0ebaead
This looks good to me. I agree that there's enough complexity in these comparisons that it makes sense to separate this from asdf; the thorniest example being the WCS object.
This looks good. Some comments:
- At some point we should add some tests, but I don't think we need those in this PR.
- The scientifically relevant WCS consistency is about ~1/100 pix, which would correspond to a 0.001 arcsec atol. One might hope that for this kind of regression testing we could do 10 or 100x better than that. That would be in 2D, so it would be something like np.all(coord1.separation(coord2) < 0.001 * u.arcsec) (see the sketch after this list).
- It seems likely that at some point in the future we'll want to specify different levels of precision on different arrays. That will take some changes but I don't think we should do it until we need it.
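A minimal sketch of the kind of 2D tolerance check suggested in the second bullet, assuming both objects expose the common pixel_to_world interface that gwcs.WCS provides; the helper name and pixel grid are illustrative, not part of this PR:

```python
import numpy as np
import astropy.units as u


def wcs_within_tolerance(wcs1, wcs2, atol=0.001 * u.arcsec):
    # Sample both WCS objects on the same (illustrative) pixel grid.
    x, y = np.meshgrid(np.arange(0, 4096, 128), np.arange(0, 4096, 128))
    coords1 = wcs1.pixel_to_world(x, y)
    coords2 = wcs2.pixel_to_world(x, y)
    # 0.001 arcsec corresponds to roughly 1/100 of a pixel.
    return bool(np.all(coords1.separation(coords2) < atol))
```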
Thanks for taking a look and for the response. Adding some tests is a great idea. If the general approach and format look good I can add a few to get the bulk of the coverage back, perhaps at least one failing and one passing comparison.
I'm also happy to update the WCS output based on your suggestion. I'd have to look a bit more at the WCS comparison to figure out: does the output of the bounding box projection contain units? If not, what is the corresponding unit?
Adjusting the comparisons per-array seems doable (I believe the comparison is aware of the key/path of the object being compared). I'd have to think about this a bit to figure out if there is a low-complexity way of adding this configuration (possibly passing in comparison functions from the regtest that calls compare_asdf).
For the above changes, would you prefer that I update this PR or open follow-up issues (so that this PR can be merged sooner to start using the included changes for testing other PRs)?
Looks good to me.
Force-pushed from 537210d to a21a8b5
Two requests so that we can ensure that the fix works and remains working; can you add two tests (a sketch of what these might look like follows this list):
- A test which runs a step twice, generating two identical outputs in different files, then runs compare_asdf on them and asserts they are identical. This way we know that, in theory, it passes on identical results.
- More importantly, a test which runs a step twice (or two different steps) in such a way that we expect the output files to be different but the same in structure, then runs compare_asdf on them and asserts the files are not identical. This way we know it's now correctly identifying when two files are different under the circumstances in which most of the regtests are run.
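A rough sketch of what those two sanity tests could look like. For brevity this writes plain ASDF files containing numpy arrays rather than running steps, and it assumes the object returned by compare_asdf exposes an identical flag and a report() method (names assumed here, in the spirit of the jwst-style pattern described in the PR summary):

```python
import asdf
import numpy as np

from romancal.regtest.regtestdata import compare_asdf  # module path taken from the PR description


def test_compare_asdf_identical(tmp_path):
    # Two files with identical contents should compare as identical.
    arr = np.zeros((10, 10), dtype="float32")
    fn0, fn1 = tmp_path / "a.asdf", tmp_path / "b.asdf"
    asdf.AsdfFile({"data": arr}).write_to(fn0)
    asdf.AsdfFile({"data": arr.copy()}).write_to(fn1)
    diff = compare_asdf(fn0, fn1)
    assert diff.identical, diff.report()


def test_compare_asdf_differ(tmp_path):
    # Two files with the same structure but different values should not.
    fn0, fn1 = tmp_path / "a.asdf", tmp_path / "b.asdf"
    asdf.AsdfFile({"data": np.zeros((10, 10), dtype="float32")}).write_to(fn0)
    asdf.AsdfFile({"data": np.ones((10, 10), dtype="float32")}).write_to(fn1)
    diff = compare_asdf(fn0, fn1)
    assert not diff.identical
```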
Thanks Brett. Yes, I think we want to proceed with this approach. I don't think we need the improved WCS comparison at the moment, and I think we'd want to explore a bit how well we actually do; I was just trying to work out what to expect to need in the future. And yeah, let's defer adding extra flexibility / complexity to the numpy comparisons until we find we actually need it. Any objections to going ahead and merging? Thanks!
I agree with this; however, the sanity tests I requested are to guard against the root problem that this PR is trying to solve: namely, that the compare_asdf asserts always passed because no value was returned.
Force-pushed from a21a8b5 to eb69b3f
Thanks! I added 2 tests in eb69b3f. They generate and save level 2 data models (rather than running steps). Let me know if these are sufficient for the current needs.
Thanks for adding those two tests. Everything else looks good to me.
disable pytest verbose output in jenkins files as pretty print output should appear in test summaries
Force-pushed from e894ad1 to 9f703eb
Rebased and running regression tests:
The regression tests show 8 errors (and a slew of warnings; many of these also occurred in run 376, which used #872, so they're now in main and probably already were). All errors are from tests using the new compare_asdf.
Perfect. Ready to merge?
Works for me! Thanks.
There are many uses of compare_asdf in the regression tests, like the following:
romancal/romancal/regtest/test_wfi_pipeline.py, line 43 (at 68e6e0d)
These asserts are always True, as compare_asdf does not return a value (so its output will always be None).
romancal/romancal/regtest/regtestdata.py, lines 528 to 532 (at 68e6e0d)
Furthermore, asdf.commands.diff produces unhelpful output when comparisons fail (see: #867 (comment)).

This PR replaces asdf.commands.diff with deepdiff (and adds it as a test dependency). deepdiff appears to be a well supported and flexible library for comparing nested Python objects (although I have little experience with it).

deepdiff comparisons can be configured to support custom objects by implementing Operator subclasses. This PR includes Operator subclasses for:
- ndarray and NDArrayType (which also covers Quantity)
- astropy.time.Time
- gwcs.WCS

Each of these implements a custom comparison which allows controlling "equality" and the output that is generated when objects are not equal.
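To illustrate the deepdiff mechanism this relies on, here is a minimal, simplified sketch of an Operator-style comparison for plain ndarrays; the class, tolerances, and report keys are illustrative, and the operators in this PR are more involved (they also handle NDArrayType, Quantity, Time, and WCS):

```python
import numpy as np
from deepdiff import DeepDiff
from deepdiff.operator import BaseOperator


class NDArrayOperator(BaseOperator):
    """Compare arrays with np.allclose instead of element-by-element diffing."""

    def __init__(self, rtol=1e-5, atol=1e-8):
        super().__init__(types=[np.ndarray])
        self.rtol = rtol
        self.atol = atol

    def give_up_diffing(self, level, diff_instance):
        a, b = level.t1, level.t2
        if a.shape != b.shape:
            diff_instance.custom_report_result(
                "shapes_differ", level, {"t1": a.shape, "t2": b.shape}
            )
        elif not np.allclose(a, b, rtol=self.rtol, atol=self.atol):
            diff_instance.custom_report_result(
                "arrays_differ", level, {"max_abs_diff": float(np.max(np.abs(a - b)))}
            )
        # Returning True tells deepdiff not to fall back to its generic comparison.
        return True


diff = DeepDiff(
    {"data": np.zeros(3)},
    {"data": np.array([0.0, 0.0, 1.0])},
    custom_operators=[NDArrayOperator()],
)
print(diff)  # reports "arrays_differ" with the maximum absolute difference
```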
Fixing the above issue with compare_asdf has exposed a number of regression test failures.

Regtest run: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FRoman-Developers-Pull-Requests/detail/Roman-Developers-Pull-Requests/362/pipeline/205

With this PR test_dark_current_outfile_step is failing with:

To enable the output to be pretty_printed, the usage of compare_asdf was changed to match similar code in jwst, where a diff is computed, then the result is inspected and optionally a report generated. This also allows for an easy fix for #870.

Fixes #870
Checklist
- entry added in CHANGES.rst under the corresponding subsection