Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-3654: NSClean subarray speedup #8745

Merged
merged 17 commits into from
Sep 11, 2024

Conversation

t-brandt
Copy link
Contributor

@t-brandt t-brandt commented Aug 30, 2024

Resolves JP-3654

Closes #8551

This PR changes the behavior of NSClean in subarray mode. Previously, the code would fit all unilluminated pixels simultaneously. This required very large amounts of time and memory for larger subarrays. This PR changes that behavior to compute the correction in overlapping regions, resulting in large speedups with little change in the results. The reference pixels are no longer used in the correction, which helps with some ringing artifacts near the detector edges.

Checklist for PR authors (skip items if you don't have permissions or they are not applicable)

  • added entry in CHANGES.rst within the relevant release section
  • updated or added relevant tests
  • updated relevant documentation
  • added relevant milestone
  • added relevant label(s)
  • ran regression tests, post a link to the Jenkins job below.
    How to run regression tests on a PR
  • All comments are resolved
  • Make sure the JIRA ticket is resolved properly

@t-brandt t-brandt requested a review from a team as a code owner August 30, 2024 18:55
Copy link

codecov bot commented Aug 30, 2024

Codecov Report

Attention: Patch coverage is 3.84615% with 50 lines in your changes missing coverage. Please review.

Project coverage is 60.79%. Comparing base (8f6814f) to head (12f5d3e).

Files with missing lines Patch % Lines
jwst/nsclean/nsclean.py 2.12% 46 Missing ⚠️
jwst/nsclean/lib.py 20.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8745      +/-   ##
==========================================
- Coverage   60.85%   60.79%   -0.06%     
==========================================
  Files         373      373              
  Lines       38657    38690      +33     
==========================================
  Hits        23523    23523              
- Misses      15134    15167      +33     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@melanieclarke melanieclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! It looks good to me, just a couple small requests.

I will run regression tests and test this branch locally on some subarray.

It looks like there's a conflict in the change log. Can you please fix it?

jwst/nsclean/lib.py Outdated Show resolved Hide resolved
jwst/nsclean/nsclean.py Outdated Show resolved Hide resolved
jwst/nsclean/nsclean.py Outdated Show resolved Hide resolved
jwst/nsclean/nsclean.py Outdated Show resolved Hide resolved
@melanieclarke melanieclarke changed the title Nsclean subarray speedup JP-3654: NSClean subarray speedup Sep 4, 2024
@melanieclarke
Copy link
Collaborator

@t-brandt - I'd like to get this in if we can, since we're closing in on the end of the development period for the current build. Would you mind if I go ahead and make the last clean up changes?

Co-authored-by: Melanie Clarke <[email protected]>
@t-brandt
Copy link
Contributor Author

t-brandt commented Sep 6, 2024

Please do go ahead and do the last cleanup. Thanks!

@melanieclarke
Copy link
Collaborator

melanieclarke commented Sep 6, 2024

Testing locally, results look similar to previous, and processing time is greatly improved.

It's now possible to process subarray=ALLSLITS data, so I removed the restriction on that data type.

Test data was:
SUBS200A1: jw02757002001_03104_00001_nrs1_rate.fits
ALLSLITS: jw01128004001_0310g_00003_nrs1_rate.fits

@melanieclarke melanieclarke added this to the Build 11.1 milestone Sep 6, 2024
@melanieclarke melanieclarke added the nsclean NIRSpec NSClean algorithm label Sep 6, 2024
@melanieclarke
Copy link
Collaborator

melanieclarke commented Sep 6, 2024

Regression tests started here: https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1690/

Tests aborted - looking at some results locally first.

@melanieclarke
Copy link
Collaborator

@t-brandt - Looking at local regression test results, I see a couple things we should maybe discuss some more.

While the ALLSLITS data runs through nsclean, results are quite poor with mask_spectral_regions=True (the default). I think this is just the known issue with with overfitting highly masked regions, and the workaround is to run with mask_spectral_regions = False and modify nsigma as needed. I don't think we need to address that in this PR, but we should consider different default values.

Also, the change to remove forcing the edges to be included in the fit (mask=True) seems to be introducing some edge effects in some cases. For the SUBS200A1 regression test data, it shows up only in the reference columns (they are not NaN in this older input rate data). For the ALLSLITS regression test data, it shows up in the next couple columns too.

Are the edge issues something we can fix in the subarray correction now? Or should we revert the change to the reference pixel masking?

Here's what the ALLSLITS regression test data looks like. Top is nscleaned, bottom is the rate file.

allslits_regression_test

@melanieclarke
Copy link
Collaborator

For comparison, here's what the same data looks like with better masking (top) and with better masking + the reference edges included in the fit (bottom).

allslits_with_fixes

@t-brandt
Copy link
Contributor Author

t-brandt commented Sep 6, 2024

Maybe what I/we should do then is to zero out the 1/f correction in the reference pixels? Right now they are masked, and the correction is extrapolated into those pixels (which is a terrible approximation). Zeroing out the correction in those pixels is a one or two line change. I don't think leaving the reference pixels in the mask should be an option, since they should have already been used in the refpix step and will have no more information to add.

@melanieclarke
Copy link
Collaborator

That sounds worth trying - it's similiar to the way the full frame correction sets the correction to zero if there are no unmasked pixels in the column. For the ALLSLITS case, the edge effects show up in the first 6 columns, so refpix + 2 real science columns. Would that address those columns too?

@t-brandt
Copy link
Contributor Author

t-brandt commented Sep 9, 2024

Ok, I have pushed several small changes to fix the edge effects. These happened when a large, contiguous part of the area used for the correction was masked for fitting 1/f. In those cases, the matrices became very poorly conditioned. The approach now skips rows at the end without any available pixels for fitting 1/f.

@melanieclarke
Copy link
Collaborator

melanieclarke commented Sep 9, 2024

Local regression tests and spot checks with my other test data sets look good. Thanks for the fix!

I'll run the full regression test set now and hopefully we can get this merged!

Regression tests running here:
https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1697/

@t-brandt
Copy link
Contributor Author

The overfitting present with the default parameters is because the range of frequencies that NSClean is fitting is too large when a large fraction of the array is missing due to the mask. This can be fixed by changing these frequencies (see the image below). If we want to actually implement this is will be a change of a few lines; I can do it tomorrow. I think the most natural way to implement the change would be to check for allslits mode and mask_spectral_regions set to True, and if both of these are set, change the frequency cutoffs from current the default values.
comparison

@melanieclarke
Copy link
Collaborator

@t-brandt - that sounds reasonable to me to include here, since this is the first time ALLSLITS data will be correctable with NSClean, and since it really is unusable with current default parameters.

@t-brandt
Copy link
Contributor Author

Ok, the default behavior for ALLSLITS should now be reasonable.

@melanieclarke
Copy link
Collaborator

@melanieclarke
Copy link
Collaborator

@hayescr - tagging you in to this conversation for NIRSpec's information.

This PR is the fix for subarray performance, which enables ALLSLITS subarrays to be corrected. The default correction frequencies worked very poorly when spectral regions are masked, so we are modifying the cutoff frequencies for that subarray specifically. We should discuss at a later date if we want to do something similar for the other modes that frequently get overcorrected, like MOS.

Copy link
Collaborator

@melanieclarke melanieclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regression test results look as expected.

There are changes to nsclean for all NIRSpec data in spec2:

  • ALLSLITS data is corrected for the first time.
  • FS subarray corrections are slightly different due to the different algorithm, and due to removing the forced zeros at the edges of the array.
  • Full frame corrections for FS, MOS, and IFU are slightly different due to removing the forced zeros at the edges of the array.

Reviewing results locally, all nsclean corrections are visually similar to previous versions. There are also regression test changes for all products downstream of nsclean, as expected.

All of the remaining failures reported are unrelated.

I think this PR is ready to merge now. When it is in, I will incorporate the changes into the clean_flicker_noise step (#8669).

@t-brandt
Copy link
Contributor Author

Great, thanks @melanieclarke! @hayescr and @melanieclarke, I think that the issue with overcorrection in various modes is best solved by exposing a single parameter (provisionally called aggressiveness) to the user. This would tune the frequencies to be corrected and the coupling of the four outputs. We could set a default with reasonable behavior, with more aggressive removal appropriate to cases where at least ~half of the array is available as a reference for the correction, and less aggressive removals appropriate to cases where most of the array is unavailable for a correction. I envision this applying to both subarray and full frame corrections.

@melanieclarke
Copy link
Collaborator

melanieclarke commented Sep 11, 2024

That sounds good to me @t-brandt - it would be nice to be able to tune that without asking the user to modify frequency cutoffs themselves!

I think we don't yet have a ticket for the aggressiveness parameter - can you please file one, and we'll continue the conversation there? Time is short on this build, but we should have plenty of time to work on it in the next one.

@melanieclarke melanieclarke merged commit 7b6b1a0 into spacetelescope:master Sep 11, 2024
28 of 29 checks passed
@t-brandt
Copy link
Contributor Author

Sounds perfect. The ticket would modify the full frame NSClean behavior and create an aggressiveness parameter. I will make one.

@hayescr
Copy link
Contributor

hayescr commented Sep 11, 2024

Thanks @melanieclarke and @t-brandt, agreed that we might want slightly different cutoff frequencies for different cases, perhaps other subarrays when the spectral regions are masked as well. I had a look at some of Tim's tests of setting this aggressiveness and there were some cases where different cases looked better with different values, so this would definitely be worth investigating. Feel free to tag me on that once you open it Tim, I'm happy to continue the conversation more there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation nsclean NIRSpec NSClean algorithm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Modify NSClean behaviour for subarray data
3 participants