-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace ModelContainer with ModelLibrary #1241
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1241 +/- ##
==========================================
- Coverage 79.28% 76.89% -2.39%
==========================================
Files 117 114 -3
Lines 8065 7773 -292
==========================================
- Hits 6394 5977 -417
- Misses 1671 1796 +125
☔ View full report in Codecov by Sentry. |
391eb84
to
357a1c3
Compare
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as outdated.
This comment was marked as outdated.
3dfd262
to
c9f6274
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this "review" is just me asking clarification questions about how this works; thanks for humoring that. Not sure I have the expertise yet to add much more than that.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
On 5/31/24 9:37 AM, Brett Graham wrote:
This also seems to cause many issues with the unit tests,
test_outlier_detection.py, test_skymatch and, test_tweakreg Do we
have an idea of how much work is required to correct these?
Thanks for giving this a look. Did you pip install the package before
testing? I ask because the changes in this PR require a minor
modification to stpipe (see the modified pyproject.toml which points
to the stpipe fork). Without those changes many tests will fail as
stpipe will attempt to treat the |ModelLibrary| as a |ModelContainer|.
No, I just checked out your branch. I'll give that a try for the next
tests.
The items in the library are also read only?
|ml.asn['products'][0]['members'][1]['exptype'] = 'background'
Traceback (most recent call last): File "<stdin>", line 1, in
<module> TypeError: 'mappingproxy' object does not support item
assignment |
Can this be relaxed? For instance it might be convenient to add
catalogs to the association for SDF processing.
The models are modifiable but the association metadata is read only. I
made this read only because modifying the association after models
have been loaded (or after the library is created when |asn_exptypes|
is used) leads to mismatches between the models and the association.
Using the example, let's assume member 1 has a exptype of science. If
the model is loaded from the library (or from the existing container),
it will contain the 'science' exptype in it's metadata (at
|model.meta.asn.exptype|). If now the exptype is changed in the
association there is no way to update the corresponding model metadata
and the asn will report this as the updated value but the model
metadata will still report 'science'. Making the asn read only helps
to prevent this (although it's still possible to modify the
Here I am thinking ahead that we'll need difference images (color maps)
and may need to designate a science exposure as a background to be
subtracted from the science image.
We don't have any algorithms yet but just trying to be flexible.
model metadata and have it not match the asn).
Would you expand on the use case for adding catalogs to the
association? Would it work to add these to the association before the
library is created? I don't think I've yet added creating a library
from an in memory association but that should be straightforward and
might help here.
I think for the production products we can create the catalog names that
are associated with an image file, thus spending hours discussing file
names.
For custom catalogs,
strun romancal.step.TweakRegStep --catfile my_favorite_cat.asdf
it would be good to be able to add / replace the catalog file in the asn.
We can probably ignore this for now and focus on the production products.
…
—
Reply to this email directly, view it on GitHub
<https://urldefense.com/v3/__https://github.com/spacetelescope/romancal/pull/1241*issuecomment-2142183132__;Iw!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!xwKb-nXLse0EPytCkC2ZgQiLbGPkgbLBW9_tdALSU8Tzu9UMF7zu5_0j0L2Q7GPU-bjneOC5yX5-a0Ypsg0kJ541$>,
or unsubscribe
<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ALXCXWNQANUVCOIYZGC4HH3ZFB4IXAVCNFSM6AAAAABH4FWT36VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBSGE4DGMJTGI__;!!CrWY41Z8OgsX0i-WU-0LuAcUu2o!xwKb-nXLse0EPytCkC2ZgQiLbGPkgbLBW9_tdALSU8Tzu9UMF7zu5_0j0L2Q7GPU-bjneOC5yX5-a0YpsrN5-LsI$>.
You are receiving this because you commented.Message ID:
***@***.***>
|
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lot of work here! I had a few comments. I still need to look at the main library code.
romancal/resample/resample.py
Outdated
# members | ||
# if self.input_models.filepaths is None | ||
# else self.input_models.filepaths | ||
# ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my fault. There was a request to store the actual file names of the L2 input files (as opposed to their meta.filenames, which may become disconnected). filepaths got populated with the file names from the association file in the model container here:
romancal/romancal/datamodels/container.py
Line 184 in e559890
self.filepaths = [op.basename(m) for m in self._models] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for commenting on this. I don't yet fully understand how these are used or carried through the pipelines. Some discussion on expectations for this would be helpful. For example, when running the mosaic
pipeline is the expectation that these will be the original input filenames? If so, I think this code in skymatch will lose the filepaths:
romancal/romancal/skymatch/skymatch_step.py
Line 100 in e559890
return ModelContainer([x.meta["image_model"] for x in images]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schlafly I believe this is now handled in resample (instead of the container):
https://github.com/spacetelescope/romancal/pull/1241/files#diff-f87241f9a890ac31d6813379a09b6eb7e1052e3ab53039449a8f0df91aa030e1R351
but it would be great if this could be confirmed (I don't see any test failures but I'm not sure if there is a test for this feature).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, sorry for losing this thread! The issue was that meta.filename can easily become disconnected from the actual file name on disk, and there was a request for storing the actual file names on disk used to create a L3 file, rather than the meta.filenames that contributed. My initial impression here
https://github.com/spacetelescope/romancal/pull/1241/files#diff-f87241f9a890ac31d6813379a09b6eb7e1052e3ab53039449a8f0df91aa030e1R351
is that this returns to the previous behavior of using meta.filename. If there's a convenient way to ask the container for the original on-disk filenames, that would be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Is there a test for this (so I can poke it to see what's going on with and without this PR)?
To make sure I'm understanding the goal, let's say an association is provided to the mosaic pipeline that lists members "foo.asdf" and "bar.asdf" which have meta.filename attributes that are something else "chutes.asdf" and "ladders.asdf". The goal is to have the output resampled product contain "foo.asdf" and "bar.asdf". I think that should be easy to do by using the members info from Library.asn
but I'll have to run some tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schlafly I switched to populating this using the asn members (which are the input filenames) in:
449a58d
I didn't see any test differences so I'm assuming there is no test for this behavior (and the mosaic pipeline input files all have meta.filenames that match the real filenames).
I modified the test data for the mos pipeline test (locally) renaming one file in the association and on disk (but leaving it's meta.filename the same) and then run the pipeline. The result from resample recorded the modified name (the real filename) and not meta.filename.
I'd be happy to open an issue to track adding a test for this or adding one to this PR (I think it should be possible to add as a resample only unit test but that would require some investigation). Which would you prefer, an issue to track adding the test or adding it to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks, that looks great. Just making sure I hadn't missed it before, that's a new feature in ModelLibrary that wasn't available to me when I wrote that bit of code before? Great. It looks like when the ModelLibrary is constructed from a list of models, these expnames default to meta.filename, but that seems like it isn't a regression since for some set of in-memory models I don't know what to do otherwise?
I tried to trace the code briefly and was a bit confused about what the resample step is now allowed to support as input. The docstring is presumably out of date:
romancal/romancal/resample/resample.py
Lines 61 to 62 in 449a58d
input_models : list of objects | |
list of data models, one for each input image |
This bit:
romancal/romancal/resample/resample.py
Lines 80 to 84 in 449a58d
if (input_models is None) or (len(input_models) == 0): | |
raise ValueError( | |
"No input has been provided. Input should be a list of datamodel(s) or " | |
"a ModelLibrary." | |
) |
says either a list of models or a ModelLibrary.
This bit
romancal/romancal/resample/resample.py
Lines 134 to 135 in 449a58d
for i, m in enumerate(models): | |
self.input_models.shelve(m, i, modify=False) |
suggests to me that it has to be a ModelLibrary (which presumably then always has the asn member).
I don't have strong preferences for an issue vs. an addition to this PR for a new test. It looks like this PR is pretty close to being ready to merge, so I'm happy to delay a new test to an issue. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Just making sure I hadn't missed it before, that's a new feature in ModelLibrary that wasn't available to me when I wrote that bit of code before?
Exactly. I think the members entries previously got lost in skymatch due to:
romancal/romancal/skymatch/skymatch_step.py
Line 100 in 7476678
return ModelContainer([x.meta["image_model"] for x in images]) |
It looks like when the ModelLibrary is constructed from a list of models, these expnames default to meta.filename, but that seems like it isn't a regression since for some set of in-memory models I don't know what to do otherwise?
Yup! Although we certainly could change that if something else seems more useful. The general idea is to make the ModelLibrary
always have a "valid" association (even if it's made up based off the models) so the step code has a consistent interface (with ModelContainer
I think you only get something in asn_table
if the input is an association).
Re: resample step
Thanks for the catch. Yes the ResampleData
class now only accepts a ModelLibrary
. The ResampleStep
should accept the same input as before (except for a ModelLibrary
instead of a ModelContainer
).
I pushed a change to the docstring in: 6184692
and the error message in: 899c9aa
I still have to rebase this to deal with the horde of conflicts following #1314 so the CI is unhappy temporarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the linked commits addressed the documentation issues (thanks for finding those!) and I was able to "rebase". Rerunning the regtests shows the same results as linked in the above description (the run link is up-to-date).
romancal/tweakreg/tweakreg_step.py
Outdated
# self.log.info("* Images in GROUP 1:") | ||
# for im in grp_img[0]: | ||
# self.log.info(f" {im.meta.filename}") | ||
# self.log.info("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume that this and related logging issues are a temporary expedient rather than a proposed change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a partially proposed change at the moment. It does highlight one pattern that has some negative consequences when the ModelLibrary
is on_disk
. The logging commented out here (if the library is on_disk
) will result in every model being read from disk to print out the meta.filename
of each model. This is shortly followed by every model then being re-opened to set the cal_step
status. I think a more complete change would be to combine these loops or remove the logging if printing out the filenames isn't that important (I don't think most steps do this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I left a few comments/suggestions.
This comment was marked as outdated.
This comment was marked as outdated.
Great. As far as you're concerned, this is good to merge? @ddavis-stsci , anything else that you would like to see? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks for all of this!
This PR replaces
ModelContainer
with aModelLibrary
subclass.Please see stpipe for the core documentation on
ModelLibrary
.The main benefit is that
ModelLibrary
(in contrast toModelContainer
) will allow passing data between steps without holding it in memory (and instead keeping datamodels "on disk").This will allow for future improvements to steps (especially those part of a pipeline) to allow them to run in a generically memory efficient way (the interface to an "in memory" and an "on disk"
ModelLibrary
are identical). Importantly this PR does not aim to make improvements to the steps (to limit the already wide scope of the changes). Follow-up PRs will be needed to address issues with some steps where they are coded in a way that keeps all models in memory (defeating the memory saving possible withModelLibrary
). To provide one example, the tweakreg step contains a call to create_astrometric_catalog which currently expects a list of models. In this PR the models are read from the library, which loads all models in memory. It is possible to change the code avoid reading in all of the models. However this would require changes to several utility functions (some in stcal) which are technically not required to add theModelLibrary
but are required to see performance benefits from using the library.Regression tests:
https://github.com/spacetelescope/RegressionTests/actions/runs/9962595365
shows 5 minor metadata differences.
exptype
andgroup_id
are now in the output fileasn.pool_name
is nownone
instead ofdummy value
andasn.table_name
is nowL3_regtest_asn.json
instead ofdummy value
asn
andexptype
are now in the output fileasn
exptype
andgroup_id
are now in the output fileasn
andexptype
are now in the output fileFixes #1303
Todo:
__init__
from in memory associationmap_function
method)__setitem__
__getitem__
discard
__setitem__
changes)ModelLibrary
docsResolves RCAL-nnnn
Closes #
This PR addresses ...
Checklist
CHANGES.rst
under the corresponding subsection