Replace ModelContainer with ModelLibrary #1241

braingram · 2024-05-17T13:45:02Z

This PR replaces ModelContainer with a ModelLibrary subclass.

Please see stpipe for the core documentation on ModelLibrary.

The main benefit is that ModelLibrary (in contrast to ModelContainer) will allow passing data between steps without holding it in memory (and instead keeping datamodels "on disk").

ModelLibrary will also allow future improvements to steps (especially those that are part of a pipeline) so that they can run in a generically memory-efficient way (the interfaces to an "in memory" and an "on disk" ModelLibrary are identical). Importantly, this PR does not aim to improve the steps themselves, to limit the already wide scope of the changes. Follow-up PRs will be needed for steps that are coded in a way that keeps all models in memory (defeating the memory savings possible with ModelLibrary).

To provide one example, the tweakreg step contains a call to create_astrometric_catalog, which currently expects a list of models. In this PR the models are read from the library, which loads all models into memory. It is possible to change the code to avoid reading in all of the models. However, this would require changes to several utility functions (some in stcal) which are not technically required to add the ModelLibrary but are required to see performance benefits from using it.
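A rough sketch of the difference between step code that materializes every model (as the tweakreg path in this PR does before calling create_astrometric_catalog) and code that visits models one at a time. `CountingLoader`, its `load`/`release` methods and the per-model `bounds` field are hypothetical stand-ins, not romancal or stcal APIs; the counter only records how many models are resident at once.

```python
class CountingLoader:
    """Hypothetical loader that tracks how many models are resident at once."""

    def __init__(self, models):
        self._models = models
        self.resident = 0
        self.peak = 0

    def load(self, index):
        self.resident += 1
        self.peak = max(self.peak, self.resident)
        return dict(self._models[index])

    def release(self, model):
        # In a real on-disk library this is where the model could be freed.
        self.resident -= 1


def footprint_from_list(loader, n):
    # Pattern that defeats on-disk storage: every model is loaded up front.
    models = [loader.load(i) for i in range(n)]
    lo = min(m["bounds"][0] for m in models)
    hi = max(m["bounds"][1] for m in models)
    for m in models:
        loader.release(m)
    return lo, hi


def footprint_streaming(loader, n):
    # Same reduction, but only one model is resident at any time.
    lo, hi = float("inf"), float("-inf")
    for i in range(n):
        m = loader.load(i)
        lo = min(lo, m["bounds"][0])
        hi = max(hi, m["bounds"][1])
        loader.release(m)
    return lo, hi
```

Both functions compute the same combined range; only the streaming version keeps a single model in memory, which is the property step code needs for an on-disk library to pay off.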

Regression tests:
https://github.com/spacetelescope/RegressionTests/actions/runs/9962595365
shows 5 minor metadata differences.

test_resample_single_file where exptype and group_id are now in the output file
test_level3_mos_pipeline where asn.pool_name is now None instead of a dummy value and asn.table_name is now L3_regtest_asn.json instead of a dummy value
test_tweakreg where asn and exptype are now in the output file
test_level2_grism_processing_pipeline where asn, exptype, and group_id are now in the output file
test_level2_image_processing_pipeline where asn and exptype are now in the output file

Fixes #1303

Todo:

Resolves RCAL-nnnn

Closes #

This PR addresses ...

Checklist

added entry in CHANGES.rst under the corresponding subsection
updated relevant tests
updated relevant documentation
updated relevant milestone(s)
added relevant label(s)
ran regression tests, post a link to the Jenkins job below. How to run regression tests on a PR

codecov · 2024-05-17T14:16:23Z

Codecov Report

Attention: Patch coverage is 87.41419% with 55 lines in your changes missing coverage. Please review.

Project coverage is 76.89%. Comparing base (7476678) to head (b5349d9).
Report is 249 commits behind head on main.

Files with missing lines	Patch %	Lines
romancal/tweakreg/tweakreg_step.py	79.64%	23 Missing ⚠️
romancal/resample/resample.py	92.45%	8 Missing ⚠️
romancal/pipeline/exposure_pipeline.py	14.28%	6 Missing ⚠️
romancal/resample/resample_step.py	72.72%	6 Missing ⚠️
romancal/pipeline/mosaic_pipeline.py	16.66%	5 Missing ⚠️
romancal/datamodels/library.py	93.87%	3 Missing ⚠️
romancal/flux/flux_step.py	85.71%	2 Missing ⚠️
romancal/skymatch/skymatch_step.py	90.47%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1241      +/-   ##
==========================================
- Coverage   79.28%   76.89%   -2.39%     
==========================================
  Files         117      114       -3     
  Lines        8065     7773     -292     
==========================================
- Hits         6394     5977     -417     
- Misses       1671     1796     +125

Flag	Coverage Δ
nightly	`?`


emolter

Most of this "review" is just me asking clarification questions about how this works; thanks for humoring that. Not sure I have the expertise yet to add much more than that.

romancal/datamodels/library.py

romancal/datamodels/tests/test_datamodels.py

ddavis-stsci · 2024-05-31T17:31:33Z

On 5/31/24 9:37 AM, Brett Graham wrote:
>> This also seems to cause many issues with the unit tests (test_outlier_detection.py, test_skymatch, and test_tweakreg). Do we have an idea of how much work is required to correct these?
>
> Thanks for giving this a look. Did you pip install the package before testing? I ask because the changes in this PR require a minor modification to stpipe (see the modified pyproject.toml, which points to the stpipe fork). Without those changes many tests will fail, as stpipe will attempt to treat the `ModelLibrary` as a `ModelContainer`.

No, I just checked out your branch. I'll give that a try for the next tests.

>> The items in the library are also read only?
>> ml.asn['products'][0]['members'][1]['exptype'] = 'background'
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: 'mappingproxy' object does not support item assignment
>> Can this be relaxed? For instance it might be convenient to add catalogs to the association for SDF processing.
>
> The models are modifiable but the association metadata is read only. I made this read only because modifying the association after models have been loaded (or after the library is created when `asn_exptypes` is used) leads to mismatches between the models and the association. Using the example, let's assume member 1 has an exptype of science. If the model is loaded from the library (or from the existing container), it will contain the 'science' exptype in its metadata (at `model.meta.asn.exptype`). If the exptype is then changed in the association, there is no way to update the corresponding model metadata: the asn will report the updated value but the model metadata will still report 'science'. Making the asn read only helps to prevent this (although it's still possible to modify the model metadata and have it not match the asn).

Here I am thinking ahead that we'll need difference images (color maps) and may need to designate a science exposure as a background to be subtracted from the science image. We don't have any algorithms yet but just trying to be flexible.

> Would you expand on the use case for adding catalogs to the association? Would it work to add these to the association before the library is created? I don't think I've yet added creating a library from an in memory association but that should be straightforward and might help here.

I think for the production products we can create the catalog names that are associated with an image file, thus spending hours discussing file names. For custom catalogs (`strun romancal.step.TweakRegStep --catfile my_favorite_cat.asdf`) it would be good to be able to add or replace the catalog file in the asn. We can probably ignore this for now and focus on the production products.

…

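The `TypeError` in the traceback quoted in this email is what Python's `types.MappingProxyType` raises on item assignment. A minimal sketch of exposing a nested association read-only, assuming nested dicts are wrapped recursively and lists become tuples (this thread does not say exactly how the library does it), together with the suggested workaround of editing a plain copy before the read-only view is built. `freeze` is a hypothetical helper.

```python
import copy
from types import MappingProxyType


def freeze(obj):
    """Hypothetical helper: read-only view of nested dicts and lists."""
    if isinstance(obj, dict):
        return MappingProxyType({k: freeze(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return tuple(freeze(v) for v in obj)
    return obj


asn = {
    "products": [
        {
            "name": "example",
            "members": [
                {"expname": "a.asdf", "exptype": "science"},
                {"expname": "b.asdf", "exptype": "science"},
            ],
        }
    ]
}

# Edit a plain copy *before* building the read-only view, mirroring the
# suggestion to change the association before the library is created.
editable = copy.deepcopy(asn)
editable["products"][0]["members"][1]["exptype"] = "background"
frozen = freeze(editable)

try:
    frozen["products"][0]["members"][1]["exptype"] = "science"
except TypeError as err:
    print(err)  # 'mappingproxy' object does not support item assignment
```

Because models read their asn metadata when loaded, freezing after creation keeps the association and the loaded models from drifting apart.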

schlafly

Lot of work here! I had a few comments. I still need to look at the main library code.

romancal/outlier_detection/outlier_detection_step.py

romancal/pipeline/exposure_pipeline.py

romancal/regtest/test_tweakreg.py

schlafly · 2024-06-03T20:33:00Z

romancal/resample/resample.py

+        #     members
+        #     if self.input_models.filepaths is None
+        #     else self.input_models.filepaths
+        # )


This is my fault. There was a request to store the actual file names of the L2 input files (as opposed to their meta.filenames, which may become disconnected). filepaths got populated with the file names from the association file in the model container here:

romancal/romancal/datamodels/container.py

Line 184 in e559890

self.filepaths = [op.basename(m) for m in self._models]

Thanks for commenting on this. I don't yet fully understand how these are used or carried through the pipelines. Some discussion on expectations for this would be helpful. For example, when running the mosaic pipeline is the expectation that these will be the original input filenames? If so, I think this code in skymatch will lose the filepaths:

romancal/romancal/skymatch/skymatch_step.py

Line 100 in e559890

return ModelContainer([x.meta["image_model"] for x in images])

@schlafly I believe this is now handled in resample (instead of the container):
https://github.com/spacetelescope/romancal/pull/1241/files#diff-f87241f9a890ac31d6813379a09b6eb7e1052e3ab53039449a8f0df91aa030e1R351
but it would be great if this could be confirmed (I don't see any test failures but I'm not sure if there is a test for this feature).

Thanks, sorry for losing this thread! The issue was that meta.filename can easily become disconnected from the actual file name on disk, and there was a request for storing the actual file names on disk used to create an L3 file, rather than the meta.filenames that contributed. My initial impression here
https://github.com/spacetelescope/romancal/pull/1241/files#diff-f87241f9a890ac31d6813379a09b6eb7e1052e3ab53039449a8f0df91aa030e1R351
is that this returns to the previous behavior of using meta.filename. If there's a convenient way to ask the container for the original on-disk filenames, that would be best.

Thanks. Is there a test for this (so I can poke it to see what's going on with and without this PR)?

To make sure I'm understanding the goal: let's say an association is provided to the mosaic pipeline that lists members "foo.asdf" and "bar.asdf", which have meta.filename attributes that are something else ("chutes.asdf" and "ladders.asdf"). The goal is to have the output resampled product contain "foo.asdf" and "bar.asdf". I think that should be easy to do by using the members info from Library.asn but I'll have to run some tests.

@schlafly I switched to populating this using the asn members (which are the input filenames) in:
449a58d
I didn't see any test differences so I'm assuming there is no test for this behavior (and the mosaic pipeline input files all have meta.filenames that match the real filenames).
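The difference between recording `meta.filename` and recording the association member names can be sketched with plain data, using the "foo"/"chutes" scenario above. The `SimpleNamespace` models and the `expname` key follow this thread; the helper name `input_filenames` and the fallback rule are hypothetical and are not the code in 449a58d.

```python
import os
from types import SimpleNamespace


def input_filenames(asn, models):
    """Hypothetical helper: prefer the on-disk names listed as association
    members, falling back to meta.filename when a member has no expname."""
    members = asn["products"][0]["members"]
    names = []
    for member, model in zip(members, models):
        expname = member.get("expname")
        # basename mirrors the op.basename call the old ModelContainer used
        names.append(os.path.basename(expname) if expname else model.meta.filename)
    return names


def make_model(filename):
    # Stand-in for a datamodel: only meta.filename is modeled.
    return SimpleNamespace(meta=SimpleNamespace(filename=filename))


# Files on disk are foo/bar, but the models' meta.filename values say chutes/ladders.
asn = {"products": [{"members": [{"expname": "data/foo.asdf"}, {"expname": "data/bar.asdf"}]}]}
models = [make_model("chutes.asdf"), make_model("ladders.asdf")]
print(input_filenames(asn, models))  # ['foo.asdf', 'bar.asdf']
```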

I modified the test data for the mos pipeline test (locally), renaming one file in the association and on disk (but leaving its meta.filename the same), and then ran the pipeline. The result from resample recorded the modified name (the real filename) and not meta.filename.

I'd be happy to open an issue to track adding a test for this or adding one to this PR (I think it should be possible to add as a resample only unit test but that would require some investigation). Which would you prefer, an issue to track adding the test or adding it to this PR?

Perfect, thanks, that looks great. Just making sure I hadn't missed it before, that's a new feature in ModelLibrary that wasn't available to me when I wrote that bit of code before? Great. It looks like when the ModelLibrary is constructed from a list of models, these expnames default to meta.filename, but that seems like it isn't a regression since for some set of in-memory models I don't know what to do otherwise?

I tried to trace the code briefly and was a bit confused about what the resample step is now allowed to support as input. The docstring is presumably out of date:

romancal/romancal/resample/resample.py

Lines 61 to 62 in 449a58d

input_models : list of objects
    list of data models, one for each input image

This bit:

romancal/romancal/resample/resample.py

Lines 80 to 84 in 449a58d

if (input_models is None) or (len(input_models) == 0):
    raise ValueError(
        "No input has been provided. Input should be a list of datamodel(s) or "
        "a ModelLibrary."
    )

says either a list of models or a ModelLibrary.
This bit:

romancal/romancal/resample/resample.py

Lines 134 to 135 in 449a58d

for i, m in enumerate(models):
    self.input_models.shelve(m, i, modify=False)

suggests to me that it has to be a ModelLibrary (which presumably then always has the asn member).
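The loop above hands each model back with `modify=False`. A toy sketch of that borrow/shelve bookkeeping, modeled loosely on the calls visible in the snippet rather than on stpipe's implementation. The assumptions: a model can be borrowed only once at a time, leaving the `with` block with models still borrowed is an error, and `modify=False` skips the write-back an on-disk library would otherwise perform. `ToyLibrary`, `BorrowError` and the `writes` counter are hypothetical.

```python
class BorrowError(Exception):
    """Raised when borrow/shelve calls are unbalanced (toy version)."""


class ToyLibrary:
    """Hypothetical library illustrating borrow/shelve bookkeeping."""

    def __init__(self, models):
        self._stored = [dict(m) for m in models]
        self._borrowed = {}
        self.writes = 0  # how many times a model was written back

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        leftover = sorted(self._borrowed)
        self._borrowed.clear()
        if leftover and exc_type is None:
            raise BorrowError(f"models not shelved on exit: {leftover}")
        return False

    def __len__(self):
        return len(self._stored)

    def borrow(self, index):
        if index in self._borrowed:
            raise BorrowError(f"model {index} is already borrowed")
        model = dict(self._stored[index])  # a fresh copy, as if read from disk
        self._borrowed[index] = model
        return model

    def shelve(self, model, index, modify=True):
        if self._borrowed.get(index) is not model:
            raise BorrowError(f"model {index} was not borrowed")
        del self._borrowed[index]
        if modify:
            self._stored[index] = dict(model)  # write back only when modified
            self.writes += 1
```

In this sketch a read-only pass like the one in ResampleData costs no writes, while a pass that changes models pays one write per shelved model.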

I don't have strong preferences for an issue vs. an addition to this PR for a new test. It looks like this PR is pretty close to being ready to merge, so I'm happy to delay a new test to an issue. Thanks!

Thanks!

Just making sure I hadn't missed it before, that's a new feature in ModelLibrary that wasn't available to me when I wrote that bit of code before?

Exactly. I think the members entries previously got lost in skymatch due to:

romancal/romancal/skymatch/skymatch_step.py

Line 100 in 7476678

return ModelContainer([x.meta["image_model"] for x in images])

It looks like when the ModelLibrary is constructed from a list of models, these expnames default to meta.filename, but that seems like it isn't a regression since for some set of in-memory models I don't know what to do otherwise?

Yup! Although we certainly could change that if something else seems more useful. The general idea is to make the ModelLibrary always have a "valid" association (even if it's made up based off the models) so the step code has a consistent interface (with ModelContainer I think you only get something in asn_table if the input is an association).
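A minimal sketch of "always have a valid association" for in-memory input. The simplified layout (one product whose members carry `expname` and `exptype`) follows the keys mentioned in this thread; the generated fallback name and the `"science"` default exptype are illustrative assumptions, and `asn_from_models`/`member_names` are hypothetical helpers. Only the expname-from-meta.filename behavior is stated above.

```python
from types import SimpleNamespace


def asn_from_models(models, product_name="in_memory"):
    """Hypothetical sketch: describe in-memory models with a minimal association.

    Each member's expname defaults to the model's meta.filename; the
    generated fallback name and the "science" exptype are assumptions."""
    members = []
    for i, model in enumerate(models):
        expname = getattr(model.meta, "filename", None) or f"model{i}.asdf"
        members.append({"expname": expname, "exptype": "science"})
    return {"products": [{"name": product_name, "members": members}]}


def member_names(asn):
    # Step code reads members the same way whether the association came
    # from a file or was synthesized from models.
    return [m["expname"] for m in asn["products"][0]["members"]]
```

With this shape, step code never has to branch on whether `asn` is populated, which is the inconsistency noted for `asn_table` in ModelContainer.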

Re: resample step

Thanks for the catch. Yes the ResampleData class now only accepts a ModelLibrary. The ResampleStep should accept the same input as before (except for a ModelLibrary instead of a ModelContainer).
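The split described here, a permissive step in front of a strict ResampleData, can be sketched as normalization at the step boundary. `Library`, `as_library` and `ResampleDataSketch` are hypothetical stand-ins: the real step builds a ModelLibrary and also accepts association files, which this sketch omits, and the error text is illustrative rather than the message from 899c9aa.

```python
class Library:
    """Hypothetical stand-in for a ModelLibrary."""

    def __init__(self, models):
        self.models = list(models)

    def __len__(self):
        return len(self.models)


def as_library(init):
    # Step boundary: accept a library, a list of models, or a single model.
    if isinstance(init, Library):
        return init
    if isinstance(init, (list, tuple)):
        return Library(init)
    return Library([init])


class ResampleDataSketch:
    # Inner class: accept only a library, and reject an empty one.
    def __init__(self, input_models):
        if not isinstance(input_models, Library):
            raise TypeError("input_models must be a Library")
        if len(input_models) == 0:
            raise ValueError("No input has been provided. Input should be a Library.")
        self.input_models = input_models
```

Keeping the conversion in one place means the inner class can rely on library-only methods such as borrow and shelve.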
I pushed a change to the docstring in: 6184692
and the error message in: 899c9aa

I still have to rebase this to deal with the horde of conflicts following #1314 so the CI is unhappy temporarily.

I believe the linked commits addressed the documentation issues (thanks for finding those!) and I was able to "rebase". Rerunning the regtests shows the same results as linked in the above description (the run link is up-to-date).

romancal/resample/resample.py

romancal/tweakreg/tweakreg_step.py

schlafly · 2024-06-03T20:40:18Z

romancal/tweakreg/tweakreg_step.py

+            # self.log.info("* Images in GROUP 1:")
+            # for im in grp_img[0]:
+            #     self.log.info(f"     {im.meta.filename}")
+            # self.log.info("")


I presume that this and related logging issues are a temporary expedient rather than a proposed change?

This is a partially proposed change at the moment. It does highlight one pattern that has negative consequences when the ModelLibrary is on_disk. The logging commented out here would, if the library is on_disk, cause every model to be read from disk just to print its meta.filename. This is shortly followed by every model being re-opened to set the cal_step status. I think a more complete change would be to combine these loops, or to remove the logging if printing out the filenames isn't that important (I don't think most steps do this).
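Combining the two loops, as suggested here, halves the number of model reads. A sketch with a store that counts reads; `CountingStore`, `log_then_mark`, `log_and_mark` and the dict-based `filename`/`cal_step` fields are hypothetical simplifications of the real models and of `meta.cal_step`.

```python
import logging

log = logging.getLogger("tweakreg_sketch")


class CountingStore:
    """Hypothetical on-disk stand-in that counts how often models are read."""

    def __init__(self, models):
        self._models = [dict(m) for m in models]
        self.reads = 0

    def __len__(self):
        return len(self._models)

    def load(self, index):
        self.reads += 1
        return dict(self._models[index])

    def save(self, index, model):
        self._models[index] = dict(model)


def log_then_mark(store):
    # Two passes: every model is read once to log it, then again to update it.
    for i in range(len(store)):
        log.info("     %s", store.load(i)["filename"])
    for i in range(len(store)):
        model = store.load(i)
        model["cal_step"] = {"tweakreg": "COMPLETE"}
        store.save(i, model)


def log_and_mark(store):
    # One pass: log and update while the model is already in memory.
    for i in range(len(store)):
        model = store.load(i)
        log.info("     %s", model["filename"])
        model["cal_step"] = {"tweakreg": "COMPLETE"}
        store.save(i, model)
```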

romancal/datamodels/library.py

mairanteodoro

Looks good to me. I left a few comments/suggestions.

romancal/flux/flux_step.py

romancal/outlier_detection/outlier_detection.py

romancal/resample/resample_step.py

romancal/skymatch/skymatch_step.py

schlafly · 2024-07-19T13:01:05Z

Great. As far as you're concerned, this is good to merge? @ddavis-stsci, anything else that you would like to see?

schlafly

Looks good to me, thanks for all of this!

github-actions bot added the testing, pipeline, and dependencies labels May 17, 2024

braingram force-pushed the model_library branch 3 times, most recently from 391eb84 to 357a1c3, May 20, 2024 18:35


github-actions bot added the regression_testing label May 23, 2024

braingram force-pushed the model_library branch 2 times, most recently from 3dfd262 to c9f6274, May 23, 2024 19:14

github-actions bot added the documentation label May 23, 2024

emolter reviewed May 30, 2024


braingram force-pushed the model_library branch from 3d13486 to c1db85b, May 31, 2024 15:17


schlafly reviewed Jun 3, 2024


mairanteodoro reviewed Jun 6, 2024


romancal/flux/flux_step.py (resolved)

romancal/outlier_detection/outlier_detection.py (outdated, resolved)

romancal/resample/resample_step.py (outdated, resolved)

romancal/skymatch/skymatch_step.py (outdated, resolved)


braingram force-pushed the model_library branch from 13e3481 to d64e269, June 10, 2024 13:47

braingram added 21 commits July 18, 2024 17:14

fix outlier_detection tests

af15e74

fix flux tests

9c20805

fix tweakreg tests

0c467f8

rebase

64acdae

use library._save

9026f89

remove use of private _models

3b99769

return non-shelved model in tweakreg

e662456

fixing another borrow

a22bf23

add changelog

883d6a3

fix bug introduced in rebase

137eefe

clean up comments

16532dc

modify asn metadata setting in models

8221e7a

fix asn_pool->pool_name

acd1b3d

clean up comments

c9d162a

switch to stpipe main

42dba28

fix skipped test

00e2eaf

fix docstring

f7fb783

use filenames instead of meta.filename for resample members

38996e4

fix ResampleData docstring

d896b4c

fix error message

e66656c

redo spacetelescope#1314

b5349d9

braingram force-pushed the model_library branch from 899c9aa to b5349d9, July 18, 2024 21:22

This was referenced Jul 19, 2024

use monkeypatch in patch_match tests to avoid global state change #1319

Merged

uncomment and use get_asn #1320

Merged

schlafly approved these changes Jul 22, 2024


braingram merged commit c49de36 into spacetelescope:main Jul 22, 2024
29 of 30 checks passed

braingram deleted the model_library branch July 22, 2024 13:42

This was referenced Jul 22, 2024

delete unreachable code #1322

Merged

replace usages of copy_arrays with memmap spacetelescope/roman_datamodels#360

Merged
