-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update CoCiP-grid data access patterns to reduce duplicate chunk downloads #171
Conversation
@thabbott , this is solid 🦾 I'm of the opinion that it is worth adding these changes and cutting a release. The additional RAM is a bit surprising, and it isn't a clear cut win for the case where we localize the zarr store. But, given that our nominal case is using a remote zarr store, this seems like a good improvement. Furthermore, if we are operating in a network-contrained environment, the ~50% decrease in data over the wire could lead to an even more substantive gain that what you've profiled here. Unless there are other concerns, I think full steam ahead 🚀Once there is a new release of pycontrails, I'll pull it into the api-preprocessor (along with other WIP), and get some numbers based on it running in k8s vs. a VM. |
75b9af6
to
825eb39
Compare
A few other notes after spending some time cleaning up this PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't speak to the implementation adjustments following the package version updates. But the implementation logic vis a vis what we discussed looks appropriate, and the summary numbers in the table position this change as a valuable enhancement.
noting:
I haven't been following the recent releases for pycontrails, but it appears that the recent resample_and_fill updates will also go out with this release. I'll make adjustments, if warranted, to the flights-pipeline/spire-ingest-resample-worker, as that was the instigator for those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good, thank you for fixing the scipy problem.
If you want me to check into the nc file fixtures for the tests, i'm happy to go back through that
Changes
Updates data access patterns in
CocipGrid
to avoid discarding and reloading met data after it has already been loaded into memory.Benchmarks for the API preprocessor (https://github.com/contrailcirrus/api-preprocessor/tree/triage/coarse-fine-zarr-chunking):
main
feature/cocip-grid-access-patterns
Peak RAM: 3369 MiB
Ingress: 70.4 GiB
Model-level variables: 2.15 downloads/chunk
Single-level variables: 2.60 downloads/chunk
Peak RAM: 4037 MiB
Ingress: 38.3 GiB
Model-level variables: 1.17 downloads/chunk
Single-level variables: 1.80 downloads/chunk
no dask
Peak RAM: 25298 MiB
Ingress: 58.8 GiB
Model-level variables: 1.55 downloads/chunk
Single-level variables: 1.97 downloads/chunk
Peak RAM: 28050 MiB
Ingress: 37.9 GiB
Model-level variables: 1.16 downloads/chunk
Single-level variables: 1.97 downloads/chunk
Peak RAM: 3726 MiB
Peak RAM: 4530 MiB
no dask
Peak RAM: 22952 MiB
Peak RAM: 22966 MiB
Numbers are all running on a
c3d-highmem-4
VM (4 vCPUs, 32 GB RAM). Downloads/chunks are averages over all accessed chunks.When using dask, the updated data access pattern cuts ingress and wall time by something like a factor of two, but comes with some memory overhead (a bit less than 1 GiB).
I've included benchmarks without dask (disabled by calling
xr.open_zarr
withchunks=None
) to check that I'm not causing any major performance regressions for that case, but disabling dask increases memory consumption significantly, so I don't think those benchmarks are particularly relevant for the API preprocessor. (I will admit that I don't understand why dask-less benchmarks are slower with a local zarr store than a remote zarr store.... Not sure that's worth digging into, though.)@nickmasson, @mlshapiro, open to feedback on whether this seems like a useful improvement. If the consensus is yes, I'll update unit tests (failing since the latest xarray release) and add a test or two to check for expected behavior in the updated
_maybe_downselect_met_rad
.Internals
_maybe_downselect_met_rad
method ofCocipGrid
to reuse in-memory data when new forecast steps are selected.Tests
make test
)Reviewer