Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LRU Caching for NetCDF lumped format timeslice chunks #421

Merged
merged 11 commits into from
Jul 28, 2022

Conversation

mattw-nws
Copy link
Contributor

@mattw-nws mattw-nws commented Jun 21, 2022

Caches slices of data in NetCDFPerFeatureDataProvider for performance improvement.

Additions

  • Test param files (not real params!) for running cat-52 and cat-67 in addition to cat-27 and a realization config for testing all three with NetCDF lumped forcing sample file.

Removals

Changes

  • Boost-supplied LRU cache implementation caches whole slices of NetCDF, all catchments for a single timestep for each variable gets a cache entry. This improves performance for forcing reads dramatically.
  • Slightly improved lumped NetCDF generation script allows more flexibility in CSV files (capitalization of Time field) and extracts catchment ID more inteligently.

Testing

  1. Validated existing NetCDFPerFeatureDataProvider tests.

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist (automated report can be put here)

Target Environment support

  • Linux

@mattw-nws mattw-nws force-pushed the netcdf-lumped-optimized branch from 4daa471 to 637cc16 Compare June 23, 2022 15:26
mattw-nws pushed a commit to mattw-nws/ngen that referenced this pull request Jul 12, 2022
mattw-nws pushed a commit to mattw-nws/ngen that referenced this pull request Jul 12, 2022
@mattw-nws mattw-nws force-pushed the netcdf-lumped-optimized branch 2 times, most recently from d71ffdb to 22c4974 Compare July 15, 2022 17:09
@mattw-nws mattw-nws requested a review from donaldwj July 15, 2022 19:00
@mattw-nws mattw-nws force-pushed the netcdf-lumped-optimized branch from 80f1654 to a5e594e Compare July 15, 2022 19:12
mattw-nws pushed a commit that referenced this pull request Jul 19, 2022
@mattw-nws mattw-nws force-pushed the netcdf-lumped-optimized branch from 5b9f13e to c7db0e7 Compare July 19, 2022 13:27
@mattw-nws mattw-nws force-pushed the netcdf-lumped-optimized branch from c7db0e7 to 2e9ed48 Compare July 19, 2022 14:36
@mattw-nws mattw-nws marked this pull request as ready for review July 19, 2022 14:50
@stcui007
Copy link
Contributor

Only some superficial reading of the codes so far. One note on commit "Tweaks to allow creation from either "standard" CSV format": one line 29, should the second parameter be 1 instead of 2, as noted somewhere in a C++ code?
chunksizes=(num_catchments,2)

@mattw-nws
Copy link
Contributor Author

@stcui007 Yes, correct--this is adjusted in 440d572.

@stcui007
Copy link
Contributor

One comment I have involves NetCDFPerFeatureDataProvider.hpp lines 401 - 424. In particular, I don't fully understand line 422. I assume i iterate through cache slices, while j varies within a slice? Then for raw_values[i+j] the indices i and j would be equivalent, i.e., (i=0, j=1) and (i=1, j=0) would be the same. Would this potentially mix up different slices? In the current set setup, since cache_slice_t_size = 1, this won't have any effect. Is this going to be generalized?

Are the comment lines between lines 67 - 69, and 74 - 75 still necessary?

One other question doesn't have anything to do with the current codes, but with the format of the netcdf files. The original format of the AORC netcdf forcing files' attributes contain the scale_factor and offset. If we deal with that in pre-processing, then we don't need to deal with that here. Otherwise, it needs to be considered.

@mattw-nws
Copy link
Contributor Author

One comment I have involves NetCDFPerFeatureDataProvider.hpp lines 401 - 424. In particular, I don't fully understand line 422. I assume i iterate through cache slices, while j varies within a slice? Then for raw_values[i+j] the indices i and j would be equivalent, i.e., (i=0, j=1) and (i=1, j=0) would be the same. Would this potentially mix up different slices? In the current set setup, since cache_slice_t_size = 1, this won't have any effect. Is this going to be generalized?

Yes, this is probably flawed right now for any case where cache_slice_t_size != 1. That should probably be something like [(i*cache_slice_t_size)+j] but that's not quite right either I don't think. More work will be required to correctly support time slices greater than 1... I think catchment slices greater than 1 will be more likely needed, if this implementation survives long enough.

@mattw-nws
Copy link
Contributor Author

Are the comment lines between lines 67 - 69, and 74 - 75 still necessary?

No, but they might be needed at larger scales. Left in for reference.

@mattw-nws
Copy link
Contributor Author

One other question doesn't have anything to do with the current codes, but with the format of the netcdf files. The original format of the AORC netcdf forcing files' attributes contain the scale_factor and offset. If we deal with that in pre-processing, then we don't need to deal with that here. Otherwise, it needs to be considered.

Good question... my assumption was that the library handles these attributes, but that may be incorrect. If we support those in these files, we would need to test that... however these files have float values so those metadata scale values are far less useful.

@stcui007
Copy link
Contributor

stcui007 commented Jul 21, 2022 via email

Copy link
Contributor

@stcui007 stcui007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment already provided in PR.

@mattw-nws
Copy link
Contributor Author

The original format of the AORC netcdf forcing files' attributes contain the scale_factor and offset...This question comes up because in my ESMPY based codes the output netcdf files copy the original attributes

Those attributes should only be copied if the values in the NetCDF need to be scaled and offset... do they? Are the values copied into the new NetCDF (floats) already scaled and offset? Or are they the same values from the original NetCDF before they are scaled and offset?

@donaldwj
Copy link
Contributor

donaldwj commented Jul 22, 2022 via email

@stcui007
Copy link
Contributor

stcui007 commented Jul 22, 2022 via email

@stcui007
Copy link
Contributor

stcui007 commented Jul 22, 2022 via email

@mattw-nws
Copy link
Contributor Author

@donaldwj @stcui007 Here's what's in the latest file example, via ncdump:

        float T2D(catchment-id, time) ;
                T2D:_FillValue = -99999.f ;
                T2D:missing_value = -32767.f ;
                T2D:long_name = "Temperature" ;
                T2D:short_name = "TMP_2maboveground" ;
                T2D:scale_factor = 0.1f ;
                T2D:units = "K" ;
                T2D:level = "2 m above ground" ;

and

 T2D =
  {2930.423, 2920.427, 2910.339, 2900.349, 2894.427, 2888.816, 2882.97, 
    2879.969, 2876.969, 2873.97},
  {2951.988, 2938.251, 2924.16, 2910.149, 2898.901, 2887.482, 2875.898, 
    2871.486, 2866.829, 2862.471},

First, is this valid, then? Will these floats be rescaled by the library when read? Second, I don't think that's what we want, if they're floats. There's no cost to storing the values without scaling if they're floats, I don't think--it's only a source of confusion.

@stcui007
Copy link
Contributor

stcui007 commented Jul 22, 2022 via email

@mattw-nws mattw-nws merged commit f7b361f into NOAA-OWP:master Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants