Fix HEMCO ERS tests #1221

Open: 5 commits to be merged into base: cam_development

lizziel
Collaborator

@lizziel lizziel commented Jan 9, 2025

This PR updates the HEMCO_CESM submodule to version 2.1.0 and adds new ERS tests for CAM-chem with HEMCO. The new version of HEMCO_CESM includes fixes that allow ERS tests of CAM-chem with HEMCO to pass if restarted on the hour. The new tests (10 steps) restart on the hour, which demonstrates that the failure of the current ERS tests (9 steps) is caused by restarting mid-hour.

Part of this fix is also to use a new HEMCO_Config.rc file which specifies hourly rather than monthly read of aviation emissions. This is necessary to force vertical regridding to model pressure levels at the highest frequency possible in HEMCO (hourly).

Additional improvements in this update are:

  1. Reduced and corrected HEMCO prints. HEMCO prints still go to the CESM log, but a forthcoming update will move them to the atm log, enabled by a new HEMCO version that is about to be released.
  2. Fleximod capability in HEMCO_CESM

See issue discussion at #856
See HEMCO_CESM updates related to the ERS test problem at ESCOMP/HEMCO_CESM#41

Updates include:
- New HEMCO configuration input file which reads 3D emissions hourly
- New ERS and ERP tests for FCSD_HCO which use 10 steps instead of 9
- Updated HEMCO_CESM submodule version (v2.1). HEMCO version remains the
  same (3.9.0).

Signed-off-by: Lizzie Lundgren <[email protected]>
@lizziel
Collaborator Author

lizziel commented Jan 9, 2025

This PR is based on cam6_4_038. Please let me know if you would like me to rebase on a new version. Once the version is settled on I will run the test suite again and create the changelog updates.

@lizziel lizziel changed the title from "Fixes HEMCO ERS tests" to "Fix HEMCO ERS tests" on Jan 10, 2025
@cacraigucar
Collaborator

@lizziel - is the plan to continue working on getting the restarts to work no matter what timestep the model attempts to restart on? In order to fully function in CESM, this requirement would need to be met.

@cacraigucar cacraigucar self-requested a review January 13, 2025 22:08
@cacraigucar cacraigucar self-assigned this Jan 13, 2025
@lizziel
Collaborator Author

lizziel commented Jan 15, 2025

@cacraigucar - Yes, I am working on a better fix than the one submitted, which is meant as a temporary solution to demonstrate we understand the issue. Since merge of this commit is a little ways off I may be able to come up with a better fix before then. This PR would still be needed since the HEMCO_CESM version update includes a necessary fix regardless, but if I come up with a more complete solution I could remove the new tests I added to the CAM test suite.

There are a few different ways to solve the ERS issue. The one I plan to pursue is expanding HEMCO read capability to allow configuration of reading/regridding an emission every time it is run, that is, every timestep. Right now you must hard-code the frequency in the configuration file, and that hard-coded frequency has a minimum of 1 hr. I am not going to lower the hard-coded minimum to a number of minutes, since we would then have to update the config file whenever the timestep duration changes.

The downside of this plan is that reading and regridding 3D emissions every timestep means more computation per timestep. However, the alternatives would take longer to implement and be more complex. One is storing raw file data between timesteps, which HEMCO does not currently do. Another is scaling emissions between timesteps for the vertical regrid rather than doing a vertical interpolation from the raw data, but I am not familiar enough with the MESSy algorithm to know if that is even possible. If it is possible, we would need to save the previous timestep pressures (if not already available) in the restart file.

The HEMCO update for this would go into a Z-version of the older HEMCO version 3.9 used by CESM so that we would not require a GEOS-Chem version update at the same time. I am concurrently developing a branch using the latest GEOS-Chem (14.5) and HEMCO (3.10), but I don't think that needs to be done before the CESM3 freeze date.

Let me know your thoughts on my plan, including if you would like to go in a different direction for a fix.

@cacraigucar
Collaborator

@briandobbins - I think you might be the best person to weigh in on the pros/cons of the options @lizziel is contemplating. Since HEMCO is slated to be the way emissions are brought into CAM (and this is called during the run phase to step the emissions along, so it is not just an init read), it is important to get it as efficient as possible. Let us know your thoughts.

@gold2718
Collaborator

> allow configuration of reading/regridding an emission every time it is run

Could this be a variable that CAM could set through its namelist? That way, CAM could decide how often HEMCO needs to read and regrid, say every N timesteps. This would allow skipping reads on some timesteps if saving the regridding time is desired. CAM does that with other processes such as radiation and ali_arms.
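
The every-N-timesteps idea can be sketched as follows. This is a minimal illustration in Python, not CAM or HEMCO code: `regrid_every_n` stands in for a hypothetical namelist variable, and both helper functions are invented for the example.

```python
def should_regrid(step: int, regrid_every_n: int) -> bool:
    """Return True on timesteps where emissions would be re-read and regridded.

    step is a 0-based timestep counter. regrid_every_n is a hypothetical
    namelist-style setting; 1 means "regrid on every timestep".
    """
    if regrid_every_n < 1:
        raise ValueError("regrid_every_n must be >= 1")
    return step % regrid_every_n == 0


def regrid_steps(n_steps: int, regrid_every_n: int) -> list[int]:
    """List the timesteps in a run of n_steps on which a regrid would happen."""
    return [s for s in range(n_steps) if should_regrid(s, regrid_every_n)]
```

With `regrid_every_n = 1` every step triggers a read and regrid; larger values trade accuracy of the vertical regrid for fewer reads.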

@briandobbins
Collaborator

I'm woefully unfamiliar with HEMCO, I'm afraid, so let me ask some very naive questions first, and please correct anything I've gotten terribly wrong here. At a basic level, this changes HEMCO reads from monthly to hourly, which is the highest frequency for HEMCO.

  1. For a run with, say, a 30-minute time-step, this changes file reads from once every ~1400 steps (monthly) to once every 2 steps (hourly), and this is the performance concern? Do we open new files each time, or read data from an already opened file? And do we have timing stats on what this typically takes, both for the regridding and the read itself?

  2. Is the need for the vertical regridding to pressure levels because of the on-the-hour restart, or general correctness?

  3. How much data do we typically read anyway? Can we have a setting where we read N time-steps, so we do reads less frequently, albeit at the cost of memory (and, admittedly, development complexity)?

  4. For my own curiosity, with runs with sub-hourly steps, do we do interpolation already?

Thanks - if nothing else, I'll learn a lot from this.

@lizziel
Collaborator Author

lizziel commented Jan 15, 2025

@briandobbins questions:

> For a run with, say, a 30-minute time-step, this changes file reads from once every ~1400 steps (monthly) to once every 2 steps (hourly), and this is the performance concern? Do we open new files each time, or read data from an already opened file? And do we have timing stats on what this typically takes, both for the regridding and the read itself?

Correct. HEMCO will read the same file every hour, or every timestep in my proposal, in order to do a vertical interpolation to the current pressure grid. I believe the file is kept open in the buffer but need to verify.
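
For reference, the step counts in the question can be reproduced with a few lines of arithmetic. This is only a sketch: the 30-day month and the helper name `steps_between_reads` are assumptions made for the example.

```python
def steps_between_reads(read_interval_hours: float, timestep_minutes: float) -> float:
    """Number of model timesteps that elapse between two consecutive reads."""
    return read_interval_hours * 60.0 / timestep_minutes


TIMESTEP_MINUTES = 30.0

# Monthly read, assuming a 30-day month: 30 days * 24 h = 720 h
monthly = steps_between_reads(30 * 24, TIMESTEP_MINUTES)

# Hourly read (the change in this PR)
hourly = steps_between_reads(1, TIMESTEP_MINUTES)

# Read every timestep (the proposed follow-up)
every_step = steps_between_reads(TIMESTEP_MINUTES / 60.0, TIMESTEP_MINUTES)
```

For a 30-minute timestep this gives 1440 steps between monthly reads, 2 between hourly reads, and 1 when reading every timestep.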

I have only done a timing test for offline GEOS-Chem and found a 6% increase for a 1-day 4x5 run using 8 cores. Further testing is needed to determine scaling with cores, and the test should be repeated using CESM.

> Is the need for the vertical regridding to pressure levels because of the on-the-hour restart, or general correctness?

We do vertical regridding to pressure levels simply because that is what is done in the Modular Earth Submodel System (MESSy) algorithm for vertical regridding in HEMCO. CESM uses a vertical pressure grid that is dynamic, so we pass that dynamic pressure grid to HEMCO every timestep. I assume that decision was made for general correctness. We could instead use a fixed pressure grid for the interpolation, but that is a judgment call for the CESM scientists.

I am not an expert on this algorithm or the alternatives that could be used, but I am sure that making the pressure ratio used in the algorithm constant rather than dynamic corrects the restart issue. We essentially have this ratio that is different every timestep since PEDGE is coming from CESM:

       ! Normalize each level-edge pressure by the pressure at the first edge (l = 1)
       DO l = 1, HcoState%NZ+1
          sigout(:,:,l) = HcoState%Grid%PEDGE%Val(:,:,l) &
                        / HcoState%Grid%PEDGE%Val(:,:,1)
       ENDDO

If we used an approximate grid that is constant then that would solve this issue. Or we might be able to implement a new interpolation from current emissions to new emissions based on change in pressure. That would require storing previous pressures and emissions in the restart file. But I am not sure if that interpolation would be possible without roundoff error or other errors.
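
To illustrate why this ratio changes from step to step, here is a single-column sketch in Python. It assumes a hybrid sigma-pressure grid where each edge pressure is `a * P0 + b * ps`; the coefficient values, the surface-first ordering, and the function names are made up for the example and are not taken from CAM or HEMCO.

```python
P0 = 100000.0  # reference pressure [Pa]

# Hypothetical hybrid coefficients at 4 level edges, surface edge first
HYAI = [0.0, 0.05, 0.10, 0.02]
HYBI = [1.0, 0.70, 0.30, 0.0]


def edge_pressures(ps: float) -> list[float]:
    """Edge pressures [Pa] for one column with surface pressure ps."""
    return [a * P0 + b * ps for a, b in zip(HYAI, HYBI)]


def sigma_edges(pedge: list[float]) -> list[float]:
    """Single-column analogue of sigout(l) = PEDGE(l) / PEDGE(1)."""
    return [p / pedge[0] for p in pedge]


# Two timesteps with different surface pressure give different ratios
sig_a = sigma_edges(edge_pressures(101000.0))
sig_b = sigma_edges(edge_pressures(99000.0))

# A fixed (approximate) grid gives the same ratios on every call
sig_fixed_1 = sigma_edges(edge_pressures(P0))
sig_fixed_2 = sigma_edges(edge_pressures(P0))
```

In this sketch the ratio at the first edge is always 1, but the ratios at the edges above it shift whenever the surface pressure changes, which is the step-to-step variation described above.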

> How much data do we typically read anyway? Can we have a setting where we read N time-steps, so we do reads less frequently, albeit at the cost of memory (and, admittedly, development complexity)?

This only pertains to 3D emissions, which currently consist only of aviation emissions. Setting how frequently to read is easily done in the HEMCO configuration file. But my understanding is that we need the tests to pass because the error introduced by not vertically interpolating every timestep is not acceptable for science runs.

> For my own curiosity, with runs with sub-hourly steps, do we do interpolation already?

In the version in CESM right now we only read and interpolate vertically at the start of the month. This PR changes that to hourly. For runs with sub-hourly timesteps there is no vertical interpolation between hours. This is why the ERS tests, which restart after 5 timesteps (2.5 hrs), fail, while my new tests, which restart on the hour, pass.
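
The mid-hour versus on-the-hour behaviour can be mimicked with a toy model. The sketch below is Python, not HEMCO code, and rests on stated assumptions: the pressure drift and the `regrid` formula are invented, the regridded field is assumed not to be carried across a restart (so it is recomputed with the pressure at the restart step), and the on-the-hour restart at step 6 is an illustrative choice.

```python
def surface_pressure(step: int) -> float:
    """Hypothetical, deterministic surface pressure [Pa] that drifts each step."""
    return 101000.0 + 10.0 * step


def regrid(step: int) -> float:
    """Stand-in for vertical regridding: the result depends on current pressure."""
    return 1.0e-9 * surface_pressure(step) / 101000.0


def run(n_steps: int, steps_per_read: int, restart_at=None) -> list[float]:
    """Return the regridded emission value used on each timestep.

    Emissions are regridded on every steps_per_read-th step and, by
    assumption, once more on the first step after a restart.
    """
    used = []
    current = None
    for step in range(n_steps):
        if step % steps_per_read == 0 or step == restart_at:
            current = regrid(step)
        used.append(current)
    return used


# 30-minute timestep with hourly reads: 2 steps per read
baseline = run(10, steps_per_read=2)
mid_hour = run(10, steps_per_read=2, restart_at=5)  # restart at 2.5 hrs
on_hour = run(10, steps_per_read=2, restart_at=6)   # restart at 3 hrs
```

Under these assumptions `on_hour` matches `baseline` exactly, while `mid_hour` differs at step 5 because the restarted run regrids with the step-5 pressure and the continuous run is still using the step-4 result.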

@lizziel
Collaborator Author

lizziel commented Jan 15, 2025

@gold2718,

> Could this be a variable that CAM could set through its namelist? That way, CAM could decide how often HEMCO needs to read and regrid, say every N timesteps. This would allow skipping reads on some timesteps if saving the regridding time is desired. CAM does that with other processes such as radiation and ali_arms.

We could set this through the namelist for a custom set of emissions. However, I think it is safer to use the HEMCO configuration file where read settings for all of the emissions are stored. Note that the read frequency is different for different emissions inventories and is not just a blanket value. Also, if users are allowed to change the read frequency to the same cadence as the files, in this case monthly, doesn't that defeat the purpose of the restart tests?

@lizziel
Collaborator Author

lizziel commented Jan 16, 2025

Another thought I have about the design for a fix is that it should be inventory-independent. Currently the problem affects only one emissions inventory, simply because aviation emissions happen to be the only 3D emissions. Whatever fix is applied should automatically apply to any additional 3D inventories.

Also, I should clarify my timing results for monthly versus hourly read in GEOS-Chem. The 6% value is for HEMCO only. For a one day run the HEMCO time increased from 31s to 33s. This is negligible compared to total run time.
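
As a quick check of the figures quoted here, the relative increase can be computed directly (the helper name is just for illustration):

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase from before to after, in percent."""
    return 100.0 * (after - before) / before


# HEMCO-only wall time for the one-day run: 31 s (monthly read) vs 33 s (hourly read)
hemco_increase = percent_increase(31.0, 33.0)
```

This gives roughly 6.5%, consistent with the approximately 6% quoted for HEMCO alone.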

4 participants