Add FED EnVar DA Capability #632

Merged
merged 29 commits into from
Oct 31, 2023


hongli-wang
Collaborator

@hongli-wang hongli-wang commented Sep 29, 2023

Description

Please include relevant motivation and context

  • This PR supports RRFS_B GSI FED assimilation.
  • This PR adds a new GSI EnVar FED assimilation capability.

Please include a summary of the change and which issue is fixed

  • Read FED background and ensemble from restart phy files
  • Add a new control/state variable, fed (in the met guess, state variable, and control variable sections of anavinfo)
  • Create intfed.f90 and stpfed.f90 for minimization.
  • Other related code changes. For example, hydrometeors are now updated when dbz, fed, or both are assimilated. Previously, hydrometeors were updated only when dbz was assimilated.
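
The change in when hydrometeors are updated can be sketched as a simple predicate. This is a minimal illustration only: the function and argument names are hypothetical and do not correspond to identifiers in the GSI source.

```python
def should_update_hydrometeors(assimilate_dbz: bool, assimilate_fed: bool) -> bool:
    """Return True when the hydrometeor update should run.

    Hypothetical names; this mirrors the rule described in the PR text,
    not the actual GSI implementation.
    """
    # New behavior: either observation type (or both) triggers the update.
    return assimilate_dbz or assimilate_fed


def should_update_hydrometeors_previous(assimilate_dbz: bool) -> bool:
    """Previous rule: only dbz assimilation triggered the update."""
    return assimilate_dbz
```

With FED-only assimilation, the new predicate returns True while the previous one would have returned False.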

Please provide reference to the issue this pull request is addressing
Please see
Fixes #622

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • One FED obs DA test
  • Real FED DA with pseudo ensemble for code development and debug
  • Real FED DA with real ensemble

Checklist

  • [X ] My code follows the style guidelines of this project
  • [X ] I have performed a self-review of my own code
  • [X ] I have commented my code, particularly in hard-to-understand areas
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

DUE DATE for this PR is 11/10/2023. If this PR is not merged into develop by this date, the PR will be closed and returned to the developer.

@hongli-wang
Collaborator Author

hongli-wang commented Sep 29, 2023

@hu5970 @TingLei-NOAA
Please wait to review until I merge some EMC GSI/develop PRs.
Thanks,
Hongli

Resolved review conversations (outdated):
src/gsi/control2state.f90
src/gsi/ensctl2model_ad.f90
src/gsi/ensctl2state.f90
src/gsi/intfed.f90 (2 conversations)
src/gsi/stpfed.f90
@hongli-wang
Collaborator Author

@TingLei-NOAA @wangym1111
Hi Ting and Yongming,

Could you please wait until next Monday to start the review? I may have some updates this week.

Thanks for your time and effort!

Hongli

	modified:   src/gsi/control2state.f90
	modified:   src/gsi/cplr_get_fv3_regional_ensperts.f90
	modified:   src/gsi/ensctl2model_ad.f90
	modified:   src/gsi/ensctl2state.f90
	modified:   src/gsi/intfed.f90
	modified:   src/gsi/stpfed.f90
@hu5970
Collaborator

hu5970 commented Oct 27, 2023

@hongli-wang Could you sync with develop branch? I will do regression tests on WCOSS2 after syncing.

@hongli-wang
Collaborator Author

@hu5970
Just merged the emc/develop branch into my branch.

Hongli

@hu5970
Collaborator

hu5970 commented Oct 27, 2023

Regression tests on WCOSS2 finished:

[ming.hu@dlogin01 build] ctest -j7
Test project /lfs/h2/emc/ptmp/Ming.Hu/gsi/fed/GSI/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #4: netcdf_fv3_regional ..............***Failed  482.85 sec
2/7 Test #3: rrfs_3denvar_glbens ..............   Passed  485.85 sec
3/7 Test #7: global_enkf ......................   Passed  609.49 sec
4/7 Test #2: rtma .............................   Passed  969.78 sec
5/7 Test #6: hafs_3denvar_hybens ..............   Passed  1214.75 sec
6/7 Test #5: hafs_4denvar_glbens ..............   Passed  1218.08 sec
7/7 Test #1: global_4denvar ...................   Passed  1382.72 sec

86% tests passed, 1 tests failed out of 7

Total Test time (real) = 1382.73 sec

The following tests FAILED:
	  4 - netcdf_fv3_regional (Failed)
Errors while running CTest
[ming.hu@dlogin01 build] ctest -R netcdf_fv3_regional
Test project /lfs/h2/emc/ptmp/Ming.Hu/gsi/fed/GSI/build
    Start 4: netcdf_fv3_regional
1/1 Test #4: netcdf_fv3_regional ..............   Passed  482.93 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 483.22 sec

The "netcdf_fv3_regional" test failed in the first run because it exceeded the allowable memory.

So all regression test cases passed.
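
The rerun-on-failure step above can be sketched in a few lines. This is a minimal sketch, assuming ctest result lines look like the ones in this log. The helper names `failed_tests` and `rerun_command` are hypothetical, and only the `ctest -R` option already used above is relied on.

```python
import re
from typing import List

# Matches ctest result lines such as
# "1/7 Test #4: netcdf_fv3_regional ..............***Failed  482.85 sec"
RESULT_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#(?P<num>\d+):\s+(?P<name>\S+)\s+\.+\s*"
    r"(?P<status>\*\*\*Failed|Passed)\s+(?P<secs>[\d.]+)\s+sec"
)


def failed_tests(ctest_output: str) -> List[str]:
    """Return the names of tests that ctest reported as failed."""
    failed = []
    for line in ctest_output.splitlines():
        match = RESULT_LINE.match(line)
        if match and match.group("status") == "***Failed":
            # Some ctest versions print names as [=[name]=]; strip that quoting.
            failed.append(match.group("name").strip("[]="))
    return failed


def rerun_command(names: List[str]) -> List[str]:
    """Build a 'ctest -R <regex>' command that selects the given tests."""
    return ["ctest", "-R", "|".join(names)]
```

Applied to the first run above, `failed_tests` would return only `netcdf_fv3_regional`, and `rerun_command` would reproduce the `ctest -R netcdf_fv3_regional` call shown in the log.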

@hongli-wang
Collaborator Author

@hu5970 Would you please look at my test:
/scratch1/BMC/wrfruc/hwang/ctest
Test #1: [=[global_4denvar]=]
Test #2: [=[rtma]=]
Test #3: [=[rrfs_3denvar_glbens]=]
Test #4: [=[netcdf_fv3_regional]=]
Test #5: [=[hafs_4denvar_glbens]=]
Test #6: [=[hafs_3denvar_hybens]=]
Test #7: [=[global_enkf]=]
1/7 Test #1: [=[global_4denvar]=] ............. Passed 1672.95 sec
2/7 Test #2: [=[rtma]=] ....................... Passed 1104.29 sec
3/7 Test #3: [=[rrfs_3denvar_glbens]=] ........***Failed 62.73 sec
4/7 Test #4: [=[netcdf_fv3_regional]=] ........ Passed 500.71 sec
5/7 Test #5: [=[hafs_4denvar_glbens]=] ........***Failed 1535.13 sec
6/7 Test #6: [=[hafs_3denvar_hybens]=] ........ Passed 1356.76 sec
7/7 Test #7: [=[global_enkf]=] ................ Passed 1210.17 sec

@hu5970
Collaborator

hu5970 commented Oct 27, 2023

@hongli-wang

Your "hafs_4denvar_glbens" is OK. It failed at "time-thresh":
"The runtime for hafs_4denvar_glbens_loproc_updat is 432.770166 seconds. This has exceeded maximum allowable threshold time of 412.679578 seconds"

But "rrfs_3denvar_glbens" failed during start of the first run. Please rerun this one to see if you can repeat the crash.
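
To put the time-threshold failure in perspective, the overshoot can be computed from the two numbers in the message. This is only a quick arithmetic sketch. The function name is hypothetical, and nothing here describes how the regression scripts derive the threshold.

```python
def threshold_overshoot_percent(runtime_s: float, threshold_s: float) -> float:
    """Return how far runtime_s exceeds threshold_s, as a percentage of the threshold."""
    return (runtime_s - threshold_s) / threshold_s * 100.0


# Values taken from the "time-thresh" message quoted above.
runtime = 432.770166
threshold = 412.679578

excess_seconds = runtime - threshold
excess_percent = threshold_overshoot_percent(runtime, threshold)
print(f"exceeded by {excess_seconds:.2f} s ({excess_percent:.1f}%)")
```

The run was about 20 seconds, or just under 5 percent, over the limit, which is consistent with treating it as a timing fluctuation rather than a crash.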

@hongli-wang
Collaborator Author

@hu5970 @TingLei-NOAA
It failed again. Please feel free to look into details. Thanks.

/scratch1/BMC/wrfruc/hwang/ctest/GSI

1/1 Test #3: [=[rrfs_3denvar_glbens]=] ........***Failed 61.64 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 61.66 sec

The following tests FAILED:
3 - [=[rrfs_3denvar_glbens]=] (Failed)
Errors while running CTest

forrtl: severe (71): integer divide by zero
Image PC Routine Line Source
gsi.x 00000000025DD7CB Unknown Unknown Unknown
libpthread-2.17.s 00002B4233931630 Unknown Unknown Unknown
gsi.x 00000000004BE091 general_sub2grid_ 447 general_sub2grid_mod.f90
gsi.x 000000000059A969 gridmod_mp_init_g 676 gridmod.F90
gsi.x 000000000041B0A3 gsimod_mp_gsimain 2288 gsimod.F90
gsi.x 000000000040CEFD MAIN__ 618 gsimain.f90
gsi.x 000000000040CEA2 Unknown Unknown Unknown
libc-2.17.so 00002B423440F555 _libc_start_main Unknown Unknown
gsi.x 000000000040CDA9 Unknown Unknown Unknown
==== backtrace (tid: 223967) ====
0 0x000000000004d455 ucs_debug_print_backtrace() ???:0
1 0x00000000004be091 general_sub2grid_mod_mp_general_sub2grid_create_info() /scratch1/BMC/wrfruc/hwang/ctest/GSI/src/gsi/general_sub2grid_mod.f90:447

@hu5970
Collaborator

hu5970 commented Oct 27, 2023

@hongli-wang The crash point is related to the number of cores used for the test. I do not understand why it crashed.
Could you rebuild the ctest (recompile GSI) and try again? Based on my test on Hera, I suspect your ctest was not set up correctly.

@hu5970
Collaborator

hu5970 commented Oct 27, 2023

I also ran the regression tests on Hera. All cases passed without problems:

[Ming.Hu@hfe10 build]$ ctest -j7
Test project /scratch1/BMC/wrfruc/mhu/gsi/fed/GSI/build
    Start 1: global_4denvar
    Start 2: rtma
    Start 3: rrfs_3denvar_glbens
    Start 4: netcdf_fv3_regional
    Start 5: hafs_4denvar_glbens
    Start 6: hafs_3denvar_hybens
    Start 7: global_enkf
1/7 Test #4: netcdf_fv3_regional ..............   Passed  737.59 sec
2/7 Test #3: rrfs_3denvar_glbens ..............   Passed  796.59 sec
3/7 Test #2: rtma .............................   Passed  1161.30 sec
4/7 Test #7: global_enkf ......................   Passed  1192.56 sec
5/7 Test #6: hafs_3denvar_hybens ..............   Passed  1475.04 sec
6/7 Test #5: hafs_4denvar_glbens ..............   Passed  1663.31 sec
7/7 Test #1: global_4denvar ...................   Passed  1736.56 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) = 1736.73 sec

@hongli-wang
Collaborator Author

hongli-wang commented Oct 27, 2023

@TingLei-NOAA @wangym1111
Based on Ming's ctest runs on WCOSS2 and Hera, all tests passed. Thanks again for your time and effort.

Thanks,
Hongli

@hongli-wang
Collaborator Author

@hu5970 @TingLei-NOAA @wangym1111

The earlier failure of regression test case #3 was caused by a submodule issue: the fix files were taken from the master branch.
With that issue fixed, the case #3 regression test now passes.

Start 3: [=[rrfs_3denvar_glbens]=]

1/1 Test #3: [=[rrfs_3denvar_glbens]=] ........ Passed 739.48 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 739.50 sec

@RussTreadon-NOAA
Contributor

RussTreadon-NOAA commented Oct 30, 2023

Four questions:

  1. Have all the conversations opened by peer reviewers been satisfactorily resolved?
  2. Do we need to run regression tests on Orion?
  3. Have we run tests with FED assimilation turned on in EnVar mode to confirm the new functionality performs as expected?
  4. Do we need to update the rrfs regression test to ensure this new functionality is not broken by future PRs?

@TingLei-NOAA
Contributor

On the first question, I think a reviewer's approval, given after those conversations, can be taken to show that the conversations were concluded to that reviewer's satisfaction.
On the second question, I assume the developers have verified this. Should a report on it be added to the associated GSI issue? @hongli-wang

@ShunLiu-NOAA
Contributor

@hu5970 and @hongli-wang could you please go over this PR and click "Resolve conversation" button if the comments or questions from reviewers are addressed.

@RussTreadon-NOAA
Contributor

Thank you, @TingLei-NOAA , for your reply. Would you mind resolving your open conversations?

@TingLei-NOAA
Contributor

@RussTreadon-NOAA As we discussed before, reviewers without write permission to EMC GSI cannot resolve conversations because they have no "Resolve" button. Of course, I could reply "resolved" at the end of each of my conversations, but given how many there are, that would be an unnecessary burden. So I hope a reviewer's final approval can remove any concern about unresolved conversations involving that reviewer. Thanks.

@RussTreadon-NOAA
Contributor

Thank you @TingLei-NOAA. My apologies for forgetting our previous conversation. You are right. You can not resolve conversations even if you initiate the conversation.

Resolving conversations
You can resolve a conversation in a pull request if you opened the pull request or if you have write access to the repository where the pull request was opened.

The PR author, in this case @hongli-wang , should resolve conversations after consultation with peer reviewers. If the PR author can not resolve conversations, a member of the handling review team must do so.

@TingLei-NOAA
Contributor

@RussTreadon-NOAA Thanks a lot for sorting it out!

@hongli-wang
Collaborator Author

@RussTreadon-NOAA
First, regarding question 3 on the FED EnVar test: I have run a real-case GSI test with 30 members, and it works as expected. GSL has planned a retro run for a full evaluation once the PR is done.

Regarding reviews, I have addressed all the comments. I am working on resolving these conversations now.

Thanks.
Hongli

@RussTreadon-NOAA
Contributor

Thank you @hongli-wang for confirming that the code in the PR is functionally correct with regards to FED EnVar assimilation. Thank you also for resolving open conversations.

@hongli-wang
Collaborator Author

@TingLei-NOAA @RussTreadon-NOAA @hu5970
Regarding the second question, regression tests on Orion, I haven't done that.

Thanks,
Hongli

@RussTreadon-NOAA
Contributor

Thank you @hongli-wang for confirming that regression tests for this PR have not yet been run on Orion. GSI developers use Orion. If you have an account on Orion, I encourage you to run ctests on Orion as well as Hera.

I am in the process of cloning hongli-wang:GSI_fed_3denvar on Orion. After the clone finishes, I'll build and run ctests.

@hu5970
Collaborator

hu5970 commented Oct 31, 2023

I did a regression test on Orion:

    Start 1: [=[global_4denvar]=]
    Start 2: [=[rtma]=]
    Start 3: [=[rrfs_3denvar_glbens]=]
    Start 4: [=[netcdf_fv3_regional]=]
    Start 5: [=[hafs_4denvar_glbens]=]
    Start 6: [=[hafs_3denvar_hybens]=]
    Start 7: [=[global_enkf]=]

1/7 Test #4: [=[netcdf_fv3_regional]=] ........***Failed  1683.35 sec
2/7 Test #7: [=[global_enkf]=] ................   Passed  1688.98 sec
3/7 Test #3: [=[rrfs_3denvar_glbens]=] ........   Passed  1986.31 sec
4/7 Test #6: [=[hafs_3denvar_hybens]=] ........   Passed  2482.23 sec
5/7 Test #5: [=[hafs_4denvar_glbens]=] ........   Passed  2536.61 sec
6/7 Test #1: [=[global_4denvar]=] .............   Passed  4744.45 sec
7/7 Test #2: [=[rtma]=] .......................   Passed  4990.27 sec

86% tests passed, 1 tests failed out of 7

Total Test time (real) = 4990.39 sec

The following tests FAILED:
	  4 - [=[netcdf_fv3_regional]=] (Failed)
Errors while running CTest

The "netcdf_fv3_regional" test failed because it went slightly over the memory limit. It is not a fatal failure.

@hu5970
Collaborator

hu5970 commented Oct 31, 2023

On the 4th question, we are still testing SDL/VDL. After we finalize the configuration of the RRFS GSI, we will make several cases for RRFS.

@RussTreadon-NOAA
Contributor

Thank you @hu5970 for running regression tests for this PR on Orion. Your ctest timings are consistent with behavior reported in g-w issue #1996. gsi.x (and possibly enkf.x) wall times have significantly increased on Orion. The longer run times appear to coincide with completion of the 10/23 Orion PM.
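
As a rough indication of the longer Orion wall times, the per-test timings from the Hera and Orion ctest logs in this thread can be compared directly. This is a sketch only: the two runs are on different machines and ran seven tests concurrently, so the ratios are indicative at best, and the `slowdown` helper is a hypothetical name.

```python
# Wall times in seconds, copied from the ctest logs in this thread.
hera = {
    "netcdf_fv3_regional": 737.59,
    "rrfs_3denvar_glbens": 796.59,
    "rtma": 1161.30,
    "global_enkf": 1192.56,
    "hafs_3denvar_hybens": 1475.04,
    "hafs_4denvar_glbens": 1663.31,
    "global_4denvar": 1736.56,
}
orion = {
    "netcdf_fv3_regional": 1683.35,
    "global_enkf": 1688.98,
    "rrfs_3denvar_glbens": 1986.31,
    "hafs_3denvar_hybens": 2482.23,
    "hafs_4denvar_glbens": 2536.61,
    "global_4denvar": 4744.45,
    "rtma": 4990.27,
}


def slowdown(reference: dict, candidate: dict) -> dict:
    """Return candidate/reference wall-time ratios for tests present in both."""
    return {
        name: candidate[name] / reference[name]
        for name in reference
        if name in candidate
    }


ratios = slowdown(hera, orion)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:22s} {ratio:.2f}x")
```

Every test took longer on Orion than on Hera in these two runs, with rtma showing the largest ratio.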

@RussTreadon-NOAA
Contributor

Sounds good. Any idea how many new tests several means? We just reduced the number of ctests to 7. It would be nice to keep the number of ctests relatively low.

@hu5970
Collaborator

hu5970 commented Oct 31, 2023

@RussTreadon-NOAA One case for GSI hybrid analysis (if we use SDL/VDL) and two cases for EnKF (conventional and radar dbz analysis). But we will remove the current rrfs and netcdf_fv3_regional cases, so we will have 8 cases.

@hu5970 hu5970 merged commit acfe56d into NOAA-EMC:develop Oct 31, 2023
@RussTreadon-NOAA
Contributor

Great! Add 3, remove 2. Thanks for letting us know about future changes in the GSI regression test suite.
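
The bookkeeping behind "add 3, remove 2" can be written out with sets. This is a sketch under two stated assumptions: "current rrfs" is taken to mean rrfs_3denvar_glbens, and the three new case names are placeholders, since the thread does not name them.

```python
# The seven tests in the current ctest suite, as listed in the logs above.
current_suite = {
    "global_4denvar",
    "rtma",
    "rrfs_3denvar_glbens",
    "netcdf_fv3_regional",
    "hafs_4denvar_glbens",
    "hafs_3denvar_hybens",
    "global_enkf",
}

# Assumption: "current rrfs" refers to rrfs_3denvar_glbens.
to_remove = {"rrfs_3denvar_glbens", "netcdf_fv3_regional"}

# Placeholder names for the three planned RRFS cases (hypothetical).
to_add = {
    "rrfs_hybrid_case",
    "rrfs_enkf_conventional_case",
    "rrfs_enkf_radar_dbz_case",
}

planned_suite = (current_suite - to_remove) | to_add
print(len(current_suite), "->", len(planned_suite))
```

Seven tests minus two plus three gives the eight cases mentioned above.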

Development

Successfully merging this pull request may close these issues.

Add lightning flash extent density (FED) EnVar assimilation capability
9 participants