fsurdat file needed for NEON MOAB site #2801

Closed
samsrabin opened this issue Sep 30, 2024 · 18 comments · Fixed by #2888
Labels
bfb (bit-for-bit), done (Issues whose closing PR is done but not yet merged, pending test re-run ok), testing (additions or changes to tests)

@samsrabin
Collaborator

samsrabin commented Sep 30, 2024

I didn't want to open a whole new issue for this BUT...
In #2500 this test changed from FAIL (expected) to PEND in the SHAREDLIB_BUILD phase
SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.derecho_gnu.clm-NEON-MOAB--clm-PRISM
with this error CLMBuildNamelist::add_default() : No default value found for fsurdat.

Same on izumi:
SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.izumi_nag.clm-NEON-MOAB--clm-PRISM

Originally posted by @slevis-lmwg in #2310 (comment)


I'm elevating this to its own issue because now the cs.status output is confusing, and the expected failure isn't detected.

@samsrabin added the testing ("additions or changes to tests") and next ("this should get some attention in the next week or two. Normally each Thursday SE meeting.") labels and removed the bug ("something is working incorrectly") label Sep 30, 2024
@samsrabin
Collaborator Author

Note that the cs.status output will make more sense (i.e., SETUP will be marked as FAIL) once we bring in cime6.1.27 or later; see ESMCI/cime#4681.

@wwieder added this to the cesm3_0_beta04 milestone Oct 3, 2024
@wwieder removed the next label Oct 3, 2024
@slevis-lmwg
Contributor

slevis-lmwg commented Oct 8, 2024

The file already exists here (78-PFT version):
/glade/campaign/cesm/cesmdata/inputdata/lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/surfdata_1x1_NEON_MOAB_hist_2000_78pfts_c240912.nc
and here (16-PFT version):
.../16PFT_mixed/surfdata_1x1_NEON_MOAB_hist_2000_16pfts_c240912.nc

@samsrabin
Collaborator Author

So I guess something just needs to be changed in the XML for the test to pick that up?

@slevis-lmwg
Copy link
Contributor

@samsrabin this sounds simple, although @olyson and I looked at this for a few minutes this morning and found:

  1. The fsurdat setting seems correct in namelist_defaults_ctsm.xml.
  2. Other NEON tests work, suggesting that this test does something different that causes it to break.

@slevis-lmwg
Copy link
Contributor

Additional info. This test works:
SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.derecho_gnu.clm-NEON-MOAB

@samsrabin
Copy link
Collaborator Author

Ah, so it seems like the addition of the PRISM testmod is the issue.

@slevis-lmwg
Copy link
Contributor

slevis-lmwg commented Oct 30, 2024

UPDATE

I reversed the order of the testmods in the test like this:
SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.derecho_intel.clm-PRISM--clm-NEON-MOAB
and the test passed. I will follow up with a test to confirm that I get the same answers as the original test.

Running the two tests from ctsm5.2.028, i.e., the last tag in which the original test passed:
Diffs in the lnd_in files suggest that we may see differences in answers. The runs fail on izumi because they report that cesm.exe does not exist, even though it does. If this problem persists, I will repeat these two tests on derecho.

@slevis-lmwg
Copy link
Contributor

slevis-lmwg commented Oct 31, 2024

The new test works but gives different answers in ctsm5.2.028 (the last tag in which the default test still worked),
due to differences in lnd_in (new test versus default test):

28,29c28,32
<  hist_fincl2 = 'AR', 'ELAI', 'FCEV', 'FCTR', 'FGEV', 'FIRA', 'FSA', 'FSH', 'GPP', 'H2OSOI', 'HR', 'SNOW_DEPTH',
<          'TBOT', 'TSOI', 'SOILC_vr', 'FV', 'NET_NMIN_vr'
---
>  hist_fincl2 = 'TG', 'TBOT', 'FIRE', 'FIRA', 'FLDS', 'FSDS', 'FSR', 'FSA', 'FGEV', 'FSH', 'FGR',
>          'TSOI', 'ERRSOI', 'SABV', 'SABG', 'FSDSVD', 'FSDSND', 'FSDSVI', 'FSDSNI', 'FSRVD', 'FSRND', 'FSRVI',
>          'FSRNI', 'TSA', 'FCTR', 'FCEV', 'QBOT', 'RH2M', 'H2OSOI', 'H2OSNO', 'SOILLIQ', 'SOILICE', 'TSA_U',
>          'TSA_R', 'TREFMNAV_U', 'TREFMNAV_R', 'TREFMXAV_U', 'TREFMXAV_R', 'TG_U', 'TG_R', 'RH2M_U', 'RH2M_R', 'QRUNOFF_U', 'QRUNOFF_R',
>          'SoilAlpha_U', 'SWup', 'LWup', 'URBAN_AC', 'URBAN_HEAT'
114,115c117,118
<  stream_fldfilename_lightng = '/glade/campaign/cesm/cesmdata/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2016_climo1995-2013.360x720.lnfm_Total_NEONarea_c210625.nc'
<  stream_meshfile_lightng = '/glade/campaign/cesm/cesmdata/inputdata/atm/datm7/NASA_LIS/ESMF_MESH.Li_2016.360x720.NEONarea_cdf5_c221104.nc'
---
>  stream_fldfilename_lightng = '/glade/campaign/cesm/cesmdata/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2016_climo1995-2013.360x720.lnfm_Total_c160825.nc'
>  stream_meshfile_lightng = '/glade/campaign/cesm/cesmdata/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2016_climo1995-2013.360x720_ESMFmesh_cdf5_150621.nc'

Next I want to look at code diffs between ctsm5.2.029 and ctsm5.2.028 in case I spot the root cause of the failure.

@slevis-lmwg
Copy link
Contributor

slevis-lmwg commented Oct 31, 2024

From the code diffs (ctsm5.2.029 vs. ctsm5.2.028), I see three main areas to focus on:

--- a/cime_config/usermods_dirs/NEON/defaults/user_nl_clm
+++ b/cime_config/usermods_dirs/NEON/defaults/user_nl_clm
@@ -18,9 +18,6 @@
 ! Set glc_do_dynglacier  with GLC_TWO_WAY_COUPLING               env variable
 !----------------------------------------------------------------------------------

-flanduse_timeseries = ' '   ! This isn't needed for a non transient case, but will be once we start using transient compsets
-fsurdat = "$DIN_LOC_ROOT/lnd/clm2/surfdata_esmf/NEON/surfdata_1x1_NEON_${NEONSITE}_hist_2000_78pfts_c240206.nc"
-
 ! h1 output stream

@slevis-lmwg
Copy link
Contributor

slevis-lmwg commented Oct 31, 2024

Putting back the code shown in the last post fixes the test failure,
but it also reverses an attempt to reduce code clutter.
Is there an alternative solution?
Is reversing the testmods order (which I showed works) an acceptable solution?

@samsrabin
Copy link
Collaborator Author

I think the root issue is that the NEON site defaults only apply if simulating 2018:

<!-- for NEON sites present day simulations - year 2000 -->
<fsurdat hgrid="CLM_USRDAT" neon=".true." sim_year="2018" use_fates=".true.">
lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/16PFT_mixed/surfdata_1x1_NEON_${NEONSITE}_hist_2000_16pfts_c240912.nc</fsurdat>
<fsurdat hgrid="CLM_USRDAT" neon=".true." sim_year="2018" use_fates=".false.">
lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/surfdata_1x1_NEON_${NEONSITE}_hist_2000_78pfts_c240912.nc</fsurdat>

Is there a reason for that?
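The selection rule behind these entries can be illustrated with a simplified, hypothetical Perl model; this is not the actual CLMBuildNamelist.pm lookup code. It assumes that each default carries the XML attributes shown above and that an entry is eligible only if every attribute matches the current namelist flags. Under that assumption, both NEON fsurdat entries become unreachable as soon as sim_year differs from 2018 or neon is not .true., and the lookup comes back empty, consistent with the "No default value found for fsurdat" error. The hash layout and the find_default helper are illustrative names only.

```perl
use strict;
use warnings;

# Hypothetical, simplified model of attribute-qualified namelist defaults.
# Each entry mirrors one <fsurdat ...> element from namelist_defaults_ctsm.xml;
# single quotes keep ${NEONSITE} as a literal placeholder.
my @fsurdat_defaults = (
    {
        attrs => { hgrid => 'CLM_USRDAT', neon => '.true.', sim_year => '2018', use_fates => '.true.' },
        value => 'lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/16PFT_mixed/surfdata_1x1_NEON_${NEONSITE}_hist_2000_16pfts_c240912.nc',
    },
    {
        attrs => { hgrid => 'CLM_USRDAT', neon => '.true.', sim_year => '2018', use_fates => '.false.' },
        value => 'lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/surfdata_1x1_NEON_${NEONSITE}_hist_2000_78pfts_c240912.nc',
    },
);

# Return the value of the first entry whose attributes ALL match the flags,
# else undef (where a real build script would report "No default value found").
sub find_default {
    my ($entries, $flags) = @_;
    ENTRY: for my $entry (@$entries) {
        for my $key (keys %{ $entry->{attrs} }) {
            next ENTRY
                unless defined $flags->{$key} && $flags->{$key} eq $entry->{attrs}{$key};
        }
        return $entry->{value};
    }
    return undef;
}

my %flags = (hgrid => 'CLM_USRDAT', neon => '.true.', sim_year => '2018', use_fates => '.false.');
print defined(find_default(\@fsurdat_defaults, \%flags)) ? "found\n" : "none\n";   # found (78-PFT file)

$flags{neon} = '.false.';   # e.g., the site is not recognized as NEON
print defined(find_default(\@fsurdat_defaults, \%flags)) ? "found\n" : "none\n";   # none
```

In this model a single mismatched attribute is enough to skip an entry, which is why a problem in how the flags are derived can surface as a missing fsurdat default rather than as a more direct error.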

@samsrabin
Copy link
Collaborator Author

Or another way of looking at it: The issue is that adding the PRISM testmod after the NEON one means that the NEON testmod's shell_commands seemingly never get run. Otherwise, the date would be set to 2018.

But that raises another question: When you run with PRISM first, do the PRISM testmod's shell_commands get run?

@samsrabin
Copy link
Collaborator Author

Never mind, that's not it. Both orderings result in the following output for ./xmlquery -p YR:

Results in group run_component_datm
	DATM_YR_ALIGN: 2018
	DATM_YR_END: 2020
	DATM_YR_START: 2018
	DATM_YR_START_FILENAME: 9999

And the following for ./xmlquery --listall | grep 2018:

	CLM_NML_USE_CASE: 2018_control
	DATM_YR_ALIGN: 2018
	DATM_YR_START: 2018

But I have to say, I don't like not knowing why the order matters...

@samsrabin
Copy link
Collaborator Author

Found it! The problem is that CLMBuildNamelist.pm doesn't set neon to .true. unless CLM_USRDAT_NAME is NEON. When the PRISM testmod comes second, CLM_USRDAT_NAME is set to NEON.PRISM. The following change fixes it:

--- a/bld/CLMBuildNamelist.pm
+++ b/bld/CLMBuildNamelist.pm
@@ -713,7 +713,7 @@ sub setup_cmdl_resolution {
   $nl_flags->{'neon'} = ".false.";
   $nl_flags->{'neonsite'} = "";
   if ( $nl_flags->{'res'} eq "CLM_USRDAT" ) {
-    if ( $opts->{'clm_usr_name'} eq "NEON" ) {
+    if ( $opts->{'clm_usr_name'} eq "NEON" || $opts->{'clm_usr_name'} eq "NEON.PRISM" ) {
        $nl_flags->{'neon'} = ".true.";
        $nl_flags->{'neonsite'} = $envxml_ref->{'NEONSITE'};
        $log->verbose_message( "This is a NEON site with NEONSITE = " . $nl_flags->{'neonsite'} );

However, there's probably a better way to do this in Perl: instead of checking for exact matches, just check whether the name starts with NEON.

@samsrabin
Copy link
Collaborator Author

Yep, like so:

--- a/bld/CLMBuildNamelist.pm
+++ b/bld/CLMBuildNamelist.pm
@@ -678,6 +678,11 @@ sub setup_cmdl_chk_res {
   }
 }

+sub begins_with
+{
+    return substr($_[0], 0, length($_[1])) eq $_[1];
+}
+
 sub setup_cmdl_resolution {
   my ($opts, $nl_flags, $definition, $defaults, $envxml_ref) = @_;

@@ -713,7 +718,7 @@ sub setup_cmdl_resolution {
   $nl_flags->{'neon'} = ".false.";
   $nl_flags->{'neonsite'} = "";
   if ( $nl_flags->{'res'} eq "CLM_USRDAT" ) {
-    if ( $opts->{'clm_usr_name'} eq "NEON" ) {
+    if ( begins_with($opts->{'clm_usr_name'}, "NEON") ) {
        $nl_flags->{'neon'} = ".true.";
        $nl_flags->{'neonsite'} = $envxml_ref->{'NEONSITE'};
        $log->verbose_message( "This is a NEON site with NEONSITE = " . $nl_flags->{'neonsite'} );
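To see what the prefix check changes, the sketch below compares the original exact match against begins_with (copied from the diff above) on a few sample CLM_USRDAT_NAME values; '1x1_brazil' and 'NEONX' are illustrative inputs only. One side effect worth noting is that begins_with accepts any name that merely starts with NEON, such as the hypothetical NEONX. If that ever matters, an anchored regex is one stricter option; is_neon_usrdat_name is a hypothetical helper, not part of the proposed patch.

```perl
use strict;
use warnings;

# Prefix test from the proposed patch: true when $_[0] starts with $_[1].
sub begins_with
{
    return substr($_[0], 0, length($_[1])) eq $_[1];
}

# Hypothetical stricter variant (not part of the patch): "NEON" alone, or
# "NEON." followed by a suffix such as "PRISM"; rejects e.g. "NEONX".
sub is_neon_usrdat_name {
    my ($name) = @_;
    return $name =~ /^NEON(?:\.|\z)/ ? 1 : 0;
}

for my $name ('NEON', 'NEON.PRISM', 'NEONX', '1x1_brazil') {
    printf "%-12s exact=%d begins_with=%d strict=%d\n",
        $name,
        ($name eq 'NEON') ? 1 : 0,
        begins_with($name, 'NEON') ? 1 : 0,
        is_neon_usrdat_name($name);
}
```

Running it reports exact=0 but begins_with=1 for NEON.PRISM (the case that broke this test), and begins_with=1 but strict=0 for NEONX.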

@slevis-lmwg
Copy link
Contributor

Thank you @samsrabin
I'm testing your suggestion now.

@slevis-lmwg
Copy link
Contributor

./create_test SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.derecho_intel.clm-NEON-MOAB--clm-PRISM
worked on derecho, so I will open a PR with your suggested mods.

@slevis-lmwg added the bfb ("bit-for-bit") and done ("Issues whose closing PR is done but not yet merged (pending test re-run ok)") labels Nov 6, 2024
@slevis-lmwg moved this from In Progress to Done in LMWG: Near Term Priorities Nov 6, 2024
@slevis-lmwg
Copy link
Contributor

@samsrabin thank you for coming up with this elegant solution in the Perl code.

We will close this issue when we merge the cesm3_0_beta04_changes branch.

Projects
Status: Done

5 participants