Trouble diagnosing crash within mo_drag when called from LM4 UFS project #1256
Comments
Are you running with any OpenMP threads? Are you calling monin_obukhov_solve_zeta from an OpenMP region? Is this crash repeatable (does it fail the same way every time)?
@JustinPerket The debug mode for FMS's CMake build was added recently. It's mainly there so that whoever is building can set custom flags: it compiles with whatever is set in the CFLAGS and FCFLAGS environment variables. It doesn't add any flags of its own; it just passes CFLAGS and FCFLAGS through directly. So in a debug build, the flags that are normally added automatically are missing, including the ones the r4/r8 libraries need, and the compile fails with an argument mismatch in those interface calls. If you're compiling the r4/r8 libraries, I would use the other build type (Release). The debug build type can still be used, but then the flags have to be set manually: you would need a separate compile for each of the r4/r8 libraries, and you would also add any debug flags through the environment variables. We could potentially make this behave more conventionally and have it add the expected debug flags.
I should probably back up and say that the top-level, main repo for this project is:
And here is the temporary branch of the LM4 NUOPC driver where I'm implementing some of the FMS surface boundary layer functionality:
Here is an even more temporary branch that reproduces the error:
It's checked out on hera at /scratch2/GFDL/gfdlscr/Justin.Perket/UFSmodels/ufs-LM4-foo. A fuller trace of one of the threads is:
Within my top-level LM4 routine, a modified version of sfc_boundary_layer is called, which in turn calls a very lightly modified version of surface_flux_1d. After that, it's using FMS modules, so when mo_drag is called, it's from:
I'm not sure. OpenMP threading is enabled by default in UFS. As far as I'm aware, I'm not explicitly building with it or using it. There is a UFS build option
OK, that's what it looked like it was doing.
OK, this may be a red herring then. I was hoping it would give some insight into why this crash only occurs when UFS is compiled with its debug flag.
@JustinPerket It could be that the debug flags are catching a divide by zero that's happening in both the release and debug runs. In standard Fortran you can divide real values by zero without an error; you just get an infinite value as the result. The debug build is adding the
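The point above can be illustrated with the standard `ieee_exceptions` intrinsic module (Fortran 2003 and later). This is a minimal sketch, not FMS code: the module `fpe_demo_mod` and the subroutine `reciprocal_no_trap` are hypothetical names, default `real` is assumed, and the processor is assumed to support IEEE divide-by-zero handling for that kind. With halting switched off, `1.0/x` for `x = 0.0` quietly produces an infinity and only raises a sticky exception flag.

```fortran
module fpe_demo_mod
  ! Hypothetical module: shows that, with divide-by-zero halting disabled,
  ! a real division by zero returns an infinity instead of stopping the program.
  use, intrinsic :: ieee_exceptions, only: ieee_divide_by_zero, ieee_support_halting, &
                                           ieee_set_halting_mode, ieee_get_flag, ieee_set_flag
  implicit none
  private
  public :: reciprocal_no_trap

contains

  ! Computes r = 1/x without trapping (where the processor supports control of
  ! halting) and reports whether the IEEE divide-by-zero flag was raised.
  ! The Fortran standard requires the caller's halting mode to be restored
  ! when this procedure returns, so the change does not leak out.
  subroutine reciprocal_no_trap(x, r, raised)
    real,    intent(in)  :: x
    real,    intent(out) :: r
    logical, intent(out) :: raised

    if (ieee_support_halting(ieee_divide_by_zero)) then
      call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
    end if
    call ieee_set_flag(ieee_divide_by_zero, .false.)  ! clear the sticky flag first
    r = 1.0 / x                                       ! +Inf for x = +0.0, -Inf for x = -0.0
    call ieee_get_flag(ieee_divide_by_zero, raised)
  end subroutine reciprocal_no_trap

end module fpe_demo_mod
```

In a build that traps divide-by-zero (for example with gfortran's `-ffpe-trap=zero`), the same `1.0/x` in ordinary code, without the halting override, would abort at that line instead, which fits the pattern of a crash that only shows up in a debug build.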
@rem1776 Ahh, thanks! I didn't know that
In that case, it's most likely that something is wrong with the arguments to mo_drag, though from a debugger and write statements they seem sensible. I'll dig into it more using my Release build of FMS 2022.04.
So I'm unable to replicate the error produced by the FMS 2022.04 module on hera or gaea with my own build of FMS. As I said before, the crash with the FMS module that UFS uses seems to be at:

```fortran
where (mask_1)
  rzeta = 1.0/zeta
  zeta_0 = zeta/z_z0
  zeta_t = zeta/z_zt
  zeta_q = zeta/z_zq
end where
```

But when I check the values of rzeta and related variables with my build of FMS 2022.04, they all seem fine: no NaNs, no Infs, and no values anywhere close to causing a divide by zero error.
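One way to screen the values feeding that `where` block is sketched below. The helper (hypothetical `zeta_check_mod` / `count_unsafe_divisors`, not part of FMS) counts masked elements that are zero or non-finite, assuming 1-D default-`real` arrays; FMS's r4/r8 variants would need an explicit kind parameter. A nonzero count for any of the divisors (`zeta`, `z_z0`, `z_zt`, `z_zq`) would point at an element that a floating-point-trapping debug build could stop on.

```fortran
module zeta_check_mod
  ! Hypothetical diagnostic helper, not FMS code: counts masked elements
  ! that would be unsafe divisors in a masked `where` block.
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  implicit none
  private
  public :: count_unsafe_divisors

contains

  ! Number of elements where mask is .true. and x is zero or non-finite
  ! (NaN or +/-Inf). mask and x are assumed to have the same size.
  integer function count_unsafe_divisors(mask, x) result(n)
    logical, intent(in) :: mask(:)
    real,    intent(in) :: x(:)
    n = count(mask .and. ((x == 0.0) .or. .not. ieee_is_finite(x)))
  end function count_unsafe_divisors

end module zeta_check_mod
```

For example, `print *, 'unsafe zeta:', count_unsafe_divisors(mask_1, zeta)` placed just before the `where`, repeated for `z_z0`, `z_zt` and `z_zq`, would report whether any masked divisor is zero, NaN or infinite on the failing call.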
I also tried adding UFS's DEBUG mode adds flags
For the benefit of anyone searching for a solution to a similar problem: When FMS is built with
So to summarize, a debug-mode UFS build shouldn't be linked with an optimized FMS build because
The problem:
I've been stuck on this issue in my LM4 NUOPC cap for UFS. As part of this project, I've brought parts of the surface boundary layer scheme into an LM4 driver that works on the land's unstructured grid.
There is a crash in `mo_drag` within a lightly modified version of `surface_flux_1d`, but only when UFS is in debug mode (CMake flags `-DDEBUG=ON -DCMAKE_BUILD_TYPE=Debug`). If I build with no debug flags, there is no crash in my `surface_flux` adaptation.
The stack trace is:
This is with FMS 2022.04, so it seems to point to this spot in `monin_obukhov_solve_zeta`:
It seems that zeta is, or becomes, zero during the solver iteration loop?
Attempts to debug stymied:
The input arguments of `surface_flux_1d` and `mo_drag` appear to be well-behaved, with unremarkable, realistic values. Whether the crash occurs does appear to be sensitive to wind speed and to the bottom atmosphere layer temperature.
Because UFS is using a release module of FMS, I can't dive into what values of the arguments might be causing an issue. And again, the issue only seems to appear when UFS is in debug mode.
I also built my own checkout of FMS 2022.04 in both Release and Debug modes:
If I build and run UFS with its debug flags against the Release version of FMS, mirroring the module setup, there's no crash or sign of anything wrong.
FMS is built using CMake with the flags `-D32BIT=ON -D64BIT=ON -DOPENMP=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$FMS_INSTALL_DIR`. When compiling UFS, I then unload the FMS module and set `FMS_INSTALL_DIR`, and UFS's CMake happily picks this up with `find_package` / `add_library`.
However, if I build FMS's debug version with `-DCMAKE_BUILD_TYPE=Debug` instead of `Release`, UFS seems to find FMS and add the library at compile time, but then can't find a subroutine in `grid/grid2_mod` (this is on hera, to avoid any possible C4 issues):