-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix for where some fields are not in the input increment file #53
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it should work but I did not test.
I'm a bit worried I was too simplistic with this given the MPI send/receive |
Looks good, but I will give it a test in my current PR. |
If this hangs on MPI send/receive I think the next option is to just make a dummy array of all 0s for these missing fields |
I'm spinning up JEDI ATM g-w CI on Hera. I can test this PR in that parallel. @DavidHuber-NOAA may be able to get a test done more quickly via g-w PR #2819. |
I think the latest commit will be safer, it assigns the array as 0 so there won't be any MPI communication hangs. |
Alright, I'll update to that version then. |
WCOSS2 (Dogwood) tests Install Debugging found that changes are also needed in
Able to get JEDI DA gdasanalcalc to run to completion with the above change in place. Need to examine output more carefully to ensure resulting output files are acceptable. |
Analysis file with the above changes to Adding @emilyhcliu as a reviewer to this PR. The changes in this PR are a follow on to modifications made in GSI-utils PR #46. @emilyhcliu , your thoughts on the changes in this PR would be much appreciated. |
@CoryMartin-NOAA I tested the global workflow |
The messages received for the crash occurred when attempting to add an increment for ensres_calc_anl OrderedDict([('setup', {'datapath': "'./'", 'analysis_filename': "'anl.ensres'", 'firstguess_filename': "'ges.ensres'", 'increment_filename': "'siginc.nc'", 'fhr': 6, 'jedi': '.true.'})])
srun -n 127 --verbose --export=ALL /scratch1/NCEPDEV/stmp2/David.Huber/RUNDIRS/gsiutil_fix_ufs/gdas.2024022400/analcalc.1207962/calcanl_ensres_06/calc_anl.x submitted
srun: warning: can't honor --ntasks-per-node set to 40 which doesn't match the requested tasks 127 with the number of requested nodes 4. Ignoring --ntasks-per-node.
srun: defined options
srun: -------------------- --------------------
srun: (null) : h32m[02-03,12,44]
srun: cpus-per-task : 1
srun: export : ALL
srun: jobid : 66460077
srun: job-name : gsiutil_fix_ufs_gdasanalcalc_00
srun: mem : 48G
srun: nodes : 4
srun: ntasks : 127
srun: tres-per-task : cpu:1
srun: verbose : 1
srun: -------------------- --------------------
srun: end of defined options
srun: jobid 66460077: nodes(4):`h32m[02-03,12,44]', cpu counts: 40(x4)
srun: Implicitly setting --exact, because -c/--cpus-per-task given.
srun: CpuBindType=(null type)
srun: launching StepId=66460077.2 on host h32m02, 32 tasks: [0-31]
srun: launching StepId=66460077.2 on host h32m03, 32 tasks: [32-63]
srun: launching StepId=66460077.2 on host h32m12, 32 tasks: [64-95]
srun: launching StepId=66460077.2 on host h32m44, 31 tasks: [96-126]
srun: topology/tree: init: topology tree plugin loaded
srun: Node h32m44, 31 tasks started
srun: Node h32m02, 32 tasks started
srun: Node h32m03, 32 tasks started
srun: Node h32m12, 32 tasks started
* . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
PROGRAM CALC_ANALYSIS HAS BEGUN. COMPILED 2019300.00 ORG: EMC
STARTING DATE-TIME SEP 12,2024 01:14:49.502 256 THU 2460566
Analysis File = .//anl.ensres.06
First Guess File = .//ges.ensres.06
Increment File = .//siginc.nc.06
Forecast Hour = 6
Number of PEs = 127
input guess file and increment file should be in netCDF format
writing analysis in netCDF format
Use JEDI format = T
Background initialization date= 2024 2 23 18
0 0
nlon= 192
nlat= 96
nlev= 127
Analysis valid date= 2024 2 24 0
0 0
Copying from Background grid_xt
Copying from Background grid_yt
Copying from Background pfull
Copying from Background phalf
Adding Increment to clwmr liq_wat
Adding Increment to delz delz
Adding Increment to dpres delp
Copying from Background dzdt
Adding Increment to grle grle
rank of data array != variable ndims (or ndims-1)
99
rank of data array != variable ndims (or ndims-1)
rank of data array != variable ndims (or ndims-1)
rank of data array != variable ndims (or ndims-1)
rank of data array != variable ndims (or ndims-1)
rank of data array != variable ndims (or ndims-1)
rank of data array != variable ndims (or ndims-1)
rank of data array != variable ndims (or ndims-1)
...
srun: Received task exit notification for 31 tasks of StepId=66460077.2 (status=0x6300).
srun: error: h32m44: tasks 96-126: Exited with exit code 99
srun: Terminating StepId=66460077.2
slurmstepd: error: *** STEP 66460077.2 ON h32m02 CANCELLED AT 2024-09-12T01:14:52 ***
srun: Received task exit notification for 32 tasks of StepId=66460077.2 (status=0x6300).
srun: error: h32m03: tasks 32-63: Exited with exit code 99
srun: Received task exit notification for 32 tasks of StepId=66460077.2 (status=0x6300).
srun: error: h32m12: tasks 64-95: Exited with exit code 99
srun: Received task exit notification for 32 tasks of StepId=66460077.2 (status=0x6300).
srun: error: h32m02: tasks 0-31: Exited with exit code 99 |
@RussTreadon-NOAA @CoryMartin-NOAA The job now runs to completion. Apologies for not reading the entirety of your debugging efforts, Russ. |
jj=nlat+1-j ! increment is S->N, history files are N->S | ||
work3d_bg(:,j,1) = work3d_bg(:,j,1) + work3d_inc_jedi(:,jj,1) | ||
end do | ||
if (has_var(incncfile, trim(incvar)//"_inc")) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check if specific variables are missing, similar to how interp_inc
now allows for only the hydrometeors to be missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could limit this check to the specific hydrometeors flagged in the revised driver.F90
. A more general approach was taken here to exclude all variables for which no increment is found. What approach shall we take? Thoughts @CatherineThomas-NOAA , @emilyhcliu , @CoryMartin-NOAA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best way is probably to relate the required increments from the state variable table in anavinfo. This requires a bit more coding and related script changes. For now, the proposed change should work and be acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @emilyhcliu for your suggestion. I agree with you. We can stick with the easier solution for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logical variable readinc
added to improve runtime printout. When a JEDI increment file is processed and a specified increment field is not present readinc
is set to .false.
. The calling routine uses the jedi
logical along with readinc
to select the appropriate runtime message. If the increment is added to the background the runtime printout is, for example,
nid002740.dogwood.wcoss2.ncep.noaa.gov 0: Adding Increment to spfh sphum
nid002740.dogwood.wcoss2.ncep.noaa.gov 0: Adding Increment to tmp T
nid002740.dogwood.wcoss2.ncep.noaa.gov 0: Adding Increment to ugrd u
Otherwise the printout is
nid002740.dogwood.wcoss2.ncep.noaa.gov 0: Copying from Background grle
nid002740.dogwood.wcoss2.ncep.noaa.gov 0: Copying from Background rwmr
nid002740.dogwood.wcoss2.ncep.noaa.gov 0: Copying from Background snmr
Previously, even though JEDI increments for grle, rwmr, and snmr were not available, the runtime printout was
nid001756.dogwood.wcoss2.ncep.noaa.gov 0: Adding Increment to grle grle
nid001756.dogwood.wcoss2.ncep.noaa.gov 0: Adding Increment to rwmr rwmr
nid001756.dogwood.wcoss2.ncep.noaa.gov 0: Adding Increment to snmr snmr
This change committed at 1c512a1.
WCOSS2 (Dogwood) parallel tests We do not have ctests or CI for GSI-utils applications. In lieu of this, g-w CI
has been run on Dogwood. gfs and gdas analcalc jobs from each g-w CI configuration successfully ran to completion. |
@XiaqiongZhou-NOAA: For the new omega calculation, should the omga field also be included in the gaussian analysis file (gfs/gdas.tHHz.atmanl.nc)? Note that the DA will not produce an analysis increment for this field, it would simply be a copy of the model output. |
Thank you @CatherineThomas-NOAA for your help! |
@CatherineThomas-NOAA |
Thanks @XiaqiongZhou-NOAA. Is "omga" in atmanl needed for the creation of pgbanl? Tagging @WenMeng-NOAA as well. |
g-w CI tests on Hercules Check 1 - does
g-w CI for C96C48_hybatmDA, C96C48_ufs_hybatmDA, and C96C48_hybatmaerosnowDA has been run. All jobs for each configuration successfully run to completion. In particular, gdas and gfs analcalc jobs run with out error. This is especially important for C96C48_ufs_hybatmDA (JEDI-based DA) which previously failed. Check 2 - do As a second check resinstall For all three g-w CI configurations the
whereas the update only has missing values for
All other fields in the two sets of
to the job log file to note the fact that the The Unless reviewers have additional requests, the changes in this PR are ready for final review, approval, and merger into GSI-utils |
Any more changes or testing you want to see @DavidHuber-NOAA or @emilyhcliu? If not would you please approve this PR. I'll merge into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good. Thanks @RussTreadon-NOAA @CoryMartin-NOAA!
Thank you @DavidHuber-NOAA for your review and approval |
@CatherineThomas-NOAA , Good question. If omga is included in pgbanl, it is still needed in atmanl. same question to @WenMeng-NOAA |
@RussTreadon-NOAA The test results look good to me. Especially, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes look good to me.
@RussTreadon-NOAA Thank you for performing the tests, documenting them in detail, and adding the extra output lines to make it clear that the fields are copied from the background for fields with no increment (not analysis variables).
The omga
is new to me. If there is no increment to this field, but it is included in the analysis file, it should be filled with background for consistency and clarity, especially; if the downstream processing (e.g., post) needs it.
@WenMeng-NOAA , @XiaqiongZhou-NOAA , and @CatherineThomas-NOAA . |
Thank you @emilyhcliu for your comments and approval. I agree. We need to figure out the best way to handle |
The upp first reads 'omga' from model analysis/forecast. If it is not found, 'omga' will be calculated using 'W' in UPP. |
Thank you @WenMeng-NOAA for your reply. What does UPP do if |
@RussTreadon-NOAA UPP detects if 'dzdt' is missing from 'atmanl.nc'. If 'dzdt' is found, calculating omga in UPP. Please see the code here |
@RussTreadon-NOAA |
Thank you @WenMeng-NOAA and @XiaqiongZhou-NOAA. At present neither GSI nor JEDI based DA analyze
Sounds like the behavior you would like to see is for |
Thanks for running those thorough tests, @RussTreadon-NOAA.
I agree that I'm not a fan of non-analyses being in the analysis file, but this is already the case for other variables like dzdt. Having omga be a copy of the background would make it consistent with dzdt as well as having the same omga calculation for the pgbanl and pgbfFFF files. |
Hercules tests Install g-w PR #2875 on Hercules. g-w currently points at GSI-utils at gsi_utils.fd @ 9382fd0. Set up and run GSI-based DA using C96C48_hybatmDA and JEDI-based DA using C96C48_ufs_hybatmDA. Rerun gdasanalcalc three times. The first run used the GSI-utils g-w currently points at. The second run moved to the current head of GSI-utils The table below shows the
The final column exhibits the desired behavior. The @CatherineThomas-NOAA , @XiaqiongZhou-NOAA , @WenMeng-NOAA , @emilyhcliu , @DavidHuber-NOAA , @CoryMartin-NOAA , unless anyone objects, I plan on merging this PR in its current state into GSI-utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran this through the gdasanalcalc
job for three cycles a JEDI-based experiment without issue then was able to run the follow-on gdasanlupp
job without issue. Changes look good to me as well. Approve.
Thank you @DavidHuber-NOAA for testing the revised code. I'll go ahead and merge into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve.
…A-EMC#53) * skip possibly missing hydrometeors * add warning print * handle case of missing increment in calc_analysis, add omga to calc_analysis * correct erroneous calc_analysis runtime printout
Hopefully fixes problem found in NOAA-EMC/global-workflow#2819