
Fix GEOS-Chem Classic parallelization errors revealed by parallelization tests - closes #1637 #1682

Merged: 22 commits (Jul 13, 2023)

Conversation

yantosca
Contributor

@yantosca yantosca commented Mar 2, 2023

Name and institution

Name: Bob Yantosca
Institution: Harvard / GEOS-Chem Support Team

Overview

The new GEOS-Chem Classic parallelization tests reveal several failures:

gc_4x5_47L_merra2_fullchem_TOMAS15..................Execute Simulation....FAIL
gc_4x5_merra2_fullchem_APM..........................Execute Simulation....FAIL
gc_4x5_merra2_fullchem_LuoWd........................Execute Simulation....FAIL
gc_4x5_merra2_Hg....................................Execute Simulation....FAIL

... etc... 

Summary of test results:
------------------------------------------------------------------------------
Parallelization tests passed: 19
Parallelization tests failed: 4
Parallelization tests not yet completed: 0

We will fix these issues in this PR. This is currently a draft and should not be merged.

closes #1637

GeosCore/ocean_mercury_mod.F90
- Rewrote the !$OMP PRIVATE declarations in routine OCEAN_MERCURY_FLUX,
  where some variables had not been declared as !$OMP PRIVATE.  Also
  made sure to zero all !$OMP PRIVATE loop variables at the top of the
  loop in order to prevent leftover values from prior iterations from
  propagating forward.
- Added !$OMP COLLAPSE to all parallel loops in the module for
  better computational efficiency.
- Added !$OMP SCHEDULE( DYNAMIC, 24 ) where expedient.

These updates produced identical results w/r/t the prior commit
124a934, using 8 cores for Dev (this commit) and 5 cores for Ref
(124a934cb).

Signed-off-by: Bob Yantosca <[email protected]>
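
For illustration, here is a minimal, self-contained sketch of the "declare as PRIVATE and zero at the top of the loop" pattern described in the commit above. The variable and array names (Kw, Fup, Fdown, Flux) are hypothetical stand-ins, not the actual ocean_mercury_mod.F90 code:

   PROGRAM private_zero_sketch
     IMPLICIT NONE
     INTEGER, PARAMETER :: f8 = KIND( 0.0d0 )   ! stand-in for the model's 8-byte real kind
     INTEGER, PARAMETER :: NX = 72, NY = 46
     REAL(f8) :: Flux(NX,NY)                    ! shared output array
     REAL(f8) :: Kw, Fup, Fdown                 ! per-box work variables
     INTEGER  :: I, J

     !$OMP PARALLEL DO          &
     !$OMP DEFAULT( SHARED    ) &
     !$OMP PRIVATE( I, J, Kw, Fup, Fdown )
     DO J = 1, NY
     DO I = 1, NX

        ! Zero each PRIVATE work variable so a leftover value from a prior
        ! iteration on the same thread cannot propagate forward
        Kw    = 0.0_f8
        Fup   = 0.0_f8
        Fdown = 0.0_f8

        ! ... compute Kw, Fup, Fdown for grid box (I,J) here ...
        Kw    = 0.25_f8
        Fup   = Kw * REAL( I, f8 )
        Fdown = Kw * REAL( J, f8 )

        ! Each iteration writes only to its own element of the shared array
        Flux(I,J) = Fup - Fdown

     ENDDO
     ENDDO
     !$OMP END PARALLEL DO

     PRINT *, 'Sum of flux = ', SUM( Flux )
   END PROGRAM private_zero_sketch
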
GeosCore/wetscav_mod.F90
- Add the following updates to the Luo wetdep scheme:
  - Nullify the p_pHCloud pointer in routine RAINOUT
  - Zero scalars used for Luo wetdep scheme in routine WASHOUT

This seems to avoid the parallelization errors.
TODO: Verify with a parallelization test.

Signed-off-by: Bob Yantosca <[email protected]>
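
As an illustration of the pointer fix above (the names below are hypothetical, not the actual RAINOUT/WASHOUT code), this sketch nullifies a PRIVATE pointer and zeroes PRIVATE scalars at the start of each iteration so that a stale association or leftover value cannot leak between iterations:

   PROGRAM nullify_private_sketch
     IMPLICIT NONE
     INTEGER, PARAMETER :: f8 = KIND( 0.0d0 )
     INTEGER, PARAMETER :: N_BOXES = 10
     REAL(f8), TARGET  :: pHCloud(N_BOXES)      ! stand-in for a shared module array
     REAL(f8), POINTER :: p_pH(:)               ! PRIVATE pointer, one copy per thread
     REAL(f8) :: scavEff                        ! PRIVATE scalar work variable
     REAL(f8) :: washout(N_BOXES)
     INTEGER  :: N

     pHCloud = 5.0_f8

     !$OMP PARALLEL DO          &
     !$OMP DEFAULT( SHARED    ) &
     !$OMP PRIVATE( N, p_pH, scavEff )
     DO N = 1, N_BOXES

        ! Reset the PRIVATE pointer and scalar before they are used
        NULLIFY( p_pH )
        scavEff = 0.0_f8

        p_pH       => pHCloud
        scavEff    =  p_pH(N) * 0.1_f8
        washout(N) =  scavEff

        ! Drop the association before the next iteration
        NULLIFY( p_pH )

     ENDDO
     !$OMP END PARALLEL DO

     PRINT *, 'Sum = ', SUM( washout )
   END PROGRAM nullify_private_sketch
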
@yantosca yantosca added the category: Bug and topic: Performance labels Mar 2, 2023
@yantosca yantosca added this to the 14.2.0 milestone Mar 2, 2023
@yantosca yantosca self-assigned this Mar 2, 2023
yantosca added 4 commits March 3, 2023 18:04
GeosCore/wetscav_mod.F90
- Add the !$OMP COLLAPSE statement to all parallel loops for
  better efficiency.
- Also add !$OMP SCHEDULE( DYNAMIC, 24 ) to loops where load balancing
  is an issue.

Signed-off-by: Bob Yantosca <[email protected]>
GeosCore/wetscav_mod.F90
- In the parallel loop near line 413, collapse over 3 loops
  instead of 2.  Using 2 was a mistake; collapsing all 3 loops is more
  efficient.

Signed-off-by: Bob Yantosca <[email protected]>
GeosCore/sulfate_mod.F90
- In routine CHEM_SO2, in the main parallel loop:
  - Zero one_m_KRATE at the top of the loop to prevent uninitialized
    values from causing numerical noise differences
  - Add !$OMP COLLAPSE(3) for improved parallelization efficiency
  - Assign 24 grid boxes at a time to each thread dynamically (DYNAMIC,24)
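
A minimal sketch of the loop structure this commit describes (array names and sizes are hypothetical, not the actual CHEM_SO2 code): the three loops are collapsed into a single iteration space, chunks of 24 iterations are handed out dynamically, and the PRIVATE work variable is zeroed before use:

   PROGRAM collapse3_sketch
     IMPLICIT NONE
     INTEGER, PARAMETER :: f8 = KIND( 0.0d0 )
     INTEGER, PARAMETER :: NX = 72, NY = 46, NZ = 72
     REAL(f8) :: loss(NX,NY,NZ)
     REAL(f8) :: one_m_KRATE
     INTEGER  :: I, J, L

     !$OMP PARALLEL DO                     &
     !$OMP DEFAULT( SHARED               ) &
     !$OMP PRIVATE( I, J, L, one_m_KRATE ) &
     !$OMP COLLAPSE( 3                   ) &
     !$OMP SCHEDULE( DYNAMIC, 24 )
     DO L = 1, NZ
     DO J = 1, NY
     DO I = 1, NX

        ! Zero the PRIVATE variable at the top of the (collapsed) loop body
        ! so an uninitialized value cannot introduce numerical noise
        one_m_KRATE = 0.0_f8

        ! ... chemistry would compute one_m_KRATE here ...
        one_m_KRATE = 1.0_f8 - 0.05_f8

        loss(I,J,L) = one_m_KRATE

     ENDDO
     ENDDO
     ENDDO
     !$OMP END PARALLEL DO

     PRINT *, 'Max loss = ', MAXVAL( loss )
   END PROGRAM collapse3_sketch
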
@msulprizio
Contributor

@yantosca What is the status of this PR? Is it still a draft, and if so, can we change the milestone to In Progress?

Contributor Author

Some work still needs to be done. We can change it to In Progress.

@msulprizio msulprizio modified the milestones: 14.2.0, In progress Mar 28, 2023
@yantosca yantosca changed the base branch from dev/14.2.0 to main April 7, 2023 18:01
@stale

stale bot commented May 8, 2023

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue.

@stale stale bot added the stale label May 8, 2023
@stale stale bot removed the stale label May 17, 2023
GeosCore/mercury_mod.F90
- Remove START and FINISH from the !$OMP PRIVATE declaration; these
  variables are no longer used.
- Cosmetic changes

GeosCore/drydep_mod.F90
- Add !$OMP COLLAPSE(2) to the parallel loop in routine METERO
- Add private variable F0_K in routine DRYDEP
- Set F0_K = F0(K) in routine DRYDEP.  Use F0_K in equations where
  F0(K) was previously used
- For the Hg0 drydep update over the Amazon rainforest, modify the
  F0_K variable instead of F0(K).  The F0 variable is global and not
  private, so modifying it within the parallel loop was the source
  of numerical noise differences.
- Add !$OMP COLLAPSE( 2 ) to the parallel loop in DEPVEL for
  improved efficiency.
- Zero all PRIVATE variables that are not already assigned at the
  top of the parallel loop in routine DRYDEP.
- Replace DO loops that zero VD and RSURFC with direct assignments
- Remove ELSE block for "not Amazon rainforest" (i.e. never-nesting)
- Cosmetic changes

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca
Contributor Author

I have fixed the differences caused by parallelization in the Hg simulation. I traced it down to the dry deposition routine DEPVEL, where F0(K) was modified for the Amazon rainforest. F0 is a global module array and thus was not included in the !$OMP PRIVATE statements for the loop.

The solution is to store F0(K) in a PRIVATE variable F0_K and then use F0_K wherever F0(K) was used. F0_K can then be modified for the Amazon rainforest without causing parallelization differences.

          DO 160  K = 1,NUMDEP

             ! Save F0(K) in a private variable to avoid diffs due
             ! to parallelization -- Bob Yantosca (17 May 2023)
             F0_K = F0(K)

             ... etc ...

             !** exit for non-depositing species or aerosols.
             IF (.NOT. LDEP(K) .OR. AIROSOL(K)) GOTO 155

             ! Test for special treatment for O3 drydep to ocean
             N_SPC = State_Chm%Map_DryDep(K)
             IF ((N_SPC .EQ. ID_O3) .AND. (II .EQ. 11)) THEN
                IF (State_Chm%SALINITY(I,J) .GT. 20.0_f8) THEN
                   ... etc ...
                ELSE
                   ... etc ...
                ENDIF

             ELSE IF ((N_SPC .EQ. ID_O3) .AND. (State_Met%isSnow(I,J))) THEN
                ... etc. ...
    
             ELSE
                ! Check latitude and longitude, alter F0 only for Amazon
                ! rainforest for Hg0 (see reference: Feinberg et al., ESPI,
                ! 2022: Evaluating atmospheric mercury (Hg) uptake by
                ! vegetation in a chemistry-transport model)
                !
                ! Remove IF/ELSE block using never-nesting technique.
                !   - Bob Yantosca (17 May 2023)
                IF ( N_SPC == ID_Hg0 ) THEN

                   ! Assume lower reactivity
                   F0_K = 3.0e-05_f8 

                   ! But if this is the rainforest land type and we fall
                   ! within the bounding box of the Amazon rainforest,
                   ! then increase reactivity as inferred from observations.
                   IF ( II                   ==  6         .AND.             &
                        State_Grid%XMid(I,J) >  -82.0_f8   .AND.             &
                        State_Grid%XMid(I,J) <  -33.0_f8   .AND.             &
                        State_Grid%YMid(I,J) >  -34.0_f8   .AND.             &
                        State_Grid%YMid(I,J) <   14.0_f8 ) THEN
                      F0_K = 2.0e-01_f8
                   ENDIF
                ENDIF

                !XMWH2O = 18.e-3_f8 ! Use global H2OMW (ewl, 1/6/16)
                XMWH2O = H2OMW * 1.e-3_f8
#ifdef LUO_WETDEP
                RIXX = RIX*DIFFG(TEMPK,PRESSU(I,J),XMWH2O)/ &
                     DIFFG(TEMPK,PRESSU(I,J),XMW(K)) &
                     + 1.e+0_f8/(HSTAR3D(I,J,K)/3000.e+0_f8+100.e+0_f8*F0_K)
#else
                RIXX = RIX*DIFFG(TEMPK,PRESSU(I,J),XMWH2O)/ &
                     DIFFG(TEMPK,PRESSU(I,J),XMW(K)) &
                     + 1.e+0_f8/(HSTAR(K)/3000.e+0_f8+100.e+0_f8*F0_K)
#endif
                RLUXX = 1.e+12_f8
                IF (RLU(LDT).LT.9999.e+0_f8) &
#ifdef LUO_WETDEP
                     RLUXX = RLU(LDT)/(HSTAR3D(I,J,K)/1.0e+05_f8 + F0_K)
#else
                     RLUXX = RLU(LDT)/(HSTAR(K)/1.0e+05_f8 + F0_K)
#endif

                ! If POPs simulation, scale cuticular resistances with octanol-
                ! air partition coefficient (Koa) instead of HSTAR
                ! (clf, 1/3/2011)
                IF (IS_POPS) &
                     RLUXX = RLU(LDT)/(KOA(K)/1.0e+05_f8 + F0_K)

                !*
                !* To prevent virtually zero resistance to species with huge
                !* HSTAR, such as HNO3, a minimum value of RLUXX needs to be
                !* set. The rationality of the existence of such a minimum is
                !* demonstrated by the observed relationship between Vd(NOy-NOx)
                !* and Ustar in Munger et al.[1996];
                !* Vd(HNO3) never exceeds 2 cm s-1 in observations. The
                !* corresponding minimum resistance is 50 s m-1. This correction
                !* was introduced by J.Y. Liang on 7/9/95.
                !*
#ifdef LUO_WETDEP
                RGSX = 1.e+0_f8/(HSTAR3D(I,J,K)/1.0e+05_f8/RGSS(LDT) + &
                       F0_K/RGSO(LDT))
                RCLX = 1.e+0_f8/(HSTAR3D(I,J,K)/1.0e+05_f8/RCLS(LDT) + &
                       F0_K/RCLO(LDT))
#else
                RGSX = 1.e+0_f8/(HSTAR(K)/1.0e+05_f8/RGSS(LDT) + &
                       F0_K/RGSO(LDT))
                RCLX = 1.e+0_f8/(HSTAR(K)/1.0e+05_f8/RCLS(LDT) + &
                       F0_K/RCLO(LDT))
#endif
                !*
... etc. ...

@yantosca yantosca force-pushed the bugfix/parallel-issues branch from 1e56b5e to 4af48f8 on May 23, 2023 13:35
yantosca added 3 commits May 23, 2023 12:07
GeosCore/carbon_mod.F90
- In routine SOA_CHEMISTRY,
  - Add variables IFINORG, OCBIN_SUM to the !$OMP PRIVATE statement
  - Zero private variables at the top of the parallel loop
  - Comment out the IFINORG==1 and ELSE blocks, since IFINORG is always
    set to 2, so those blocks will never be executed.
- In routines BCDRY_SETTLINGBIN and OCDRY_SETTLINGBIN
  - Zero private loop variables at top of parallel loops

Signed-off-by: Bob Yantosca <[email protected]>
The variable PSO4_SO2APM2 was originally migrated to the State_Met
object.  However, State_Met gets passed as INTENT(IN) to relevant
routines in wetscav_mod.F90 and sulfate_mod.F90.  Therefore, any updates
made to State_Met%PSO4_SO2APM2 will not be applied.

For this reason, we have migrated State_Met%PSO4_SO2APM2 to State_Chm,
which is passed with INTENT(INOUT) to the relevant routines in
sulfate_mod.F90 and wetscav_mod.F90.

Signed-off-by: Bob Yantosca <[email protected]>
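
A minimal, hypothetical sketch (not the real State_Chm/State_Met derived types) of why the field had to move: a component of a dummy argument declared with INTENT(IN) cannot be assigned to, so the field has to live on an object that the routines receive with INTENT(INOUT):

   MODULE state_sketch_mod
     IMPLICIT NONE
     INTEGER, PARAMETER :: f8 = KIND( 0.0d0 )

     TYPE ChmStateSketch
        REAL(f8), ALLOCATABLE :: PSO4_SO2APM2(:,:,:)
     END TYPE ChmStateSketch

   CONTAINS

     SUBROUTINE Update_PSO4( State_Chm )
       ! INTENT(INOUT) lets the routine write back into the state object.
       ! If the dummy argument were declared INTENT(IN), the assignment
       ! below would not be permitted, and the update could never reach
       ! the caller.
       TYPE(ChmStateSketch), INTENT(INOUT) :: State_Chm
       State_Chm%PSO4_SO2APM2 = State_Chm%PSO4_SO2APM2 + 1.0_f8
     END SUBROUTINE Update_PSO4

   END MODULE state_sketch_mod

   PROGRAM intent_demo
     USE state_sketch_mod
     IMPLICIT NONE
     TYPE(ChmStateSketch) :: State_Chm

     ALLOCATE( State_Chm%PSO4_SO2APM2(2,2,2) )
     State_Chm%PSO4_SO2APM2 = 0.0_f8
     CALL Update_PSO4( State_Chm )
     PRINT *, State_Chm%PSO4_SO2APM2(1,1,1)   ! prints 1.0
   END PROGRAM intent_demo
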
GeosCore/wetscav_mod.F90
- Now use State_Chm%PSO4_SO2APM2 instead of State_Met%PSO4_SO2APM2.
  This should have been done in the prior commit.

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca
Contributor Author

All GEOS-Chem Classic integration tests passed.

==============================================================================
GEOS-Chem Classic: Execution Test Results

GCClassic #8b24714 GEOS-Chem submodule update: Add CH4 emissions from hydroelectric reservoirs
GEOS-Chem #6194bd00c Set RIN=0 in the #ifdef APM block in routine WASHOUT
HEMCO     #98adbe2 Update CHANGELOG.md

Using 24 OpenMP threads
Number of execution tests: 26

Submitted as SLURM job: 55189257
==============================================================================
 
Execution tests:
------------------------------------------------------------------------------
gc_05x0625_NA_47L_merra2_CH4........................Execute Simulation....PASS
gc_05x0625_NA_47L_merra2_fullchem...................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem..........................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem_TOMAS15..................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem_TOMAS40..................Execute Simulation....PASS
gc_4x5_merra2_aerosol...............................Execute Simulation....PASS
gc_4x5_merra2_carbon................................Execute Simulation....PASS
gc_4x5_merra2_CH4...................................Execute Simulation....PASS
gc_4x5_merra2_CO2...................................Execute Simulation....PASS
gc_4x5_merra2_fullchem..............................Execute Simulation....PASS
gc_4x5_merra2_fullchem_aciduptake...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_APM..........................Execute Simulation....PASS
gc_4x5_merra2_fullchem_benchmark....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA_SVPOA.............Execute Simulation....PASS
gc_4x5_merra2_fullchem_LuoWd........................Execute Simulation....PASS
gc_4x5_merra2_fullchem_marinePOA....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_RRTMG........................Execute Simulation....PASS
gc_4x5_merra2_Hg....................................Execute Simulation....PASS
gc_4x5_merra2_metals................................Execute Simulation....PASS
gc_4x5_merra2_POPs_BaP..............................Execute Simulation....PASS
gc_4x5_merra2_tagCH4................................Execute Simulation....PASS
gc_4x5_merra2_tagCO.................................Execute Simulation....PASS
gc_4x5_merra2_tagO3.................................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers......................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers_LuoWd................Execute Simulation....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Execution tests passed: 26
Execution tests failed: 0
Execution tests not yet completed: 0

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%  All execution tests passed!  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

@yantosca
Contributor Author

All GCHP execution tests passed:

==============================================================================
GCHP: Execution Test Results

GCClassic #8747f23 Merge PR #312 (branch 'patch-2' of github.com:sdeastham/GCHP) into dev/14.2.0
GEOS-Chem #6194bd00c Set RIN=0 in the #ifdef APM block in routine WASHOUT
HEMCO     #98adbe2 Update CHANGELOG.md

Number of execution tests: 5

Submitted as SLURM job: 55189425
==============================================================================
 
Execution tests:
------------------------------------------------------------------------------
gchp_merra2_fullchem................................Execute Simulation....PASS
gchp_merra2_fullchem_benchmark......................Execute Simulation....PASS
gchp_merra2_fullchem_RRTMG..........................Execute Simulation....PASS
gchp_merra2_tagO3...................................Execute Simulation....PASS
gchp_merra2_TransportTracers........................Execute Simulation....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Execution tests passed: 5
Execution tests failed: 0
Execution tests not yet completed: 0

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%  All execution tests passed!  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

@yantosca
Contributor Author

yantosca commented May 25, 2023

A new set of parallelization tests revealed that:

  • The parallelization issue in the Hg simulation is fixed
  • The parallelization issue in the TOMAS microphysics simulation persists. We will fix this after the TOMAS updates in 14.3.0.
  • The parallelization issue in the APM microphysics simulation persists.
  • The fullchem simulation with the Luo wetdep option has a parallelization issue. We are continuing to investigate.
  • The TransportTracers simulation with the Luo wetdep option does not have a parallelization issue.
==============================================================================
GEOS-Chem Classic: Parallelization Test Results

GCClassic #8b24714 GEOS-Chem submodule update: Add CH4 emissions from hydroelectric reservoirs
GEOS-Chem #6194bd00c Set RIN=0 in the #ifdef APM block in routine WASHOUT
HEMCO     #98adbe2 Update CHANGELOG.md

1st run uses 24 OpenMP threads
2nd run uses 13 OpenMP threads
Number of parallelization tests: 23

Submitted as SLURM job: 55363143
==============================================================================
 
Parallelization tests:
------------------------------------------------------------------------------
gc_05x0625_NA_47L_merra2_CH4........................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem..........................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem_TOMAS15..................Execute Simulation....FAIL
gc_4x5_merra2_aerosol...............................Execute Simulation....PASS
gc_4x5_merra2_carbon................................Execute Simulation....PASS
gc_4x5_merra2_CH4...................................Execute Simulation....PASS
gc_4x5_merra2_fullchem..............................Execute Simulation....PASS
gc_4x5_merra2_fullchem_aciduptake...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_APM..........................Execute Simulation....FAIL
gc_4x5_merra2_fullchem_benchmark....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA_SVPOA.............Execute Simulation....PASS
gc_4x5_merra2_fullchem_LuoWd........................Execute Simulation....FAIL
gc_4x5_merra2_fullchem_marinePOA....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_RRTMG........................Execute Simulation....PASS
gc_4x5_merra2_Hg....................................Execute Simulation....PASS
gc_4x5_merra2_metals................................Execute Simulation....PASS
gc_4x5_merra2_POPs_BaP..............................Execute Simulation....PASS
gc_4x5_merra2_tagCH4................................Execute Simulation....PASS
gc_4x5_merra2_tagCO.................................Execute Simulation....PASS
gc_4x5_merra2_tagO3.................................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers......................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers_LuoWd................Execute Simulation....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Parallelization tests passed: 20
Parallelization tests failed: 3
Parallelization tests not yet completed: 0

KPP/fullchem/fullchem_SulfurChemFuncs.F90
- Initialize the IS_QQ3D variable so that it is TRUE if either wetdep or
  convection is switched on.  This prevents a parallelization error
  caused by IS_QQ3D being undefined.
- Updated comments

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <[email protected]>
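
For illustration, a minimal sketch of the fix (the option-flag names are hypothetical stand-ins, not the actual Input_Opt fields): the logical is now assigned unconditionally before anything reads it:

   PROGRAM is_qq3d_sketch
     IMPLICIT NONE
     LOGICAL :: wetdep_on, convection_on   ! stand-ins for the model's option switches
     LOGICAL :: Is_QQ3D

     wetdep_on     = .TRUE.
     convection_on = .FALSE.

     ! Always assign the logical before anything reads it; leaving it
     ! undefined was the source of the parallelization error
     Is_QQ3D = ( wetdep_on .OR. convection_on )

     PRINT *, 'Is_QQ3D = ', Is_QQ3D
   END PROGRAM is_qq3d_sketch
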
@yantosca
Contributor Author

We have fixed the bug that caused fullchem simulations with the Luo wetdep option to fail parallelization tests in commit 9c149f6. In routine SET_SO2, the Is_QQ3D variable was declared but never set to a value. At present only the APM and TOMAS microphysics simulations fail the parallelization tests:

==============================================================================
GEOS-Chem Classic: Parallelization Test Results

GCClassic #8b24714 GEOS-Chem submodule update: Add CH4 emissions from hydroelectric reservoirs
GEOS-Chem #9c149f685 Initialize IS_QQ3D variable in fullchem_SulfurChemFuncs.F90
HEMCO     #98adbe2 Update CHANGELOG.md

1st run uses 24 OpenMP threads
2nd run uses 13 OpenMP threads
Number of parallelization tests: 23

Submitted as SLURM job: 55989384
==============================================================================
 
Parallelization tests:
------------------------------------------------------------------------------
gc_05x0625_NA_47L_merra2_CH4........................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem..........................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem_TOMAS15..................Execute Simulation....FAIL
gc_4x5_merra2_aerosol...............................Execute Simulation....PASS
gc_4x5_merra2_carbon................................Execute Simulation....PASS
gc_4x5_merra2_CH4...................................Execute Simulation....PASS
gc_4x5_merra2_fullchem..............................Execute Simulation....PASS
gc_4x5_merra2_fullchem_aciduptake...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_APM..........................Execute Simulation....FAIL
gc_4x5_merra2_fullchem_benchmark....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA_SVPOA.............Execute Simulation....PASS
gc_4x5_merra2_fullchem_LuoWd........................Execute Simulation....PASS
gc_4x5_merra2_fullchem_marinePOA....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_RRTMG........................Execute Simulation....PASS
gc_4x5_merra2_Hg....................................Execute Simulation....PASS
gc_4x5_merra2_metals................................Execute Simulation....PASS
gc_4x5_merra2_POPs_BaP..............................Execute Simulation....PASS
gc_4x5_merra2_tagCH4................................Execute Simulation....PASS
gc_4x5_merra2_tagCO.................................Execute Simulation....PASS
gc_4x5_merra2_tagO3.................................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers......................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers_LuoWd................Execute Simulation....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Parallelization tests passed: 21
Parallelization tests failed: 2
Parallelization tests not yet completed: 0

@yantosca
Contributor Author

@msulprizio: I have marked this PR as ready for review. I have fixed the parallelization issues in both the GEOS-Chem Classic Hg and fullchem_LuoWd simulations. The parallelization issues in APM and TOMAS still exist. Nevertheless, we should not let that hold up development of 14.2.1.

@yantosca yantosca added this to the 14.2.1 milestone Jul 11, 2023
@yantosca
Contributor Author

After merging on top of GEOS-Chem PR #1808 and HEMCO PR #218, all GEOS-Chem Classic integration tests (with the exception of HEMCO) passed.

==============================================================================
GEOS-Chem Classic: Execution Test Results

GCClassic #570a173 GEOS-Chem submod update: Merge PR #1808 (SatDiagn diagnostic fixes)
GEOS-Chem #dc0f2e505 Merge PR #1682 (Fix GEOS-Chem Classic parallelization errors)
HEMCO     #bb3b465 Merge PR #218 (Remove redundant code in hco_extlist_mod.F90)

Using 24 OpenMP threads
Number of execution tests: 26

Submitted as SLURM job: 62047333
==============================================================================
 
Execution tests:
------------------------------------------------------------------------------
gc_05x0625_NA_47L_merra2_CH4........................Execute Simulation....PASS
gc_05x0625_NA_47L_merra2_fullchem...................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem..........................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem_TOMAS15..................Execute Simulation....FAIL
gc_4x5_47L_merra2_fullchem_TOMAS40..................Execute Simulation....FAIL
gc_4x5_merra2_aerosol...............................Execute Simulation....PASS
gc_4x5_merra2_carbon................................Execute Simulation....PASS
gc_4x5_merra2_CH4...................................Execute Simulation....PASS
gc_4x5_merra2_CO2...................................Execute Simulation....PASS
gc_4x5_merra2_fullchem..............................Execute Simulation....PASS
gc_4x5_merra2_fullchem_aciduptake...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_APM..........................Execute Simulation....PASS
gc_4x5_merra2_fullchem_benchmark....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA_SVPOA.............Execute Simulation....PASS
gc_4x5_merra2_fullchem_LuoWd........................Execute Simulation....PASS
gc_4x5_merra2_fullchem_marinePOA....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_RRTMG........................Execute Simulation....PASS
gc_4x5_merra2_Hg....................................Execute Simulation....PASS
gc_4x5_merra2_metals................................Execute Simulation....PASS
gc_4x5_merra2_POPs_BaP..............................Execute Simulation....PASS
gc_4x5_merra2_tagCH4................................Execute Simulation....PASS
gc_4x5_merra2_tagCO.................................Execute Simulation....PASS
gc_4x5_merra2_tagO3.................................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers......................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers_LuoWd................Execute Simulation....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Execution tests passed: 24
Execution tests failed: 2
Execution tests not yet completed: 0

Also, note that most integration tests are identical to the reference version except for the ones with noted parallelization issues.

Checking gc_05x0625_NA_47L_merra2_CH4
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_05x0625_NA_47L_merra2_fullchem
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_47L_merra2_fullchem
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_47L_merra2_fullchem_TOMAS15
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_47L_merra2_fullchem_TOMAS40
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_aerosol
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_carbon
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_CH4
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_CO2
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_fullchem
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_fullchem_aciduptake
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_fullchem_APM
   -> 3 differences found in OutputDir
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_APM/OutputDir/GEOSChem.Metrics.20190701_0000z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_APM/OutputDir/GEOSChem.Metrics.20190701_0000z.nc4 
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_APM/OutputDir/GEOSChem.SpeciesConc.20190701_0000z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_APM/OutputDir/GEOSChem.SpeciesConc.20190701_0000z.nc4 
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_APM/OutputDir/HEMCO_diagnostics.201907010000.nc 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_APM/OutputDir/HEMCO_diagnostics.201907010000.nc 
   -> 1 difference found in Restarts
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_APM/Restarts/GEOSChem.Restart.20190701_0100z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_APM/Restarts/GEOSChem.Restart.20190701_0100z.nc4 

Checking gc_4x5_merra2_fullchem_benchmark
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_fullchem_complexSOA
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_fullchem_complexSOA_SVPOA
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_fullchem_LuoWd
   -> 3 differences found in OutputDir
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_LuoWd/OutputDir/GEOSChem.Metrics.20190701_0000z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_LuoWd/OutputDir/GEOSChem.Metrics.20190701_0000z.nc4 
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_LuoWd/OutputDir/GEOSChem.SpeciesConc.20190701_0000z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_LuoWd/OutputDir/GEOSChem.SpeciesConc.20190701_0000z.nc4 
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_LuoWd/OutputDir/HEMCO_diagnostics.201907010000.nc 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_LuoWd/OutputDir/HEMCO_diagnostics.201907010000.nc 
   -> 1 difference found in Restarts
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_LuoWd/Restarts/GEOSChem.Restart.20190701_0100z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_LuoWd/Restarts/GEOSChem.Restart.20190701_0100z.nc4 

Checking gc_4x5_merra2_fullchem_marinePOA
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_fullchem_RRTMG
   -> 1 difference found in OutputDir
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_RRTMG/OutputDir/GEOSChem.RRTMG.20190701_0000z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_RRTMG/OutputDir/GEOSChem.RRTMG.20190701_0000z.nc4 
   -> No differences in Restarts

Checking gc_4x5_merra2_Hg
   -> 1 difference found in OutputDir
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_Hg/OutputDir/GEOSChem.SpeciesConc.20190101_0000z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_Hg/OutputDir/GEOSChem.SpeciesConc.20190101_0000z.nc4 
   -> 1 difference found in Restarts
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_Hg/Restarts/GEOSChem.Restart.20190101_0100z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_Hg/Restarts/GEOSChem.Restart.20190101_0100z.nc4 

Checking gc_4x5_merra2_metals
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_POPs_BaP
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_tagCH4
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_tagCO
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_tagO3
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_TransportTracers
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gc_4x5_merra2_TransportTracers_LuoWd
   -> No differences in OutputDir
   -> No differences in Restarts

We will do a separate parallelization test to confirm that the Hg and LuoWd parallelization issues are resolved.

Also, there seems to be another parallelization issue in the RRTMG diagnostic outputs:

Checking gc_4x5_merra2_fullchem_RRTMG
   -> 1 difference found in OutputDir
      * GCC_14.2.1_r8/rundirs/gc_4x5_merra2_fullchem_RRTMG/OutputDir/GEOSChem.RRTMG.20190701_0000z.nc4 
        GCC_14.2.1_r9/rundirs/gc_4x5_merra2_fullchem_RRTMG/OutputDir/GEOSChem.RRTMG.20190701_0000z.nc4 
   -> No differences in Restarts

But we will open a new issue and PR to address this.

@yantosca
Contributor Author

After merging on top of GEOS-Chem PR #1808 and HEMCO PR #218, all GCHP integration tests passed:

==============================================================================
GCHP: Execution Test Results

GCClassic #d023bc5 GEOS-Chem submod update: Merge PR #1808 (SatDiagn diagnostic fixes)
GEOS-Chem #dc0f2e505 Merge PR #1682 (Fix GEOS-Chem Classic parallelization errors)
HEMCO     #24cf7e0 PR #215 post-merge fixes: Update CHANGELOG.md & version numbers

Number of execution tests: 5

Submitted as SLURM job: 62064849
==============================================================================
 
Execution tests:
------------------------------------------------------------------------------
gchp_merra2_fullchem................................Execute Simulation....PASS
gchp_merra2_fullchem_benchmark......................Execute Simulation....PASS
gchp_merra2_fullchem_RRTMG..........................Execute Simulation....PASS
gchp_merra2_tagO3...................................Execute Simulation....PASS
gchp_merra2_TransportTracers........................Execute Simulation....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Execution tests passed: 5
Execution tests failed: 0
Execution tests not yet completed: 0

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%  All execution tests passed!  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Also, all integration tests were zero-diff w/r/t GEOS-Chem PR #1808 and HEMCO PR #218:

Checking gchp_merra2_fullchem
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gchp_merra2_fullchem_benchmark
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gchp_merra2_fullchem_RRTMG
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gchp_merra2_tagO3
   -> No differences in OutputDir
   -> No differences in Restarts

Checking gchp_merra2_TransportTracers
   -> No differences in OutputDir
   -> No differences in Restarts

@yantosca
Contributor Author

GEOS-Chem parallelization tests for Hg and fullchem_LuoWd now pass. However, the fullchem_benchmark simulation did not pass the parallelization test. We are investigating.

==============================================================================
GEOS-Chem Classic: Parallelization Test Results

GCClassic #570a173 GEOS-Chem submod update: Merge PR #1808 (SatDiagn diagnostic fixes)
GEOS-Chem #dc0f2e505 Merge PR #1682 (Fix GEOS-Chem Classic parallelization errors)
HEMCO     #bb3b465 Merge PR #218 (Remove redundant code in hco_extlist_mod.F90)

1st run uses 24 OpenMP threads
2nd run uses 13 OpenMP threads
Number of parallelization tests: 23

Submitted as SLURM job: 62059002
==============================================================================
 
Parallelization tests:
------------------------------------------------------------------------------
gc_05x0625_NA_47L_merra2_CH4........................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem..........................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem_TOMAS15..................Execute Simulation....FAIL
gc_4x5_merra2_aerosol...............................Execute Simulation....PASS
gc_4x5_merra2_carbon................................Execute Simulation....PASS
gc_4x5_merra2_CH4...................................Execute Simulation....PASS
gc_4x5_merra2_fullchem..............................Execute Simulation....PASS
gc_4x5_merra2_fullchem_aciduptake...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_APM..........................Execute Simulation....FAIL
gc_4x5_merra2_fullchem_benchmark....................Execute Simulation....FAIL
gc_4x5_merra2_fullchem_complexSOA...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA_SVPOA.............Execute Simulation....PASS
gc_4x5_merra2_fullchem_LuoWd........................Execute Simulation....PASS
gc_4x5_merra2_fullchem_marinePOA....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_RRTMG........................Execute Simulation....PASS
gc_4x5_merra2_Hg....................................Execute Simulation....PASS
gc_4x5_merra2_metals................................Execute Simulation....PASS
gc_4x5_merra2_POPs_BaP..............................Execute Simulation....PASS
gc_4x5_merra2_tagCH4................................Execute Simulation....PASS
gc_4x5_merra2_tagCO.................................Execute Simulation....PASS
gc_4x5_merra2_tagO3.................................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers......................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers_LuoWd................Execute Simulation....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Parallelization tests passed: 20
Parallelization tests failed: 3
Parallelization tests not yet completed: 0

@yantosca
Contributor Author

Upon further investigation, the fullchem_benchmark parallelization failure was due to a SLURM issue that prevented the 24-core job from running:

Reset simulation start date in cap_restart if using GCHP
Now using 24
srun: error: Unable to create step for job 62059002: Memory required by task is not available
Reset simulation start date in cap_restart if using GCHP
Now using 13

I ran an in-directory parallelization test and confirmed that the fullchem_benchmark simulation has no differences when using 13 or 24 cores.

Parallel test result: PASS
Wed Jul 12 14:04:08 EDT 2023

So I am confident that all parallelization tests except APM and TOMAS are good. We should be able to merge now.

@yantosca
Contributor Author

@msulprizio: now ready for review

Contributor

@msulprizio msulprizio left a comment


These changes look good to merge.
