Upgrade subcomponents and the global workflow to use spack-stack #1868
Opened a PR to upgrade the GSI to spack-stack: NOAA-EMC/GSI#624.
MET/METplus will be an issue with spack-stack. Currently, the builds available are 10.1.1/4.1.1, respectively, while I believe the verif-global system requires 9.1.x/3.1.x. I think I will need to create a separate module file for each system that loads the appropriate hpc-stack builds of these modules, though reading through #1756 and #1342, it seems this package needs an update anyway. Am I correct about the MET/METplus versions, @malloryprow? That said, looking through the spack repo, met/9.1.3 and metplus/3.1.1 are included and could, at least in theory, be installed. @AlexanderRichert-NOAA, would it be possible to install these under spack-stack 1.4.1 and/or 1.5.0 alongside the existing 10.1.1/4.1.1 builds?
It would be MET v9.1.3 and METplus v3.1.1.
@DavidHuber-NOAA yeah, that shouldn't be a problem. Can you file an issue under spack-stack so we can track things there?
@AlexanderRichert-NOAA installed a test environment for spack-stack on Hera:/scratch1/NCEPDEV/nems/Alexander.Richert/spack-stack-1.4.1-gw that was used to build UFS_Utils, GFS-Utils, GSI, GSI-monitor, and GSI-utils. I then removed all of the module use/load statements from the jobs/rocoto/* scripts, updated the global-workflow modulefiles on Hera to point to Alex's build, and ran a test C96/C48 case for 1.5 cycles, which I compared against develop. The half-cycle The first location where differences are obvious is during the GDAS and GFS analyses on the first full cycle. Initial radiance penalties differ at the 12th decimal place. This is consistent with the regression test results seen previously in NOAA-EMC/GSI#589 that were tracked to CRTM optimization differences between spack-stack (compiled with Log files were then checked for errors/warnings and compared against develop. All warnings/errors were identical with the exception of the half-cycle gdasarch job, which reported different errors when attempting to access HPSS. HPSS was down for maintenance yesterday, which explains these differences. I will now move on to a C384/C192 test case.
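The log check described above (flag the errors and warnings in each job log, then diff them against develop) could be sketched roughly as follows. This is not the procedure used in the thread; the regular expression and the helper names `flagged_lines`, `compare_texts`, and `compare_logs` are hypothetical, and real workflow logs would likely need timestamps or PIDs stripped so that identical messages compare equal.

```python
import re
from pathlib import Path

# Assumption: a line is "flagged" if it contains the word error(s) or
# warning(s), case-insensitive. Real logs may need a tuned pattern.
FLAG = re.compile(r"\b(error|warning)s?\b", re.IGNORECASE)


def flagged_lines(text):
    """Return the set of stripped lines in `text` that mention an error or warning."""
    return {line.strip() for line in text.splitlines() if FLAG.search(line)}


def compare_texts(test_text, control_text):
    """Return (new_in_test, missing_in_test) sets of flagged lines."""
    test, control = flagged_lines(test_text), flagged_lines(control_text)
    return test - control, control - test


def compare_logs(test_path, control_path):
    """Read two job logs (hypothetical paths) and compare their flagged lines."""
    test_text = Path(test_path).read_text(errors="replace")
    control_text = Path(control_path).read_text(errors="replace")
    return compare_texts(test_text, control_text)
```

Calling `new, gone = compare_logs(test_log, develop_log)` would list messages that appear only in the spack-stack run (`new`) or only in develop (`gone`); empty sets for both mean the logs raised the same errors and warnings.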
Also, I compared runtimes between develop and spack-stack. Almost every spack-stack job runs a little slower than the develop version. I believe this can likely be tracked down to optimization flags used in the libraries. Results shown below.
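A runtime comparison like this one could be summarized as a per-job percent change. The sketch below assumes wall times (for example, in seconds) have already been collected into dictionaries keyed by job name; `percent_change` and `slowest_jobs` are hypothetical helpers, and no timing data from the thread is reproduced here.

```python
def percent_change(develop, spack):
    """Percent change in wall time per job; positive means the spack-stack run was slower.

    Jobs missing from the spack-stack run, or with a non-positive develop
    time, are skipped.
    """
    changes = {}
    for job, base in develop.items():
        if job in spack and base > 0:
            changes[job] = 100.0 * (spack[job] - base) / base
    return changes


def slowest_jobs(changes, n=5):
    """Return up to n (job, percent) pairs, largest slowdown first."""
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)[:n]
```

Sorting by relative rather than absolute change highlights cheap jobs (such as post) whose slowdowns are large in percentage terms but small in wall time, so both views are worth checking.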
Two C384/C192 test cases were run, one with 2 members out to 1.5 cycles and another with 80 members out to 3.5 cycles. A control was also run with 2 members out to 1.5 cycles. For the 2-member test, all jobs completed successfully. Log files were compared against the control, and no new errors/warnings were generated. Additionally, file counts of the archived products were identical between the test and the control. For the 3.5-cycle test, one job (enkfgdaseupd on the 2nd full cycle) initially failed but then completed without intervention on the second attempt. enkf.x crashed when attempting to create an empty increment netCDF file at gridio_gfs.f90 with a segmentation fault ( Timing differences seemed to improve for C384, with the analyses and forecasts coming in about the same or a little faster for the spack-stack case (see below). The post jobs are quite a bit slower, but they are also relatively cheap jobs to begin with. This may improve if we switch to spack-stack/1.5.0, which compiled many libraries with
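The archived-product check mentioned above (identical file counts between test and control) could be done with a per-subdirectory count. This is a sketch that assumes both archives have been unpacked into local directory trees; the thread does not say how the counts were actually taken, and `file_counts` and `count_mismatches` are hypothetical names.

```python
from collections import Counter
from pathlib import Path


def file_counts(root):
    """Count regular files under each top-level subdirectory of `root`.

    Files sitting directly in `root` are counted under the key ".".
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root)
            top = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[top] += 1
    return counts


def count_mismatches(test_root, control_root):
    """Return {subdir: (test_count, control_count)} for subdirectories that differ."""
    test, control = file_counts(test_root), file_counts(control_root)
    keys = set(test) | set(control)
    # Counter returns 0 for missing keys, so one-sided subdirectories show up too.
    return {k: (test[k], control[k]) for k in sorted(keys) if test[k] != control[k]}
```

An empty result from `count_mismatches` corresponds to the "identical file counts" outcome reported for the 2-member test.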
I successfully ran 2 cycles on Hera, with the exception of the metplus and awips jobs. For awips, I opened issue NOAA-EMC/gfs-utils#33. For metplus, the initial failures were due to the undefined variable METPLUS_PATH and an old path for HOMEMET. After correcting these to point to the spack-stack installs of MET and METplus, the jobs failed with some cryptic Python errors:
And it continues from there. This suggests to me that this version of METplus (3.1.1) is not compatible with the spack-stack Python version (3.10.8) and/or one of the Python packages installed with spack-stack. So, unfortunately, I don't think I will be able to get verif-global to work with spack-stack unless @malloryprow has an idea on how to fix this. In the meantime, I am going to set `DO_METP=NO` in config.base.emc.dyn by default, and users will then need to run verif-global offline.
On first look, I can't say I know how to fix it. I'm not sure whether there is some difference between the Python versions. It looks like this is using python-3.10.8? I know the Python version I'm using in EMC_verif-global on WCOSS2 is 3.8.6.
This points to the head of develop on NOAA-EMC which supports spack-stack.
@WalterKolczynski-NOAA It seems the METplus version required by EMC_verif-global is not compatible with the spack-stack Python version. On WCOSS2, EMC_verif-global uses Python 3.8.6 with no issues. It looks like spack-stack is using 3.10.8.
Okay, we're going to need to figure out the MET issue, likely in a separate issue. I did a little poking around, but we're going to need more info before figuring out how to proceed:
The first step after collecting this info may be asking the MET team if they can point us in the direction of what may be wrong. I'm sure if there are changes needed in MET, they won't want to be working on something two versions old. That said, I don't know that Python has had any changes that would break PosixPath or imports. I suspect some configuration error. At any rate, if the MET and MET+ in the stack aren't working, they shouldn't stay in the stack broken. Whether that involves fixing them and/or replacing them with different versions is the question.
The issue is coming from METplus. EMC_verif-global is using these older versions of MET and METplus; honestly, I'd call EMC_verif-global pretty much frozen code with EVS nearing code handoff to NCO. I don't think it makes sense to upgrade them to newer versions, as that would be a big overhaul.
I'll note it looks like the latest METplus version is using Python 3.10.4.
I don't know anything about spack-stack, but maybe it is a configuration error like Walter mentioned? Looking at this part: Traceback (most recent call last):
The first line has /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env
@malloryprow The gsi-addon environment is built on top of the unified-env environment, so when the gsi-addon environment is loaded, you get access to both. There are only a few packages in the gsi-addon environment, primarily met/9.1.3, metplus/3.1.1, bufr/11.7.0, and gsi-ncdiag/1.1.2. There could still be a configuration issue elsewhere, though.
Ah, got it, okay! Thanks @DavidHuber-NOAA!
I did a test of METplus (outside of EMC_verif-global) with /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c vs. /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/metplus-5.1.0-n3vysib. The metplus-3.1.1 install threw errors similar to those described above; the metplus-5.1.0 install worked.
I opened issue #2091 to continue tracking the spack-stack/metplus issue.
Great! Let me know if there is anything I can help with. My development for EVS is all done.
@DavidHuber-NOAA, GDASApp PR #774 adds a Hercules build capability to GDASApp (UFS, aka JEDI, DA). GDASApp has been built on Hercules. All Realize that this issue is spack-stack specific, not Hercules specific. GDASApp
Sounds good, thanks for the news @RussTreadon-NOAA!
During WCOSS2 testing, it was apparent that the hacks for the efcs, fcst, and post jobs are still needed on WCOSS2, at least until gsi-ncdiag/1.1.2 is installed in
What new functionality do you need?
Migrate to spack-stack libraries for all subcomponents and the global workflow modules.
What are the requirements for the new functionality?
All module systems/components use spack-stack libraries. This has already been completed for the UFS, UFS_utils, GDAS App, and UPP repositories. The repos remaining are the GSI (NOAA-EMC/GSI#589), GSI-Utils (NOAA-EMC/GSI-utils#18), GSI-Monitor (NOAA-EMC/GSI-Monitor#98), gfs-utils, and verif-global.
Acceptance Criteria
The global workflow is able to run forecast-only and cycled experiments at all resolutions referencing spack-stack libraries on at least Hera and Orion.
Suggest a solution (optional)
No response