CI Refactoring and STALLED case detection #2488

TerrenceMcGuinness-NOAA
Collaborator

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA commented Apr 15, 2024

Description

These updates to the CI framework refactor some of the bash code and add Python tools in order to detect when a CI case has an experiment in a state where it cannot advance, such as when a requisite dependency is missing:

  • Added a separate Python script for checking the status of Rocoto-driven cases and integrated it into the bash CI drivers and Jenkins, so that the state logic is done in one place.

  • Added Python log-publishing utilities to the bash CI drivers as part of refactoring and consolidating functionality.

  • Updated Jenkins behavior while incorporating the above Python code for Rocoto state checking:

    • Polling on PRs works, and is one update away from supporting multiple labels as well.
    • The label updates to FAIL as soon as the first case fails; the other cases continue until they complete or are killed by the user.

    Resolves Feature BASH CI detects when a dependacy isn't being met #2008
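
To illustrate the idea of keeping the state logic in one place, here is a minimal sketch of how a case could be classified from its Rocoto task states. It is not the script added in this PR: the function name `classify_case`, the state groupings, and the returned labels are assumptions made for illustration, and the task-state names are assumed to follow typical Rocoto status output.

```python
from collections import Counter
from typing import Iterable

# Assumed Rocoto task-state names (illustrative; not taken from this PR).
ACTIVE_STATES = {"RUNNING", "QUEUED", "SUBMITTING"}
FAILED_STATES = {"DEAD", "FAILED"}
DONE_STATE = "SUCCEEDED"


def classify_case(task_states: Iterable[str], expected_total: int) -> str:
    """Return a hypothetical case status: FAIL, DONE, RUNNING, or STALLED.

    task_states: states of the tasks that currently have a status record.
    expected_total: number of tasks the workflow must complete.
    """
    counts = Counter(state.upper() for state in task_states)

    # Any failed task fails the whole case.
    if any(counts[state] for state in FAILED_STATES):
        return "FAIL"

    # Every expected task has succeeded.
    if counts[DONE_STATE] == expected_total:
        return "DONE"

    # Something is still queued or running, so the case can advance.
    if any(counts[state] for state in ACTIVE_STATES):
        return "RUNNING"

    # Nothing failed, nothing is active, and work remains: the case
    # cannot advance (for example, a dependency is never satisfied).
    return "STALLED"
```

A real checker would likely need to tolerate the short window between workflow manager invocations in which no task is active yet, for example by requiring the stalled condition to hold across several consecutive checks before reporting it.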

Type of change

  • New feature (adds functionality to detect stalled CI cases)
  • Maintenance (code refactor, clean-up, new CI functionalities and behavior)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

Bash cases were tested for stall detection and failure log reporting in the dev bash cron.
Jenkins was tested in a development multi-branch project with failing tests and the full end-to-end success path.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion and removed CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed labels Apr 19, 2024
@emcbot emcbot added CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress CI-Orion-Passed **Bot use only** CI testing on Orion for this PR has completed successfully and removed CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Apr 19, 2024

emcbot commented Apr 19, 2024

CI Passed Orion at
Built and ran in directory /work2/noaa/stmp/CI/ORION/2488

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS label Apr 19, 2024
@emcbot emcbot added CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS and removed CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS labels Apr 19, 2024

emcbot commented Apr 19, 2024

CI Update on Wcoss2 at 04/19/24 09:21:13 PM
============================================
Cloning and Building global-workflow PR: 2488
with PID: 70662 on host: dlogin08

@emcbot emcbot added CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Apr 19, 2024

emcbot commented Apr 19, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Fri Apr 19 21:25:06 UTC 2024 on dlogin08
---------------------------------------------------
Build: Completed at 04/19/24 09:36:26 PM
Case setup: Completed for experiment C48_ATM_07617928
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_07617928
Case setup: Skipped for experiment C48_S2SWA_gefs_07617928
Case setup: Completed for experiment C48_S2SW_07617928
Case setup: Completed for experiment C96_atm3DVar_07617928
Case setup: Skipped for experiment C96_atmaerosnowDA_07617928
Case setup: Completed for experiment C96C48_hybatmDA_07617928
Case setup: Skipped for experiment C96C48_ufs_hybatmDA_07617928


emcbot commented Apr 19, 2024

Experiment C48_ATM_07617928 SUCCESS on Wcoss2 at 04/19/24 10:52:10 PM


emcbot commented Apr 19, 2024

Experiment C96C48_hybatmDA_07617928 SUCCESS on Wcoss2 at 04/19/24 11:52:24 PM


emcbot commented Apr 19, 2024

Experiment C96_atm3DVar_07617928 SUCCESS on Wcoss2 at 04/19/24 11:56:11 PM


emcbot commented Apr 20, 2024

Experiment C48_S2SW_07617928 SUCCESS on Wcoss2 at 04/20/24 12:08:14 AM

@emcbot emcbot added CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Apr 20, 2024

emcbot commented Apr 20, 2024

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_07617928 *** SUCCESS *** at 04/19/24 10:52:10 PM
Experiment C96C48_hybatmDA_07617928 *** SUCCESS *** at 04/19/24 11:52:24 PM
Experiment C96_atm3DVar_07617928 *** SUCCESS *** at 04/19/24 11:56:11 PM
Experiment C48_S2SW_07617928 *** SUCCESS *** at 04/20/24 12:08:14 AM

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 1cfc8e5 into NOAA-EMC:develop Apr 20, 2024
9 of 10 checks passed
danholdaway added a commit to danholdaway/global-workflow that referenced this pull request Apr 23, 2024
* upstream/develop:
  Add CCPP suite and FASTER option to UFS build (NOAA-EMC#2521)
  New "atmanlfv3inc" Rocoto job (NOAA-EMC#2420)
  Hotfix to disable STALLED in CI as an error (NOAA-EMC#2523)
  Add restart on failure capability for the forecast executable (NOAA-EMC#2510)
  Update parm/transfer list files to match vetted GFSv16 set (NOAA-EMC#2517)
  Update gdas_gsibec_ver to 20240416 (NOAA-EMC#2497)
  Adding more cycles to gempak script gfs_meta_sa2.sh (NOAA-EMC#2518)
  Update gsi_enkf.sh hash to 457510c (NOAA-EMC#2514)
  Enable using the FV3_global_nest_v1 CCPP suite (NOAA-EMC#2512)
  CI Refactoring and STALLED case detection (NOAA-EMC#2488)
  Add C768 and C1152 S2SW test cases (NOAA-EMC#2509)
  Fix paths for refactored prepocnobs task (NOAA-EMC#2504)
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA deleted the feature/check_stalled_cases branch April 30, 2024 16:32
Labels
CI/CD Issue related to CI/CD CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully CI-Orion-Passed **Bot use only** CI testing on Orion for this PR has completed successfully CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully
Development

Successfully merging this pull request may close these issues.

Feature BASH CI detects when a dependacy isn't being met
3 participants