Adding logic to poll deployment health instead of static wait #1240

vatsrahul1001 · 2023-07-12T08:37:35Z

Currently, we wait a static 1800 seconds for the deployment to reach a healthy state. Instead of relying on this fixed duration, we can poll the deployment's health and exit with status 1 if the deployment is found to be unhealthy. This improvement addresses the following issues:

- We observed today that our workflow succeeded even though the deployment was unhealthy. This PR will prevent such occurrences in the future.
- Polling reduces the overall execution time of the workflow. In most cases, deployments become healthy within 5-10 minutes, so the workflow no longer has to wait out the full fixed duration.
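A minimal sketch of the polling idea described above, not the exact code in this PR. The `fetch_deployment_status` function and the `HEALTHY`/`CREATING` status strings are hypothetical stand-ins for the real status lookup, and the demo uses a 0-second interval so it finishes immediately.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real status lookup; the status strings are
# assumptions. It reports HEALTHY from the third call onward.
calls=0
deployment_status=""
fetch_deployment_status() {
  calls=$((calls + 1))
  if [[ $calls -ge 3 ]]; then
    deployment_status="HEALTHY"
  else
    deployment_status="CREATING"
  fi
}

# wait_for_healthy TRIES INTERVAL_SECONDS
# Returns 0 once the status is HEALTHY, 1 if the tries run out first.
wait_for_healthy() {
  local tries=$1 interval=$2
  local health_flag=false
  while [[ $tries -gt 0 && $health_flag == false ]]; do
    sleep "$interval"
    fetch_deployment_status
    if [[ $deployment_status == "HEALTHY" ]]; then
      health_flag=true
    else
      echo "Deployment status is $deployment_status; waiting"
      tries=$((tries - 1))
    fi
  done
  [[ $health_flag == true ]]
}

# Demo with a 0-second interval. In a workflow step, the failure branch
# would end with `exit 1` so that the job fails.
if wait_for_healthy 15 0; then
  echo "Deployment is healthy"
else
  echo "Deployment did not become healthy" >&2
fi
```

Returning a non-zero status from the wait function is what lets the calling step fail when the deployment never becomes healthy, instead of succeeding after a fixed sleep.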

codecov · 2023-07-12T08:42:19Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (d50a8b6) 98.58% compared to head (189c2d4) 98.58%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1240   +/-   ##
=======================================
  Coverage   98.58%   98.58%           
=======================================
  Files          90       90           
  Lines        5389     5389           
=======================================
  Hits         5313     5313           
  Misses         76       76


.github/workflows/deploy-integration-tests.yaml

.github/workflows/trigger-dag-reuse-wf.yaml

pankajkoti · 2023-07-12T14:14:02Z

.github/workflows/trigger-dag-reuse-wf.yaml

@@ -45,30 +45,25 @@ jobs:
        run: |
          astro_core_api="https://api.astronomer.io/v1alpha1/organizations/${{secrets.organization_id }}/\
          deployments"
-          tries=60
+          tries=15


It might make sense to wait up to an hour, since in some scenarios the deployment might take longer. So tries=30?

Or the previous value would have been fine too: 60 * 60.
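The numbers in this comment can be checked with a small sketch. The upper bound on waiting is the number of tries multiplied by the sleep interval. The 120-second interval is taken from the `sleep 120` line in this PR's diff; reading "60 * 60" as 60 tries with a 60-second sleep is an assumption. `max_wait_seconds` is a hypothetical helper, not part of the workflow.

```shell
#!/usr/bin/env bash

# max_wait_seconds TRIES INTERVAL_SECONDS
# Prints the longest time the loop can spend sleeping before giving up.
max_wait_seconds() {
  echo $(( $1 * $2 ))
}

echo "tries=15, sleep 120 -> $(max_wait_seconds 15 120)s"   # 1800s, 30 minutes
echo "tries=30, sleep 120 -> $(max_wait_seconds 30 120)s"   # 3600s, 1 hour
echo "tries=60, sleep 60  -> $(max_wait_seconds 60 60)s"    # 3600s, 1 hour
```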

pankajkoti · 2023-07-12T14:21:11Z

.github/workflows/trigger-dag-reuse-wf.yaml

+          health_flag=false
+
+          while [[ $tries -gt 0 && $health_flag == false ]]; do
+              sleep 120


I think it would be good to place this log at the end of the while loop, so that we see logs saying the deployment status is something other than Healthy and that it is waiting for the deployment to become healthy, in case it becomes healthy in under 2 minutes.
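One possible reading of this suggestion is to check the status first and put the log and the wait at the end of each iteration. The sketch below compares that ordering with the sleep-first ordering in the diff. `record_sleep` and `is_healthy` are hypothetical demo stubs: instead of sleeping, the stub only adds up the seconds that would have been slept.

```shell
#!/usr/bin/env bash

# Demo stubs (hypothetical): record the seconds that would be slept instead of
# actually sleeping, and report healthy from the Nth check onward.
slept=0
checks=0
healthy_after=1
record_sleep() { slept=$((slept + $1)); }
is_healthy() { checks=$((checks + 1)); [[ $checks -ge $healthy_after ]]; }

# Variant A: wait at the top of the loop, then check.
poll_sleep_first() {
  local tries=$1
  while [[ $tries -gt 0 ]]; do
    record_sleep 120
    if is_healthy; then return 0; fi
    tries=$((tries - 1))
  done
  return 1
}

# Variant B: check first, then log and wait at the end of the loop.
poll_check_first() {
  local tries=$1
  while [[ $tries -gt 0 ]]; do
    if is_healthy; then return 0; fi
    echo "Deployment is not healthy yet; waiting"
    record_sleep 120
    tries=$((tries - 1))
  done
  return 1
}

# The deployment is already healthy at the first check in both runs.
slept=0; checks=0
poll_sleep_first 15
echo "sleep first: waited ${slept}s"   # 120

slept=0; checks=0
poll_check_first 15
echo "check first: waited ${slept}s"   # 0
```

With the check placed first, a deployment that is already healthy costs no wait, and the "not healthy yet" log only appears when a wait is actually about to happen.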

pankajkoti · 2023-07-12T15:25:28Z

We can address the open comments later if needed. I think we're good to merge for now since we tested the workflow.

adding deployment health check

d471156

vatsrahul1001 requested review from phanikumv, pankajastro, pankajkoti, Lee-W, sunank200 and utkarsharma2 as code owners July 12, 2023 08:37

pankajkoti reviewed Jul 12, 2023

.github/workflows/deploy-integration-tests.yaml

passing var for KE as well

b23c2cc

vatsrahul1001 requested a review from pankajkoti July 12, 2023 08:46

pankajkoti reviewed Jul 12, 2023

.github/workflows/trigger-dag-reuse-wf.yaml

Lee-W reviewed Jul 12, 2023

.github/workflows/deploy-integration-tests.yaml

.github/workflows/trigger-dag-reuse-wf.yaml

fixing pre-commit

7a5d1be

vatsrahul1001 requested a review from Lee-W July 12, 2023 09:20

vatsrahul1001 added 4 commits July 12, 2023 16:41

fixing var name

9dbcdfb

fixing invalid syntax

5f39bd9

formatting fix

0fc1ed0

update polling time to 1800

189c2d4

pankajkoti reviewed Jul 12, 2023

vatsrahul1001 requested a review from pankajkoti July 12, 2023 15:19

pankajkoti approved these changes Jul 12, 2023

vatsrahul1001 merged commit 01aa262 into main Jul 12, 2023

vatsrahul1001 deleted the add_deployemnt_status_check branch July 12, 2023 15:26

Lee-W mentioned this pull request Jul 13, 2023

Add actionlint #1243

Merged
