Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding logic to poll deployment health instead of static wait #1240

Merged
merged 7 commits into from
Jul 12, 2023

Conversation

vatsrahul1001
Copy link
Contributor

Currently, we have a static sleep time of 1800 seconds for the deployment to reach a healthy state. However, instead of relying on this fixed duration, we can implement a polling mechanism to check the health of the deployment. If the deployment is found to be unhealthy, we can Exit 1. This improvement will address the following issues:

  1. We observed today that even when the deployment was unhealthy, our workflow still succeeded. This PR will prevent such occurrences in the future.
  2. By implementing polling, we can reduce the overall execution time of the workflow. In most cases, deployments become healthy within 5-10 minutes, and with polling, we can optimize this process.

@codecov
Copy link

codecov bot commented Jul 12, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (d50a8b6) 98.58% compared to head (189c2d4) 98.58%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1240   +/-   ##
=======================================
  Coverage   98.58%   98.58%           
=======================================
  Files          90       90           
  Lines        5389     5389           
=======================================
  Hits         5313     5313           
  Misses         76       76           

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@vatsrahul1001 vatsrahul1001 requested a review from pankajkoti July 12, 2023 08:46
@vatsrahul1001 vatsrahul1001 requested a review from Lee-W July 12, 2023 09:20
@@ -45,30 +45,25 @@ jobs:
run: |
astro_core_api="https://api.astronomer.io/v1alpha1/organizations/${{secrets.organization_id }}/\
deployments"
tries=60
tries=15
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to wait for an hour maybe in some scenarios where it might take longer. So tries=30?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or the previous one would have been fine too. 60 * 60

health_flag=false

while [[ $tries -gt 0 && $health_flag == false ]]; do
sleep 120
Copy link
Collaborator

@pankajkoti pankajkoti Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think would be good to place this log at the end of the while loop, just to see logs saying Deployment status is different than Healthy and it's waiting for deployment to be healthy in case it becomes healthy in <2 mins.

@vatsrahul1001 vatsrahul1001 requested a review from pankajkoti July 12, 2023 15:19
@pankajkoti
Copy link
Collaborator

pankajkoti commented Jul 12, 2023

We can address the open comments later if needed. I think we're good to merge for now since we tested the workflow.

@vatsrahul1001 vatsrahul1001 merged commit 01aa262 into main Jul 12, 2023
@vatsrahul1001 vatsrahul1001 deleted the add_deployemnt_status_check branch July 12, 2023 15:26
@Lee-W Lee-W mentioned this pull request Jul 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants