Adding logic to poll deployment health instead of static wait (#1240)
Currently, the workflow sleeps for a static 1800 seconds while the deployment reaches a healthy state. Instead of relying on this fixed duration, this change polls the deployment's health every 120 seconds for up to 15 attempts. If the deployment is still not healthy after the last attempt, the workflow exits with status 1. This improvement addresses the following issues:

We observed today that even when the deployment was unhealthy, our workflow still succeeded. This PR will prevent such occurrences in the future.
Polling reduces the overall execution time of the workflow. In most cases, deployments become healthy within 5-10 minutes, so the workflow can proceed as soon as the deployment reports HEALTHY instead of always waiting the full 30 minutes.
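
The polling idea described above can be sketched as a small bash function. This is only an illustration, not the code in this commit: `wait_for_healthy`, `fake_status`, and the `DEPLOYING` status string are hypothetical names introduced here, and `fake_status` stands in for the `curl` and `jq` call that the workflow makes against the deployments API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a status command until it prints HEALTHY.
# Arguments: maximum tries, seconds between tries, then the command to run.
wait_for_healthy() {
  local tries=$1 interval=$2
  shift 2
  local status
  while (( tries > 0 )); do
    # Treat a failing status command as an unknown (not healthy) state.
    status=$("$@") || status="UNKNOWN"
    echo "Deployment status is: $status"
    if [[ $status == "HEALTHY" ]]; then
      return 0
    fi
    tries=$((tries - 1))
    # Sleep only when another attempt remains.
    if (( tries > 0 )); then
      sleep "$interval"
    fi
  done
  return 1
}

# Stand-in for the real API call: reports HEALTHY from the third call onward.
# The call count lives in a file because the command runs in a subshell.
counter_file=$(mktemp)
echo 0 > "$counter_file"
fake_status() {
  local n
  n=$(( $(cat "$counter_file") + 1 ))
  echo "$n" > "$counter_file"
  if (( n >= 3 )); then
    echo "HEALTHY"
  else
    echo "DEPLOYING"  # placeholder for any non-healthy status
  fi
}

# Up to 5 tries with no delay, so the sketch runs instantly.
if wait_for_healthy 5 0 fake_status; then
  echo "deployment is HEALTHY"
else
  echo "Timed out waiting for deployment to be HEALTHY" >&2
  exit 1
fi
```

With the values used in this commit, the equivalent call would pass 15 tries and a 120 second interval, which keeps the worst case at the previous 30 minutes while letting healthy deployments proceed much sooner.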
vatsrahul1001 authored Jul 12, 2023
1 parent d50a8b6 commit 01aa262
Showing 2 changed files with 34 additions and 10 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/deploy-integration-tests.yaml
@@ -57,6 +57,8 @@ jobs:
deployment_id: ${{ secrets.PROVIDER_INTEGRATION_TESTS_DEPLOYMENT_ID }}
astronomer_key_id: ${{ secrets.PROVIDER_INTEGRATION_TESTS_ASTRONOMER_KEY_ID }}
astronomer_key_secret: ${{ secrets.PROVIDER_INTEGRATION_TESTS_ASTRONOMER_KEY_SECRET }}
organization_id: ${{ secrets.ORGANIZATION_ID }}
bearer_token: ${{ secrets.BEARER_TOKEN }}

deploy-to-providers-integration-tests-on-KE:
if: |
@@ -88,3 +90,5 @@ jobs:
deployment_id: ${{ secrets.PROVIDER_INTEGRATION_TESTS_ON_KE_DEPLOYMENT_ID }}
astronomer_key_id: ${{ secrets.PROVIDER_INTEGRATION_TESTS_ON_KE_ASTRONOMER_KEY_ID }}
astronomer_key_secret: ${{ secrets.PROVIDER_INTEGRATION_TESTS_ON_KE_ASTRONOMER_KEY_SECRET }}
organization_id: ${{ secrets.ORGANIZATION_ID }}
bearer_token: ${{ secrets.BEARER_TOKEN }}
40 changes: 30 additions & 10 deletions .github/workflows/trigger-dag-reuse-wf.yaml
@@ -9,11 +9,6 @@ on: # yamllint disable-line rule:truthy
required: false
type: string
default: ''
wait_time:
description: 'seconds to wait (default 1800 seconds = 30 minutes)'
required: false
type: number
default: 1800
dags_to_trigger_after_deployment:
description: |
Comma separated list of dag_ids to trigger after deployment
@@ -34,17 +29,42 @@ on: # yamllint disable-line rule:truthy
astronomer_key_secret:
description: 'astro cloud astronomer_key_secret'
required: true
organization_id:
description: 'astro cloud organization_id'
required: true
bearer_token:
description: 'workspace bearer token'
required: true

jobs:
wait-for-deployment-to-be-ready-and-trigger-dag:
runs-on: 'ubuntu-20.04'
steps:
- name: Sleep and wait for astro cloud deployment

- name: Wait for deployment to be healthy
run: |
echo "Current timestamp is" `date`
echo "Sleeping for ${{ inputs.wait_time }}"
echo "allowing the deployed image to be updated across all Airflow components."
sleep ${{ inputs.wait_time }}
astro_core_api="https://api.astronomer.io/v1alpha1/organizations/${{ secrets.organization_id }}/\
deployments"
tries=15
health_flag=false
while [[ $tries -gt 0 && $health_flag == false ]]; do
sleep 120
response=$(curl -s -H "Authorization: Bearer ${{ secrets.bearer_token }}" -X GET \
"$astro_core_api?deploymentIds=${{ secrets.deployment_id }}")
deployment_status=$(echo "$response" | jq -r '.deployments[0].status')
echo "Deployment status is: $deployment_status"
echo "Waiting for deployment to be in ready state!!!"
if [[ $deployment_status == "HEALTHY" ]]; then
health_flag=true
fi
tries=$((tries - 1))
done
if [[ $health_flag == false ]]; then
echo "Timed out waiting for deployment ${{ secrets.deployment_id }} to be HEALTHY"
exit 1
fi
echo "${{ secrets.deployment_id }} is in HEALTHY state now"
- name: Checkout
uses: actions/checkout@v3