Update the timeout of the clean-up cluster step to 1h #7938

Merged (1 commit), Sep 19, 2024

ytimocin
Contributor

Description

Update the timeout of the clean-up cluster step to 1h.

Type of change

  • This pull request is a minor refactor, code cleanup, test improvement, or other maintenance task and doesn't change the functionality of Radius (issue link optional).

@ytimocin ytimocin requested review from a team as code owners September 18, 2024 03:21

github-actions bot commented Sep 18, 2024

Unit Tests

3 366 tests  ±0   3 360 ✅ ±0   3m 57s ⏱️ -1s
  265 suites ±0       6 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit f1bd766. ± Comparison against base commit 69d6726.

♻️ This comment has been updated with latest results.

codecov bot commented Sep 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.07%. Comparing base (69d6726) to head (f1bd766).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7938   +/-   ##
=======================================
  Coverage   61.07%   61.07%           
=======================================
  Files         531      531           
  Lines       28061    28061           
=======================================
  Hits        17138    17138           
  Misses       9427     9427           
  Partials     1496     1496           


@@ -512,7 +514,7 @@ jobs:
- name: Clean up cluster
Contributor

Do we need two clean-up cluster steps? Both seem to be part of the test job.

Contributor Author

One cleans up before the tests and the other one after the tests. This is to be extra defensive. We want to have a clean start before running the tests and want to make sure we delete everything after we run the tests.
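
The before-and-after pattern described here can be sketched roughly as below. This is only an illustration in Python, not the workflow's actual steps; `run_with_cleanup`, `cleanup`, and `run_tests` are hypothetical names introduced for the example.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_cleanup(cleanup: Callable[[], None], run_tests: Callable[[], T]) -> T:
    """Run cleanup before the tests, and again afterwards even if they fail."""
    cleanup()  # start from a clean cluster
    try:
        return run_tests()
    finally:
        cleanup()  # always remove whatever the tests left behind
```

The second `cleanup()` call sits in a `finally` block so that a failing test run still triggers it, which is the "extra defensive" part.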

Contributor

Makes sense. PR LGTM then.

@ytimocin ytimocin force-pushed the ytimocin/lrw/update-clean-up-timeout branch from 235b973 to 7d22b2d Compare September 18, 2024 19:27

radius-functional-tests bot commented Sep 18, 2024

Radius functional test overview

Name        Value
Repository  radius-project/radius
Commit ref  7d22b2d
Unique ID   func7f3bc5d9d6
Image tag   pr-func7f3bc5d9d6
Click here to see the list of tools in the current test run
  • gotestsum 1.12.0
  • KinD: v0.20.0
  • Dapr: 1.12.0
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-func7f3bc5d9d6
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-func7f3bc5d9d6
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-func7f3bc5d9d6
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-func7f3bc5d9d6
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting datastoresrp-cloud functional tests...
⌛ Starting ucp-cloud functional tests...
⌛ Starting corerp-cloud functional tests...
✅ datastoresrp-cloud functional tests succeeded
✅ ucp-cloud functional tests succeeded
✅ corerp-cloud functional tests succeeded

Contributor

lakshmimsft commented Sep 19, 2024

@ytimocin I am curious whether this will alleviate the issue. If we terminate the script after an hour and there are still resources/finalizers that need cleanup, will they be cleaned up during the second run of the script while the finalizers are still attached? Based on the overlapping runs over multiple days, it seems that multiple runs of the script did not help clean things up.
At least we will not wait over an hour for the script to complete, so that is good.
Let's try this. If it works, great. If not, then after the first run of the script we could cycle through all resources in the namespace that are in the 'Terminating' state and log them, so we can narrow down what the finalizers are waiting on.
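
A minimal sketch of that logging idea, in Python and under assumptions: it takes the parsed output of something like `kubectl get <kind> -n <namespace> -o json` and treats any object with `metadata.deletionTimestamp` set as terminating, which is how Kubernetes marks objects that are waiting on finalizers. The function name `find_terminating` is hypothetical and is not part of the Radius clean-up script.

```python
from typing import Any


def find_terminating(resource_list: dict[str, Any]) -> list[dict[str, Any]]:
    """Return kind, name, and finalizers of objects that are marked for deletion."""
    stuck = []
    for item in resource_list.get("items", []):
        meta = item.get("metadata", {})
        # Kubernetes sets metadata.deletionTimestamp once deletion is requested;
        # the object lingers until metadata.finalizers is emptied.
        if meta.get("deletionTimestamp"):
            stuck.append({
                "kind": item.get("kind", ""),
                "name": meta.get("name", ""),
                "finalizers": list(meta.get("finalizers", [])),
            })
    return stuck
```

Logging the returned `finalizers` for each stuck object would show which controller each resource is still waiting on.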

@ytimocin
Contributor Author

> @ytimocin I am curious whether this will alleviate the issue. If we terminate the script after an hour and there are still resources/finalizers that need cleanup, will they be cleaned up during the second run of the script while the finalizers are still attached? Based on the overlapping runs over multiple days, it seems that multiple runs of the script did not help clean things up. At least we will not wait over an hour for the script to complete, so that is good. Let's try this. If it works, great. If not, then after the first run of the script we could cycle through all resources in the namespace that are in the 'Terminating' state and log them, so we can narrow down what the finalizers are waiting on.

The problem with the 6-hour timeout is that no clean-up happens during that time. The step just waits for the finalizers to complete, and they never do. Even if the workflow waited for 6 days, they still wouldn't. The root cause is something else.

I wanted to reduce the time to 1h so that we don't wait 6 hours for a run to be finalized.
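
To illustrate the point about waiting, here is a rough Python sketch of a bounded wait. It is an assumption about the general shape of such a wait, not the workflow's implementation; `wait_until` and its parameters are hypothetical. If the condition never becomes true, a longer timeout only delays the failure.

```python
import time
from typing import Callable


def wait_until(
    done: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll done() until it returns True or timeout_s elapses; report which happened."""
    deadline = clock() + timeout_s
    while True:
        if done():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

With a `done` that never returns True, `wait_until(done, 3600)` and `wait_until(done, 6 * 3600)` both return False; the only difference is how long the run stays blocked.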

Thanks for the feedback and the approval @lakshmimsft!

@ytimocin ytimocin force-pushed the ytimocin/lrw/update-clean-up-timeout branch from 7d22b2d to f1bd766 Compare September 19, 2024 16:55

radius-functional-tests bot commented Sep 19, 2024

Radius functional test overview

Name        Value
Repository  radius-project/radius
Commit ref  f1bd766
Unique ID   func95acc5885e
Image tag   pr-func95acc5885e
Click here to see the list of tools in the current test run
  • gotestsum 1.12.0
  • KinD: v0.20.0
  • Dapr: 1.12.0
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-func95acc5885e
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-func95acc5885e
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-func95acc5885e
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-func95acc5885e
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting ucp-cloud functional tests...
⌛ Starting datastoresrp-cloud functional tests...
⌛ Starting corerp-cloud functional tests...
✅ ucp-cloud functional tests succeeded
✅ datastoresrp-cloud functional tests succeeded
✅ corerp-cloud functional tests succeeded

@ytimocin ytimocin merged commit 50c1600 into main Sep 19, 2024
31 checks passed
@ytimocin ytimocin deleted the ytimocin/lrw/update-clean-up-timeout branch September 19, 2024 18:05