e2e tests on CI - actually await k8s resources to be ready before starting tests #1997
Merged
joeyorlando commented on May 23, 2023
@@ -361,7 +364,7 @@ jobs:
   --set oncall.twilio.authToken="${{ secrets.TWILIO_AUTH_TOKEN }}" \
   --set oncall.twilio.phoneNumber="\"${{ secrets.TWILIO_PHONE_NUMBER }}"\" \
   --set oncall.twilio.verifySid="${{ secrets.TWILIO_VERIFY_SID }}" \
-  --set grafana.replicas=3 \
+  --set grafana.replicas=1 \
Can't reliably use more than one Grafana replica when using SQLite as the Grafana database (see bitnami/charts#10905).
joeyorlando commented on May 23, 2023
@@ -287,6 +287,9 @@ jobs:
   - name: Checkout
     uses: actions/checkout@v3

+  - name: Collect Workflow Telemetry
+    uses: runforesight/workflow-telemetry-action@v1
brojd pushed a commit that referenced this pull request on Sep 18, 2024

…rting tests (#1997)

Occasionally, the Playwright global setup step (which authenticates w/ the Grafana API + configures the plugin) would fail, leading to the CI job to instantly fail (playwright doesn't retry global setup if it fails).

My current hypothesis as to why this is happening is because the `oncall-engine` and `oncall-celery` pods aren't _actually_ ready in these cases based on the way the `jupyterhub/action-k8s-await-workloads` action await k8s workloads:

<img width="1076" alt="Screenshot 2023-05-23 at 18 24 36" src="https://github.com/grafana/oncall/assets/9406895/68d8d2d9-4274-4749-8788-e0a9a3dbad83">

By using the `kubectl rollout status deployment/<deployment-name> --timeout=300s` instead, we can be sure that these pods are _actually_ ready to receive traffic before we start the tests.

```bash
❯ kubectl rollout status --help
Show the status of the rollout.

By default 'rollout status' will watch the status of the latest rollout until it's done. If you don't want to wait for the rollout to finish then you can use --watch=false. Note that if a new rollout starts in-between, then 'rollout status' will continue watching the latest revision. If you want to pin to a specific revision and abort if it is rolled over by another revision, use --revision=N where N is the revision you need to watch for.
```

Lastly, even despite this, sometimes the `POST /api/internal/v1/plugin/sync` endpoint will return HTTP 500 ([example logs](https://github.com/grafana/oncall/actions/runs/5062712137/jobs/9088529416#step:19:2536) from failed CI job). In this case, let's setup the Playwright global setup to retry 3 times.
Occasionally, the Playwright global setup step (which authenticates with the Grafana API and configures the plugin) would fail, causing the CI job to fail immediately (Playwright doesn't retry global setup if it fails).

My current hypothesis is that the `oncall-engine` and `oncall-celery` pods aren't actually ready in these cases, because of the way the `jupyterhub/action-k8s-await-workloads` action awaits k8s workloads.

By using `kubectl rollout status deployment/<deployment-name> --timeout=300s` instead, we can be sure that these pods are actually ready to receive traffic before we start the tests.

Lastly, even with this change, the `POST /api/internal/v1/plugin/sync` endpoint will sometimes return HTTP 500 (example logs from failed CI job). To handle this case, let's set up the Playwright global setup to retry 3 times.
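
The retry behavior described above could look roughly like the sketch below. This is not the code from this PR: `withRetries`, its `maxAttempts` and `delayMs` parameters, and the pause between attempts are all assumptions made for illustration. It only shows the general shape of retrying an async setup step a fixed number of times before giving up.

```typescript
// Hypothetical helper (not taken from this PR): run an async setup step,
// retrying on failure up to `maxAttempts` times in total.
async function withRetries<T>(
  step: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 1000,
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Return as soon as one attempt succeeds.
      return await step();
    } catch (error) {
      lastError = error;
      // Pause before the next attempt, but not after the final one.
      if (attempt < maxAttempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }

  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

Inside the Playwright global setup function, the plugin configuration call could then be wrapped as `await withRetries(() => syncPlugin())`, where `syncPlugin` is a hypothetical function that issues the `POST /api/internal/v1/plugin/sync` request and throws on a non-2xx response.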