(chore) Enable continue-on-error for deploy job in CI workflow #1593
Size Change: 0 B Total Size: 10.6 MB
@denniskigen So, yeah, this needs some discussion. When I set up the project, I was aware this was likely to be an issue. The underlying cause is that Bamboo only allows (for good reason) a single instance of each job to be running at once. This is one way we could fix things, but it's a bit of a blunt instrument. Most of the time that the Bamboo job spends hanging is actually down to building the backend, and most of the reason that we have only one job running at a time is that we both publish tagged images to Docker Hub and then run a deployment task. I'm thinking that a better solution here is to split the Bamboo workflow up a bit into:
Basically, we might be able to set up Bamboo to queue the deployment job as a child of the first three and then change this to only submit to the job that updates the frontend image. That would likely make the existing deploy task fail less often and also speed up getting merged changes deployed. Plus, by queueing the slow parts separately, I think we can fail the job less frequently (we may need to add an intermediate job that queues tagging the frontend; then we can allow Bamboo to run as many parallel instances of the job as we want, so this would never fail). Ultimately, though, I think we should also update the job to not mark the build as failed if the deploy step doesn't work for this reason because, really, it's very likely nothing went wrong. Thoughts?
@ibacher I agree. This PR was really a shot in the dark more than anything concrete. The plan you laid out is very sensible indeed and I defer to your judgement fully because of your knowledge of the Bamboo setup. Any step towards making the job less susceptible to (perceived) failure is obviously very welcome! As it doesn't seem like any of that setup will happen in the workflow file, I'm happy to close this PR.
Although it kinda invalidates the description, I've updated the PR with the change I think we need here. I still need to look into the Bamboo side of things...
I've updated the PR title and description
Requirements
Summary
This PR adds a new step to the CI workflow file that retries the deploy_patient_chart job when it fails. The deploy_patient_chart job takes around 30 minutes on average to deploy a build on Bamboo. If a build is ongoing for a merged PR, and a subsequent PR gets merged and triggers the job again, the request will fail with the following error:
To mitigate this issue and avoid the need for manual intervention to re-trigger the deploy job after the pending build completes, this PR adds a step that uses the wretry.action GitHub Action to attempt to rerun the deploy job every 15 minutes, up to 3 times. This will hopefully allow the job to complete successfully even if there are issues with the initial deployment attempt.
UPDATED:
This PR adds a continue-on-error option to the deploy-patient-chart job and sets its value to true. This makes it so that the workflow will continue to execute even if that step or job encounters an error. This is a prerequisite to additional changes earmarked for the Bamboo deployment job further downstream. See the thread below for exact details from @ibacher.
Screenshots
Related Issue
Other
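For illustration, the continue-on-error change described in the Summary might look roughly like the excerpt below. This is a minimal sketch under stated assumptions, not the actual workflow file: only the deploy-patient-chart job name and the continue-on-error: true setting come from this PR, while the runner and the placeholder step are assumed for the sake of a complete example.

```yaml
# Hypothetical excerpt of a GitHub Actions workflow file.
jobs:
  deploy-patient-chart:
    runs-on: ubuntu-latest      # assumed runner, not taken from the PR
    # Job-level setting: if this job fails, the workflow run as a whole
    # is not marked as failed.
    continue-on-error: true
    steps:
      # Placeholder step standing in for the real request that triggers
      # the Bamboo deployment.
      - name: Trigger Bamboo deployment (placeholder)
        run: echo "trigger the Bamboo deployment here"
```

Setting the option at the job level means a rejected deploy request (for example, because a Bamboo build is already in progress) would no longer turn the whole run red, at the cost of also hiding genuine deploy failures until the Bamboo side is reworked.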