Changing Image tags causes Error: UPGRADE FAILED: cannot patch "<name>-create-user" with kind Job: Job.batch "<name>-create-user" #21943
Thanks for opening your first issue here! Be sure to follow the issue template!
This was a side effect of have
That was required with the --wait flag, which doesn't work as per #11979
@repl-mike-roest Curious if you ever found a permanent resolution for this. We have this in Terraform, so we can't easily remove the
I think #27148 might be the fix for this. It adds
@mconigliaro Thank you kindly
The helm chart is released now, and this doesn't solve the problem. Updating the
When you upgrade/change a job in K8S that has finished and was not manually removed, this leads to a "Field is immutable" error. This is a known Kubernetes issue: kubernetes/kubernetes#89657. There are some workarounds (manually removing the job, for example), but the only good solution is possible only in K8S 1.23+ with ttlSecondsAfterFinished set for the job, so that K8S can auto-clean it. This PR adds it conditionally for K8S >= 1.23. Fixes: apache#21943
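For context, `ttlSecondsAfterFinished` is a field on the Job spec itself. A minimal sketch of what a rendered Job might look like once the chart sets it (the names, image, and command here are illustrative, not the chart's actual output):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-release-create-user   # hypothetical release name
spec:
  # On K8S 1.23+ the TTL controller deletes the finished Job automatically
  # after this many seconds, so the next `helm upgrade` can recreate it
  # instead of trying to patch an immutable spec.
  ttlSecondsAfterFinished: 300
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: create-user
          image: apache/airflow:2.2.4
          command: ["bash", "-c", "airflow users create ..."]  # placeholder
```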
I think #29439 should handle it long term. It seems to be a known issue with K8S @jay-olulana kubernetes/kubernetes#89657, and it has been fixed in 1.23 by adding

UPDATE: #29439 has been closed in favour of the more complete #29314. There is no automated way for you to recover, but you can do it manually if I am right:
Once you redeploy the chart with the PR including the
I would appreciate it, @jay-olulana, if you could test some of the scenarios involved and confirm that my proposed fix works for you.
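A manual recovery along the lines suggested above might look like this (the release name, namespace, and job names are placeholders based on this thread's example deployment; adjust them to your own):

```
# Delete the finished, immutable jobs so the next upgrade can recreate them
kubectl delete job pre-production-create-user -n airflow
kubectl delete job pre-production-run-airflow-migrations -n airflow

# Re-run the upgrade; once the chart sets ttlSecondsAfterFinished,
# finished jobs should be cleaned up automatically from then on
helm upgrade --install pre-production apache-airflow/airflow \
  --namespace airflow -f values.yaml
```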
Hi @potiuk, sorry that the reply is late. But I did apply your fix after upgrading to

```yaml
createUserJob:
  useHelmHooks: false
  applyCustomEnv: false
migrateDatabaseJob:
  useHelmHooks: false
  applyCustomEnv: false
  ttlSecondsAfterFinished: 300
```

The good news is that my Airflow pods are recreated in EKS with the new tags and are healthy.
Did you wait 5 minutes (after the job completed) before updating the tag?
Yes, I did (even more than that). But I will run more tests this weekend and let you know if it persists.
Please do. Also, you can check whether the TTL is observed: the job should disappear after 5 minutes, so if it is still there, maybe the version of K8S you run has it disabled for some reason.
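One way to run that check (job name and namespace are hypothetical; use your own release's names):

```
# Inspect whether the TTL actually made it into the rendered Job spec
kubectl get job pre-production-create-user -n airflow \
  -o jsonpath='{.spec.ttlSecondsAfterFinished}'

# If the field is set and honoured, the job should vanish within
# ~5 minutes of completing:
kubectl get jobs -n airflow --watch
```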
Note that this fix has not yet been released and requires manual patching of the chart and recreating your deployment from scratch. So you need to make sure that your chart contains the changes, that your job gets the spec parameter, and that K8S handles it. If those conditions are not fulfilled, you can always redeploy Airflow from scratch.
@potiuk Is there any particular reason the chart is not released yet? I think it'd ease deployment.
The image is released semi-regularly, when release managers decide to release it (I am not one for the Helm Chart, BTW). I think asking for the "reason for not releasing" is the wrong question. It takes time and effort to publish a release, and it is done by volunteers when they see the time is good for it; one issue affecting a small group of users might not be enough to warrant it.

I think the right question you could ask is "what can I do to help speed up the release?". Let me answer this question instead. If you confirm that the change fixes the problem, by applying the changes locally and confirming it here, it might definitely increase the chances that release managers will make a decision about releasing the helm chart.

Also, as a follow-up (after you confirm it), it would immensely help if you help test the release candidate. Subscribe to the devlist to get announcements about it, and whenever we release an RC for the chart, we ask people to test it and confirm that it works. I looked it up and I have not seen your help in

Can we count on your help there, @elongl, to verify and confirm it, and then later take part in testing when an RC is out? That would certainly help to speed up the release.
@potiuk Thanks a lot for sharing the context on the Helm releases. |
Of course - see #21943 (comment) and the usual Helm chart things. The Helm chart is just a folder you can install. You have install instructions in https://github.com/apache/airflow/tree/main/chart (INSTALL), or you can host it yourself somewhere. This is where manual patching (or using the latest sources) comes into play. Just check it out, patch the changes (or use the latest main) and run the helm installation from there.
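A sketch of that install-from-sources workflow (the release name, namespace, and values path are illustrative):

```
# Install the chart straight from the Airflow sources instead of the
# published chart repository
git clone https://github.com/apache/airflow.git
cd airflow
# ...apply the fix here (cherry-pick the PR, or just use latest main)...
helm upgrade --install pre-production ./chart \
  --namespace airflow -f /path/to/values.yaml
```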
And when RC is out you will also be able to install it following the RC instructions https://github.com/apache/airflow/blob/main/dev/README_RELEASE_HELM_CHART.md#verify-release-candidates-by-contributors Those instructions are posted every time RC is out. |
Thanks again, that really helped. Also, I noticed that the |
Is there an alternative to including it into my version control? |
No - not until we release it |
OK. Closing since it is confirmed. |
cc: @jedcunningham @ephraimbuddy @pierrejeambrun -> FYI, might be useful to determine on when to release the new chart. |
Official Helm Chart version
1.4.0 (latest released)
Apache Airflow version
2.2.4 (latest released)
Kubernetes Version
1.21.5 (EKS)
Helm Chart configuration
We are using an external RDS DB server configured via secrets.
Also have specified
Along with these flags, as without them, when deploying via CodeBuild, the chart never progressed because the create-user/run-db-migrations jobs were not running.
Docker Image customisations
Happens both with a transition from the default Airflow image 2.2.3 -> 2.2.4, and with changing our custom image between versions or from a default Airflow image to our custom image.
What happened
The following error was returned from the helm upgrade command
What you expected to happen
The helm chart to successfully upgrade and change my running images to a new version
How to reproduce
Deploy the helm chart with

in your values.yaml, and the following command:

```
helm upgrade --install --wait --timeout 900s pre-production apache-airflow/airflow \
  --namespace airflow --version 1.4.0 -f values.yaml
```

Then run the same command after changing the image tags to 2.2.4.
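For illustration, the tag change that triggers the error is typically just a values.yaml edit along these lines (the key names follow the chart's documented values, but treat the exact fragment as an assumption, and the repository is a made-up example):

```yaml
# values.yaml (fragment)
defaultAirflowTag: "2.2.4"   # changed from "2.2.3" -> triggers the immutable Job patch
images:
  airflow:
    repository: 123456789.dkr.ecr.us-east-1.amazonaws.com/airflow  # hypothetical custom image
    tag: "2.2.4"
```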
Anything else
Seems to happen whenever we change the image tag (even within the same release). If we're using a custom image that contains our DAGs, changing from one tag to another gets the same error.
Are you willing to submit PR?
Code of Conduct