Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add time to live for finished jobs in Kubernetes #29439

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Feb 9, 2023

When you upgrade/change job in K8S that has been finished and not manually removed, this leads to "Field is immutable" error.

This is a known kubernetes issue:

kubernetes/kubernetes#89657

And there are some workarounds (manually removing the job for example), but the only good solution is possible only in K8S 1.23+ with ttlSecondsAfterFinished set for the job, so that K8S can auto
clean it.

This PR adds it conditionally for K8S >= 1.23

Fixes: #27561


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added the area:helm-chart Airflow Helm Chart label Feb 9, 2023
@potiuk potiuk requested a review from dimberman February 9, 2023 10:23
@potiuk potiuk force-pushed the fix-cleanup-of-jobs-for-terraform-redeployments branch from 1cecb63 to 294d266 Compare February 9, 2023 10:25
@potiuk potiuk changed the title Add time to leave for finished jobs in Kubernetes Add time to live for finished jobs in Kubernetes Feb 9, 2023
When you upgrade/change job in K8S that has been finished and not
manually removed, this leads to "Field is immutable" error.

This is a known kubernetes issue:

kubernetes/kubernetes#89657

And there are some workarounds (manually removing the job for
example), but the only good solution is possible only in K8S 1.23+
with ttlSecondsAfterFinished set for the job, so that K8S can auto
 clean it.

This PR adds it conditionally for K8S >= 1.23

Fixes: apache#21943
@potiuk
Copy link
Member Author

potiuk commented Feb 12, 2023

Hey @dstandish @jedcunningham -> WDYT?

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting change, but there's still a chance of getting the same error if for some reason we run the jobs twice before the TTL ends, on top of that we can add {{ .Release.Revision }} as suffix for job name -> metadata.name: {{ .Release.Name }}-<job name>-{{ .Release.Revision }}, and keep the cleanup task for garbage collection.

@potiuk
Copy link
Member Author

potiuk commented Feb 13, 2023

It's an interesting change, but there's still a chance of getting the same error if for some reason we run the jobs twice before the TTL ends, on top of that we can add {{ .Release.Revision }} as suffix for job name -> metadata.name: {{ .Release.Name }}-<job name>-{{ .Release.Revision }}, and keep the cleanup task for garbage collection.

Thought about it, but I am not sure if there is another reason why we wanted to keep the job names "static" - @jedcunningham @dstandish ?

@potiuk
Copy link
Member Author

potiuk commented Feb 13, 2023

Closing in favour of #29314

@potiuk potiuk closed this Feb 13, 2023
@potiuk potiuk deleted the fix-cleanup-of-jobs-for-terraform-redeployments branch March 22, 2023 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:helm-chart Airflow Helm Chart
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Helm chart tries to patch immutable Job resources on helm upgrade
2 participants