Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support helm install --wait for the chart #11979

Closed
FloChehab opened this issue Oct 30, 2020 · 12 comments
Closed

Support helm install --wait for the chart #11979

FloChehab opened this issue Oct 30, 2020 · 12 comments
Labels
area:helm-chart Airflow Helm Chart kind:feature Feature Requests

Comments

@FloChehab
Copy link
Contributor

Hello,

We are used to using helm install / helm upgrade with the --wait option ; which is nice because the command returns only if install / upgrade is kind of done and not just "commited".

Unfortunately it doesn't currently play really nice with the chart as:

So if the database hasn't been migrated yet, the install / upgrade fails waiting for the migrations to be applied with the --wait option.

One option would be to switch to run the migrations with pre-install and pre-upgrade hooks instead.
I'd enable this behavior only if we have an external database (in a more production friendly env. where we are more likely to use the --wait option too -?-). If we use a database deployed by the chart, I don't think this could play nice.

What do you think ?

PS: another small question related to this ; on airflow versions bump (with schema migrations to perform), are we expecting the RollingUpdate of pods to work (ie are the schemas backward compatible for one airflow version bump).

@aquam8
Copy link

aquam8 commented May 4, 2021

Hi,

First of all, let me thank you for this project and your involvement in this awesome project.

Now onto this issue.. I have come to realise that this is also the problem I am encountering.
This is also the problem reported in this issue (see the use of --wait): #15340 where it's reported that the webserver fails reporting no migrations ran.

I could not understand why the Job resources would not get created, I could see the ServiceAccount resources but not the jobs.
I then created YAML files manually from the helm get values command, and they got created and ran successfully.
After looking more into it I have realised that they are Helm Hooks and indeed will not be installed when Helm is used in conjunction with --wait flag in an install/upgrade command.

I would argue that you cannot make assumption on how helm update is run - usually it's as part of a CI/CD pipeline and applies to potentially many other helm releases. I don't think you can force people to forgot '--wait' flag especially because of its nature. As the OP explains, that flag is particularly critical as it waits for the upgrade to be successful or it rolls back.

I think the solution is to not make these jobs as Hooks - but I'm not sure of the reason. In the past I have switched from Hooks to normal job to run my app migrations as part of normal app deployment. Applying migrations multiple time as no-op is a regular use-case.

I would say that this is a serious issue and should be prioritised to allow a smooth install and upgrade. At the moment, fresh install for people using --wait will fail.

Some more info on the problem:
https://helm.sh/docs/topics/charts_hooks/ says:

Note that if the --wait flag is set, the library will wait until all resources are in a ready state and will not run the post-install hook until they are ready.

At the same time, the webserver & scheduler wait for the migrations to have run, so we have no READY pod, and it's a dead-end. Eventually the Helm --wait will time out and roll back.

Thank you for your consideration

aquam8 added a commit to mx51/helm-chart-packages that referenced this issue May 4, 2021
aquam8 added a commit to mx51/helm-chart-packages that referenced this issue May 4, 2021
* Helm --wait does not work with post- hook so we are switching to pre-

Check remote issue apache/airflow#11979

* Bump version
@Dr-Denzy
Copy link
Contributor

I will like to look into this @kaxil

@thesuperzapper
Copy link
Contributor

thesuperzapper commented Jun 23, 2021

In the airflow-helm chart, we implemented a value to fix this called helmWait, it just removes the post-install hook from our db migration job.

However this can result in an issue with "field is immutable", as described our issue: airflow-helm/charts#230

@vsimon
Copy link
Contributor

vsimon commented Sep 30, 2021

I just got bit by this. Was running it part as part of a CI/CD pipeline using the same arguments as my other helm releases (with --wait). It was confusing why this was happening, noticed that no Jobs were being deployed, started removing arguments one by one until removing --wait worked. Then found this issue :)

@vsimon
Copy link
Contributor

vsimon commented Sep 30, 2021

This also precludes the use of the --atomic option as well since that internally sets the wait flag.

@potiuk
Copy link
Member

potiuk commented Sep 30, 2021

@kaxil @jedcunningham -> i do agre we have to fix it in the official chart, it's kinda unexpected behaviour and I am sure we can do better than post-install :D

@kaxil
Copy link
Member

kaxil commented Sep 30, 2021

Yeah this is something we have on our list to fix sooner, we hate post-install !!!

This will be either Helm Chart 1.3.0 or 2.0.0

@kaxil kaxil added this to the Airflow Helm Chart 1.3.0 milestone Sep 30, 2021
@thesuperzapper
Copy link
Contributor

To update everyone, the "user-community" airflow chart no longer uses a helmWait flag to address this issue. As of chart version 8.5.0, we simply have Deployments for all things which previously ran as post-install jobs (like db-migrations).

NOTE: for users who want to still run a post-install job (rather than a deployment), we added the airflow.dbMigrations.runAsJob value

@foxracle
Copy link

foxracle commented Aug 5, 2022

How is the issue going? with Airflow Helm Chart 1.6.0 deployed by argocd, we followed the instrunctions from official documents, installation is ok, but when we update the values.yaml, it will lead to the problem of "field is immutable" of airflow-run-airflow-migrations job. how can we work around this issue to upgrade helm with argocd? any hint?

meanwhile, some secrets include airflow-fernet-key, airflow-broker-url, airflow-redis-password, git-credentials was deleted(in fact, after click the sync button, they are all recreated successfully first, and 10 seconds later, then they are all deleted and not be created again any more), so no pod others can restart successfully because of the lack of git-credentials secret.

@calebwoofenden
Copy link

calebwoofenden commented Nov 2, 2022

I am running into the same issue. My solution was to set this value in the helm chart:

  migrateDatabaseJob:
    useHelmHooks: false

This solves the problem raised in this issue since it no longer uses the pre-install hook for the migrate db job.

However, I now see an issue sometimes when running helm upgrade --install to patch an existing helm install where it's trying to patch the Job resource, which cannot be done since Jobs are immutable. This causes the helm install to fail. What should be happening is either the Job gets a unique ID with each helm installation so that it's not trying to edit an existing Job, or the Job should be destroyed after it completes. One of the helm hooks that gets added to the Job is

"helm.sh/hook-delete-policy" "before-hook-creation,hook-succeeded"

which I think will handle deleting the Job upon completion, but we can't use the helm hooks for the above reasons. Kube does offer a parameter called .spec.ttlSecondsAfterFinished that, when specified, will delete the Job after the specified number of seconds after completion (see https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/). This is a pretty good solution, though installs done in a period of time shorter than this value might still run into the same problem.

The solution I propose here is to set a helm pre-install hook on the database deployment as well as the db migrations, and set the hook weight on the database lower than that of the migrations, so the database will be created first, then the migrations will run, then the webserver and etc. pods will come up. Then I think we could safely remove the wait-for-db-migrations initContainer on the deployments.

@potiuk
Copy link
Member

potiuk commented Nov 7, 2022

Can you please open a new issue about it @calebwoofenden . I am closing this one as it is old and Helm Hooks disabling is described in the docs - https://airflow.apache.org/docs/helm-chart/stable/index.html#installing-the-chart-with-argo-cd-flux-or-terraform however your problem likely deserve a new issue.

@calebwoofenden
Copy link

Can you please open a new issue about it @calebwoofenden . I am closing this one as it is old and Helm Hooks disabling is described in the docs - https://airflow.apache.org/docs/helm-chart/stable/index.html#installing-the-chart-with-argo-cd-flux-or-terraform however your problem likely deserve a new issue.

Done: #27561

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:helm-chart Airflow Helm Chart kind:feature Feature Requests
Projects
None yet
Development

No branches or pull requests

10 participants