-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add git sync option and unit tests for the Helm chart #9371
Conversation
d2162f6
to
51fb039
Compare
@@ -82,6 +82,9 @@ spec: | |||
{{- include "custom_airflow_environment" . | indent 10 }} | |||
{{- include "standard_airflow_environment" . | indent 10 }} | |||
containers: | |||
{{- if .Values.gitSync.enabled }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dag sync won't be needed on the webserver if dag serialization is enabled
chart/values.yaml
Outdated
@@ -434,3 +434,23 @@ postgresql: | |||
enabled: true | |||
postgresqlPassword: postgres | |||
postgresqlUsername: postgres | |||
# Git sync | |||
gitSync: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should put this under a kubernetesExecutor
section in case we need to expand to further k8sexecutor-specific features
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially thought the same, but the git sync sidecars might also be used without the KubernetesExecutor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yeah I can see that. would just need to set a relatively fast cycle for pinging/re-pulling the repo s.t. the worker stays in sync with the scheduler/webserver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the sync interval to a default of 1 minute, have added in a PVC as well
chart/values.yaml
Outdated
# ssh example: [email protected]:aneesh-joseph/dags-repo.git | ||
# https example: https://github.com/aneesh-joseph/dags-repo.git | ||
#TODO Change the URLs used for testing | ||
repo: https://github.com/aneesh-joseph/dags-repo.git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change this to the apache project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, updated it to use dags from https://github.com/apache/airflow/tree/v1-10-stable/tests/dags
8b13e51
to
5d60c26
Compare
c317639
to
d2abd4f
Compare
scripts/ci/ci_run_helm_testing.sh
Outdated
|
||
docker run -w /airflow-chart -v "$CHART_DIR":/airflow-chart \ | ||
--entrypoint /bin/sh \ | ||
divya12/helm-unittest \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to change this image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created an issue: #9401 - we have a few more "private" images that we depend on, we should replace them with our own copy of those images. For now it can stay like it is .
chart/values.yaml
Outdated
# [email protected]:apache/airflow.git | ||
# https example: https://github.com/apache/airflow.git | ||
repo: https://github.com/apache/airflow.git | ||
branch: v1-10-stable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change it later to a variable. But for now iI think it's good. I plan to use the Helm chart not only for unit tests (which are great BTW) but also to run our Kubernetes tests (we are currently using some custom, simple templates for the kubernetes resources - replacing it with the helm chart from master and using images built locally will be a great way to do doggfooding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, that would be awesome 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Ash Berlin-Taylor <[email protected]>
1f54704
to
5b36045
Compare
@dimberman @ashb @potiuk - the tests are now passing, can you help in getting this merged please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great Change! I was thinking - shoudl not we also update the tests in test_kubernetes? we already have git_sync mode for those tests but it is disable - I am happy to help in switching the git_sync tests on. We can do it in a separate change but it actually belongs here.
@dimberman WDYT?
@@ -66,7 +66,7 @@ The command removes all the Kubernetes components associated with the chart and | |||
|
|||
## Updating DAGs | |||
|
|||
The recommended way to update your DAGs with this chart is to build a new docker image with the latest code (`docker build -t my-company/airflow:8a0da78 .`), push it to an accessible registry (`docker push my-company/airflow:8a0da78`), then update the Airflow pods with that image: | |||
The recommended way to update your DAGs with this chart is to build a new docker image with the latest DAG code (`docker build -t my-company/airflow:8a0da78 .`), push it to an accessible registry (`docker push my-company/airflow:8a0da78`), then update the Airflow pods with that image: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will want to change that description as it is not valid any more and we have "EMBED_DAGS" directive while building the dockerfiles + we will add on-build most likely to add the DAGs on building deppendent image but I will do it separately.
# Refer values.yaml for details | ||
``` | ||
|
||
## Mounting DAGS using Git-Sync side car without Persistence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have this option.!
chart/requirements.lock
Outdated
@@ -1,6 +0,0 @@ | |||
dependencies: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do wer delete this ?
@@ -0,0 +1,64 @@ | |||
# Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the unit tests of helm chart! ❤️
Co-authored-by: Jarek Potiuk <[email protected]>
Where is this chart hosted? Is it related to https://hub.helm.sh/charts/stable/airflow? |
@friedhardware, this chart is only available from this repo for now and it is the official one, not related to the current one on Helm.sh |
{{- if .Values.dags.persistence.enabled }} | ||
- name: dags | ||
persistentVolumeClaim: | ||
claimName: {{ .Release.Name }}-dags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If uses dags.persistence.existingClaim
, it should be modified to
claimName: {{ .Values.dags.persistence.existingClaim }}
.
Currently, the webserver pod cannot find pvc, so it is put in a pending state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have raised a PR to fix this - #9688
I got an issue yesterday with git-sync and the K8s worker pods. Nonetheless, git-sync along with the webserver and scheduler of Airflow works since the env var AIRFLOW__KUBERNETES__RUN_AS_USER="50000" isn't set which is not the case in the K8S worker pods. To fix this, we have to define Also, we have to set the UID to 50000 otherwise Airflow fails with the following error:
|
yes it might be failing as the user 50000 doesn't exist in |
* add git sync sidecars * add a helm test * add more tests * allow users to provide git username and pass via a k8s secrets * set default values for airflow worker repository & tag * change ci timeout * fix link * add credentials_secret to airflow.cfg configmap * set GIT_SYNC_ADD_USER on kubernetes worker pods, set uid * add fsGroup to webserver and kubernete workers * move gitSync to dags.gitSync * rename valueFields * turn off git sync and dag persistence by default * provide option to specify known_hosts * add git-sync details into the chart documentation * Update .gitignore Co-authored-by: Ash Berlin-Taylor <[email protected]> * make git sync max failures configurable * Apply suggestions from code review Co-authored-by: Jarek Potiuk <[email protected]> * add back requirements.lock Co-authored-by: Ash Berlin-Taylor <[email protected]> Co-authored-by: Jarek Potiuk <[email protected]> (cherry picked from commit d93555b)
* add git sync sidecars * add a helm test * add more tests * allow users to provide git username and pass via a k8s secrets * set default values for airflow worker repository & tag * change ci timeout * fix link * add credentials_secret to airflow.cfg configmap * set GIT_SYNC_ADD_USER on kubernetes worker pods, set uid * add fsGroup to webserver and kubernete workers * move gitSync to dags.gitSync * rename valueFields * turn off git sync and dag persistence by default * provide option to specify known_hosts * add git-sync details into the chart documentation * Update .gitignore Co-authored-by: Ash Berlin-Taylor <[email protected]> * make git sync max failures configurable * Apply suggestions from code review Co-authored-by: Jarek Potiuk <[email protected]> * add back requirements.lock Co-authored-by: Ash Berlin-Taylor <[email protected]> Co-authored-by: Jarek Potiuk <[email protected]> (cherry picked from commit d93555b)
* add git sync sidecars * add a helm test * add more tests * allow users to provide git username and pass via a k8s secrets * set default values for airflow worker repository & tag * change ci timeout * fix link * add credentials_secret to airflow.cfg configmap * set GIT_SYNC_ADD_USER on kubernetes worker pods, set uid * add fsGroup to webserver and kubernete workers * move gitSync to dags.gitSync * rename valueFields * turn off git sync and dag persistence by default * provide option to specify known_hosts * add git-sync details into the chart documentation * Update .gitignore Co-authored-by: Ash Berlin-Taylor <[email protected]> * make git sync max failures configurable * Apply suggestions from code review Co-authored-by: Jarek Potiuk <[email protected]> * add back requirements.lock Co-authored-by: Ash Berlin-Taylor <[email protected]> Co-authored-by: Jarek Potiuk <[email protected]> (cherry picked from commit d93555b)
Make sure to mark the boxes below before creating PR: [x]
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.
Improve the helm chart with:
known_hosts
when using git sync over ssh with an ssh keyDockerfile
from a forked repohttps://github.com/aneesh-joseph/helm-unittest
.