Add git sync option and unit tests for the Helm chart (#9371)
* add git sync sidecars

* add a helm test

* add more tests

* allow users to provide git username and pass via a k8s secret

* set default values for airflow worker repository & tag

* change ci timeout

* fix link

* add credentials_secret to airflow.cfg configmap

* set GIT_SYNC_ADD_USER on kubernetes worker pods, set uid

* add fsGroup to webserver and kubernetes workers

* move gitSync to dags.gitSync

* rename valueFields

* turn off git sync and dag persistence by default

* provide option to specify known_hosts

* add git-sync details into the chart documentation

* Update .gitignore

Co-authored-by: Ash Berlin-Taylor <[email protected]>

* make git sync max failures configurable

* Apply suggestions from code review

Co-authored-by: Jarek Potiuk <[email protected]>

* add back requirements.lock

Co-authored-by: Ash Berlin-Taylor <[email protected]>
Co-authored-by: Jarek Potiuk <[email protected]>
(cherry picked from commit d93555b)
aneesh-joseph authored and potiuk committed Jul 22, 2020
1 parent f46763e commit 91d1371
Showing 17 changed files with 721 additions and 6 deletions.
11 changes: 11 additions & 0 deletions .github/workflows/ci.yml
@@ -306,6 +306,17 @@ ${{ hashFiles('requirements/requirements-python${{matrix.python-version}}.txt')
      - name: "Tests"
        run: ./scripts/ci/ci_run_airflow_testing.sh

  helm-tests:
    timeout-minutes: 5
    name: "Checks: Helm tests"
    runs-on: ubuntu-latest
    env:
      CI_JOB_TYPE: "Tests"
    steps:
      - uses: actions/checkout@master
      - name: "Helm Tests"
        run: ./scripts/ci/ci_run_helm_testing.sh

  requirements:
    timeout-minutes: 80
    name: "Requirements"
2 changes: 1 addition & 1 deletion CI.rst
@@ -33,7 +33,7 @@ environments we use. Most of our CI jobs are written as bash scripts which are e
the CI jobs and we are mapping all the CI-specific environment variables to generic "CI" variables.
The only two places where CI-specific code might be are:

- CI-specific declaration file (for example it is `<.github/workflow/ci.yml>`_ for GitHub Actions
- CI-specific declaration file (for example it is `<.github/workflows/ci.yml>`_ for GitHub Actions
- The ``get_environment_for_builds_on_ci`` function in `<scripts/ci/libraries/_build_images.sh>`_ where mapping is
performed from the CI-environment specific to generic values. Example for that is CI_EVENT_TYPE variable
which determines whether we are running a ``push``. ``schedule`` or ``pull_request`` kind of CI job. For
4 changes: 4 additions & 0 deletions airflow/kubernetes/worker_configuration.py
@@ -134,6 +134,10 @@ def _get_init_containers(self):
                name='GIT_SSH_KEY_FILE',
                value='/etc/git-secret/ssh'
            ),
            k8s.V1EnvVar(
                name='GIT_SYNC_ADD_USER',
                value='true'
            ),
            k8s.V1EnvVar(
                name='GIT_SYNC_SSH',
                value='true'
40 changes: 39 additions & 1 deletion chart/README.md
@@ -66,7 +66,7 @@ The command removes all the Kubernetes components associated with the chart and

## Updating DAGs

The recommended way to update your DAGs with this chart is to build a new docker image with the latest code (`docker build -t my-company/airflow:8a0da78 .`), push it to an accessible registry (`docker push my-company/airflow:8a0da78`), then update the Airflow pods with that image:
The recommended way to update your DAGs with this chart is to build a new docker image with the latest DAG code (`docker build -t my-company/airflow:8a0da78 .`), push it to an accessible registry (`docker push my-company/airflow:8a0da78`), then update the Airflow pods with that image:

```bash
helm upgrade airflow . \
@@ -77,6 +77,42 @@ helm upgrade airflow . \
For local development purppose you can also u
You can also build the image locally and use it via deployment method described by Breeze.

## Mounting DAGs using Git-Sync sidecar with Persistence enabled

This option uses a Persistent Volume Claim with an accessMode of `ReadWriteMany`. The scheduler pod syncs DAGs from a git repository onto the PVC every configured number of seconds, and the other pods read the synced DAGs. Not all volume plugins support the `ReadWriteMany` accessMode. Refer to [Persistent Volume Access Modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes) for details.

```bash
helm upgrade airflow . \
--set dags.persistence.enabled=true \
--set dags.gitSync.enabled=true
# you can also override the other persistence or gitSync values
# by setting the dags.persistence.* and dags.gitSync.* values
# Please refer to values.yaml for details
```

## Mounting DAGs using Git-Sync sidecar without Persistence

This option runs an always-on Git-Sync sidecar on every scheduler, webserver and worker pod. The Git-Sync sidecar containers sync DAGs from a git repository every configured number of seconds. If you are using the KubernetesExecutor, Git-Sync runs as an initContainer on your worker pods.

```bash
helm upgrade airflow . \
--set dags.persistence.enabled=false \
--set dags.gitSync.enabled=true
# you can also override the other gitSync values
# by setting the dags.gitSync.* values
# Please refer to values.yaml for details
```

## Mounting DAGs from an externally populated PVC
In this approach, Airflow reads the DAGs from a PVC that has the `ReadOnlyMany` or `ReadWriteMany` accessMode. You must ensure that the PVC is populated and kept up to date with the required DAGs (the chart does not handle this). Pass the name of the volume claim to the chart:

```bash
helm upgrade airflow . \
--set dags.persistence.enabled=true \
--set dags.persistence.existingClaim=my-volume-claim \
--set dags.gitSync.enabled=false
```
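
The three mounting options above, together with the default of shipping DAGs inside the image, come down to the two flags `dags.persistence.enabled` and `dags.gitSync.enabled`. The following Python sketch is illustrative only and is not part of the chart: `dag_source` and the `<release>-dags` placeholder are names assumed here, modelled on the `dags_in_image` expression in `configmap.yaml` and the `airflow_dags_volume_claim` helper added by this commit.

```python
from typing import Optional


def dag_source(persistence_enabled: bool, git_sync_enabled: bool,
               existing_claim: Optional[str] = None) -> dict:
    """Summarise where DAGs come from for a combination of chart values.

    Hypothetical helper for illustration; it mirrors the template logic
    but is not called by the chart.
    """
    if persistence_enabled and git_sync_enabled:
        mode = "git-sync on the scheduler writes to a shared PVC"
    elif persistence_enabled:
        mode = "externally populated PVC"
    elif git_sync_enabled:
        mode = "git-sync sidecar per pod, emptyDir volume"
    else:
        mode = "DAGs baked into the image"
    return {
        "mode": mode,
        # Mirrors the dags_in_image expression in configmap.yaml.
        "dags_in_image": not (persistence_enabled or git_sync_enabled),
        # Mirrors the airflow_dags_volume_claim helper; None when no PVC is used.
        "claim": (existing_claim or "<release>-dags") if persistence_enabled else None,
    }


if __name__ == "__main__":
    for flags in [(False, False), (True, True), (False, True), (True, False)]:
        print(flags, dag_source(*flags))
```

Both flags are turned off by default in this commit, so an unmodified install keeps reading DAGs from the image.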


## Parameters

The following tables lists the configurable parameters of the Airflow chart and their default values.
@@ -160,6 +196,8 @@ The following tables lists the configurable parameters of the Airflow chart and
| `webserver.resources.requests.cpu` | CPU Request of webserver | `~` |
| `webserver.resources.requests.memory` | Memory Request of webserver | `~` |
| `webserver.defaultUser` | Optional default airflow user information | `{}` |
| `dags.persistence.*` | DAG persistence configuration | Please refer to `values.yaml` |
| `dags.gitSync.*` | Git sync configuration | Please refer to `values.yaml` |


Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
95 changes: 95 additions & 0 deletions chart/templates/_helpers.yaml
@@ -93,6 +93,80 @@
{{ end }}
{{- end }}

{{/* Git ssh key volume */}}
{{- define "git_sync_ssh_key_volume"}}
- name: git-sync-ssh-key
secret:
secretName: {{ .Values.dags.gitSync.sshKeySecret }}
defaultMode: 288
{{- end }}

{{/* Git sync container */}}
{{- define "git_sync_container"}}
- name: {{ .Values.dags.gitSync.containerName }}
image: "{{ .Values.dags.gitSync.containerRepository }}:{{ .Values.dags.gitSync.containerTag }}"
env:
{{- if .Values.dags.gitSync.sshKeySecret }}
- name: GIT_SSH_KEY_FILE
value: "/etc/git-secret/ssh"
- name: GIT_SYNC_SSH
value: "true"
{{- if .Values.dags.gitSync.knownHosts }}
- name: GIT_KNOWN_HOSTS
value: "true"
- name: GIT_SSH_KNOWN_HOSTS_FILE
value: "/etc/git-secret/known_hosts"
{{- else }}
- name: GIT_KNOWN_HOSTS
value: "false"
{{- end }}
{{ else if .Values.dags.gitSync.credentialsSecret }}
- name: GIT_SYNC_USERNAME
valueFrom:
secretKeyRef:
name: {{ .Values.dags.gitSync.credentialsSecret | quote }}
key: GIT_SYNC_USERNAME
- name: GIT_SYNC_PASSWORD
valueFrom:
secretKeyRef:
name: {{ .Values.dags.gitSync.credentialsSecret | quote }}
key: GIT_SYNC_PASSWORD
{{- end }}
- name: GIT_SYNC_REV
value: {{ .Values.dags.gitSync.rev | quote }}
- name: GIT_SYNC_BRANCH
value: {{ .Values.dags.gitSync.branch | quote }}
- name: GIT_SYNC_REPO
value: {{ .Values.dags.gitSync.repo | quote }}
- name: GIT_SYNC_DEPTH
value: {{ .Values.dags.gitSync.depth | quote }}
- name: GIT_SYNC_ROOT
value: {{ .Values.dags.gitSync.root | quote }}
- name: GIT_SYNC_DEST
value: {{ .Values.dags.gitSync.dest | quote }}
- name: GIT_SYNC_ADD_USER
value: "true"
- name: GIT_SYNC_WAIT
value: {{ .Values.dags.gitSync.wait | quote }}
- name: GIT_SYNC_MAX_SYNC_FAILURES
value: {{ .Values.dags.gitSync.maxFailures | quote }}
volumeMounts:
- name: dags
mountPath: {{ .Values.dags.gitSync.root }}
{{- if and .Values.dags.gitSync.enabled .Values.dags.gitSync.sshKeySecret }}
- name: git-sync-ssh-key
mountPath: /etc/git-secret/ssh
readOnly: true
subPath: gitSshKey
{{- if .Values.dags.gitSync.knownHosts }}
- name: config
mountPath: /etc/git-secret/known_hosts
readOnly: true
subPath: known_hosts
{{- end }}
{{- end }}
{{- end }}
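
As a reading aid, the authentication branch of `git_sync_container` can be restated in Python. This is a hypothetical sketch, not chart code: `git_sync_auth_env` is an invented name, and secret-backed values are represented as `secretKeyRef:<secret>/<key>` strings purely for illustration. It shows that an SSH key secret takes precedence over a credentials secret, and that setting neither yields no auth variables.

```python
from typing import Dict, Optional


def git_sync_auth_env(ssh_key_secret: Optional[str] = None,
                      credentials_secret: Optional[str] = None,
                      known_hosts: Optional[str] = None) -> Dict[str, str]:
    """Return the auth-related env vars the template would render.

    Illustrative only; the real values come from the Helm template.
    """
    env: Dict[str, str] = {}
    if ssh_key_secret:
        # SSH key wins: the credentials branch is an "else if" in the template.
        env["GIT_SSH_KEY_FILE"] = "/etc/git-secret/ssh"
        env["GIT_SYNC_SSH"] = "true"
        if known_hosts:
            env["GIT_KNOWN_HOSTS"] = "true"
            env["GIT_SSH_KNOWN_HOSTS_FILE"] = "/etc/git-secret/known_hosts"
        else:
            env["GIT_KNOWN_HOSTS"] = "false"
    elif credentials_secret:
        # In the template these are secretKeyRef entries, not literal values.
        for key in ("GIT_SYNC_USERNAME", "GIT_SYNC_PASSWORD"):
            env[key] = "secretKeyRef:%s/%s" % (credentials_secret, key)
    return env


if __name__ == "__main__":
    print(git_sync_auth_env(ssh_key_secret="airflow-ssh", known_hosts="github.com ..."))
    print(git_sync_auth_env(credentials_secret="git-credentials"))
```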

# This helper will change when customers deploy a new image.
{{ define "airflow_image" -}}
{{ printf "%s:%s" (.Values.images.airflow.repository | default .Values.defaultAirflowRepository) (.Values.images.airflow.tag | default .Values.defaultAirflowTag) }}
@@ -185,9 +259,30 @@ log_connections = {{ .Values.pgbouncer.logConnections }}
{{ (printf "%s/logs" .Values.airflowHome) | quote }}
{{- end }}

{{ define "airflow_dags" -}}
{{- if .Values.dags.gitSync.enabled -}}
{{ (printf "%s/dags/%s/%s" .Values.airflowHome .Values.dags.gitSync.dest .Values.dags.gitSync.subPath ) }}
{{- else -}}
{{ (printf "%s/dags" .Values.airflowHome) }}
{{- end -}}
{{- end -}}

{{ define "airflow_dags_volume_claim" -}}
{{- if and .Values.dags.persistence.enabled .Values.dags.persistence.existingClaim -}}
{{ .Values.dags.persistence.existingClaim }}
{{- else -}}
{{ .Release.Name }}-dags
{{- end -}}
{{- end -}}

{{ define "airflow_dags_mount_path" -}}
{{ (printf "%s/dags" .Values.airflowHome) }}
{{- end }}
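
To make the difference between `airflow_dags` (the `dags_folder` Airflow reads) and `airflow_dags_mount_path` (where the volume is mounted) concrete, here is a minimal Python sketch of the same string formatting. The function names mirror the helpers; the example values `/opt/airflow`, `repo` and `tests/dags` are assumptions for illustration, not values taken from `values.yaml`.

```python
def airflow_dags(airflow_home: str, git_sync_enabled: bool,
                 dest: str = "", sub_path: str = "") -> str:
    """Mirror of the airflow_dags helper: the folder Airflow scans for DAGs."""
    if git_sync_enabled:
        # Same format string as the template: home/dags/<dest>/<subPath>.
        return "%s/dags/%s/%s" % (airflow_home, dest, sub_path)
    return "%s/dags" % airflow_home


def airflow_dags_mount_path(airflow_home: str) -> str:
    """Mirror of the airflow_dags_mount_path helper: where the volume is mounted."""
    return "%s/dags" % airflow_home


if __name__ == "__main__":
    home = "/opt/airflow"  # assumed example value
    print(airflow_dags_mount_path(home))
    print(airflow_dags(home, True, "repo", "tests/dags"))
    print(airflow_dags(home, True, "repo", ""))
```

Note that an empty `subPath` leaves a trailing slash, because the format string always emits both separators.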

{{ define "airflow_config_path" -}}
{{ (printf "%s/airflow.cfg" .Values.airflowHome) | quote }}
{{- end }}

{{ define "airflow_webserver_config_path" -}}
{{ (printf "%s/webserver_config.py" .Values.airflowHome) | quote }}
{{- end }}
40 changes: 37 additions & 3 deletions chart/templates/configmap.yaml
@@ -35,6 +35,7 @@ data:
# These are system-specified config overrides.
airflow.cfg: |
[core]
dags_folder = {{ include "airflow_dags" . }}
load_examples = False
colored_console_log = False
executor = {{ .Values.executor }}
@@ -84,13 +85,42 @@ data:
namespace = {{ .Release.Namespace }}
airflow_configmap = {{ include "airflow_config" . }}
airflow_local_settings_configmap = {{ include "airflow_config" . }}
worker_container_repository = {{ .Values.images.airflow.repository }}
worker_container_tag = {{ .Values.images.airflow.tag }}
worker_container_repository = {{ .Values.images.airflow.repository | default .Values.defaultAirflowRepository }}
worker_container_tag = {{ .Values.images.airflow.tag | default .Values.defaultAirflowTag }}
worker_container_image_pull_policy = {{ .Values.images.airflow.pullPolicy }}
worker_service_account_name = {{ .Release.Name }}-worker-serviceaccount
image_pull_secrets = {{ template "registry_secret" . }}
dags_in_image = True
dags_in_image = {{ if or .Values.dags.gitSync.enabled .Values.dags.persistence.enabled }}False{{ else }}True{{ end }}
delete_worker_pods = True
run_as_user = {{ .Values.uid }}
fs_group = {{ .Values.gid }}
{{- if or .Values.dags.gitSync.enabled .Values.dags.persistence.enabled }}
git_dags_folder_mount_point = {{ include "airflow_dags_mount_path" . }}
dags_volume_mount_point = {{ include "airflow_dags_mount_path" . }}
{{- if .Values.dags.persistence.enabled }}
dags_volume_claim = {{ .Release.Name }}-dags
dags_volume_subpath = {{.Values.dags.gitSync.dest }}/{{ .Values.dags.gitSync.subPath }}
{{- else }}
git_repo = {{ .Values.dags.gitSync.repo }}
git_branch = {{ .Values.dags.gitSync.branch }}
git_sync_rev = {{ .Values.dags.gitSync.rev }}
git_sync_depth = {{ .Values.dags.gitSync.depth }}
git_sync_root = {{ .Values.dags.gitSync.root }}
git_sync_dest = {{ .Values.dags.gitSync.dest }}
git_sync_container_repository = {{ .Values.dags.gitSync.containerRepository }}
git_sync_container_tag = {{ .Values.dags.gitSync.containerTag }}
git_sync_init_container_name = {{ .Values.dags.gitSync.containerName }}
git_sync_run_as_user = {{ .Values.uid }}
{{- if .Values.dags.gitSync.knownHosts }}
git_ssh_known_hosts_configmap_name = {{ include "airflow_config" . }}
{{- end }}
{{- if .Values.dags.gitSync.sshKeySecret }}
git_ssh_key_secret_name = {{ .Values.dags.gitSync.sshKeySecret }}
{{- else if .Values.dags.gitSync.credentialsSecret }}
git_sync_credentials_secret = {{ .Values.dags.gitSync.credentialsSecret }}
{{- end }}
{{- end }}
{{- end }}

[kubernetes_secrets]
AIRFLOW__CORE__SQL_ALCHEMY_CONN = {{ printf "%s=connection" (include "airflow_metadata_secret" .) }}
@@ -117,3 +147,7 @@ data:
airflow_local_settings.py: |
{{ .Values.scheduler.airflowLocalSettings | nindent 4 }}
{{- end }}
{{- if and .Values.dags.gitSync.enabled .Values.dags.gitSync.knownHosts }}
known_hosts: |
{{ .Values.dags.gitSync.knownHosts | nindent 4 }}
{{- end }}
41 changes: 41 additions & 0 deletions chart/templates/dags-persistent-volume-claim.yaml
@@ -0,0 +1,41 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

{{- if and (not .Values.dags.persistence.existingClaim ) .Values.dags.persistence.enabled }}
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: {{ .Release.Name }}-dags
  labels:
    tier: airflow
    component: dags-pvc
    release: {{ .Release.Name }}
    chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    heritage: {{ .Release.Service }}
spec:
  accessModes: [{{ .Values.dags.persistence.accessMode | quote }}]
  resources:
    requests:
      storage: {{ .Values.dags.persistence.size | quote }}
  {{- if .Values.dags.persistence.storageClass }}
  {{- if (eq "-" .Values.dags.persistence.storageClass) }}
  storageClassName: ""
  {{- else }}
  storageClassName: "{{ .Values.dags.persistence.storageClass }}"
  {{- end }}
  {{- end }}
{{- end }}
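
The `storageClass` handling in this template distinguishes three cases. A small Python sketch restates the branching; the function name `storage_class_name` is hypothetical, and `None` stands for the field being omitted from the rendered manifest.

```python
from typing import Optional


def storage_class_name(storage_class: Optional[str]) -> Optional[str]:
    """Return the storageClassName the template would render, or None if omitted.

    Illustrative restatement of the nested if/else in the PVC template.
    """
    if not storage_class:
        # Unset or empty value: the field is left out of the manifest.
        return None
    if storage_class == "-":
        # A literal "-" is the chart's way of asking for an explicit empty string.
        return ""
    return storage_class


if __name__ == "__main__":
    for value in (None, "-", "standard"):
        print(repr(value), "->", repr(storage_class_name(value)))
```

In Kubernetes, omitting `storageClassName` lets the cluster's default StorageClass apply where one is configured, while an explicit empty string requests no storage class, which disables dynamic provisioning for the claim.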
16 changes: 16 additions & 0 deletions chart/templates/scheduler/scheduler-deployment.yaml
@@ -144,6 +144,11 @@ spec:
mountPath: {{ template "airflow_local_setting_path" . }}
subPath: airflow_local_settings.py
readOnly: true
{{- end }}
{{- if .Values.dags.gitSync.enabled }}
- name: dags
mountPath: {{ template "airflow_dags_mount_path" . }}
{{- include "git_sync_container" . | indent 8 }}
{{- end }}
# Always start the garbage collector sidecar.
- name: scheduler-gc
@@ -177,6 +182,17 @@ spec:
- name: config
configMap:
name: {{ template "airflow_config" . }}
{{- if .Values.dags.persistence.enabled }}
- name: dags
persistentVolumeClaim:
claimName: {{ template "airflow_dags_volume_claim" . }}
{{- else if .Values.dags.gitSync.enabled }}
- name: dags
emptyDir: {}
{{- end }}
{{- if and .Values.dags.gitSync.enabled .Values.dags.gitSync.sshKeySecret }}
{{- include "git_sync_ssh_key_volume" . | indent 8 }}
{{- end }}
{{- if not $stateful }}
- name: logs
emptyDir: {}
19 changes: 19 additions & 0 deletions chart/templates/webserver/webserver-deployment.yaml
@@ -68,6 +68,7 @@ spec:
restartPolicy: Always
securityContext:
runAsUser: {{ .Values.uid }}
fsGroup: {{ .Values.gid }}
{{- if or .Values.registry.secretName .Values.registry.connection }}
imagePullSecrets:
- name: {{ template "registry_secret" . }}
@@ -82,6 +83,9 @@ spec:
{{- include "custom_airflow_environment" . | indent 10 }}
{{- include "standard_airflow_environment" . | indent 10 }}
containers:
{{- if and (.Values.dags.gitSync.enabled) (not .Values.dags.persistence.enabled) }}
{{- include "git_sync_container" . | indent 8 }}
{{- end }}
- name: webserver
image: {{ template "airflow_image" . }}
imagePullPolicy: {{ .Values.images.airflow.pullPolicy }}
@@ -105,6 +109,10 @@ spec:
subPath: airflow_local_settings.py
readOnly: true
{{- end }}
{{- if or .Values.dags.gitSync.enabled .Values.dags.persistence.enabled }}
- name: dags
mountPath: {{ template "airflow_dags_mount_path" . }}
{{- end }}
{{- if .Values.webserver.extraVolumeMounts }}
{{ toYaml .Values.webserver.extraVolumeMounts | indent 12 }}
{{- end }}
@@ -134,6 +142,17 @@ spec:
- name: config
configMap:
name: {{ template "airflow_config" . }}
{{- if .Values.dags.persistence.enabled }}
- name: dags
persistentVolumeClaim:
claimName: {{ .Release.Name }}-dags
{{- else if .Values.dags.gitSync.enabled }}
- name: dags
emptyDir: {}
{{- if .Values.dags.gitSync.sshKeySecret }}
{{- include "git_sync_ssh_key_volume" . | indent 8 }}
{{- end }}
{{- end }}
{{- if .Values.webserver.extraVolumes }}
{{ toYaml .Values.webserver.extraVolumes | indent 8 }}
{{- end }}