Skip to content

Commit

Permalink
Improve install_cni_chaining to support updates to CNI conf file (#4012)
Browse files Browse the repository at this point in the history
The script is in charge of overwriting the cloud-specific CNI conf file
(e.g., 10-aws.conf for EKS).
However, the script is currently run as an initContainer, and does not
account for the possibility that the CNI conf file may be modified again
by the cloud provider at a later time, hence discarding the changes
made by the script.
For example, restarting aws-node on EKS will cause the 10-aws.conf file
to be overwritten with the default configuration, and Antrea will no
longer be involved in Pod networking configuration. For the user,
everything may appear to work from a connectivity standpoint, but
NetworkPolicies will not be enforced!

To avoid this issue, we run install_cni_chaining in a "normal"
container, and leverage inotify to monitor the CNI conf file. Every time
another process writes to the file, we process it one more time and
update it again if necessary.

This solution is not perfect. I think that there is a small possibility
of race conditions, but they remain very unlikely. One example is this
sequence of events:
1. aws-node overwrites the CNI conf file (because of a restart?)
2. a new Pod is created on the Node, the Antrea CNI is not used
3. install_cni_chaining updates the CNI conf file and adds Antrea to the
   chain

Avoiding this race would require some major changes (e.g., to
antrea-eks-node-init). Because changes to the CNI conf file are *very*
infrequent, I think this is acceptable.

This solution is loosely based on the linkerd CNI installation script:
https://github.com/linkerd/linkerd2/blob/main/cni-plugin/deployment/scripts/install-cni.sh

Fixes #3974

Signed-off-by: Antonin Bas <[email protected]>
  • Loading branch information
antoninbas authored Jul 19, 2022
1 parent 1655f9f commit 231b09d
Show file tree
Hide file tree
Showing 5 changed files with 141 additions and 29 deletions.
32 changes: 19 additions & 13 deletions build/charts/antrea/templates/agent/daemonset.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -65,12 +65,29 @@ spec:
{{- end }}
serviceAccountName: antrea-agent
initContainers:
{{- if .Values.whereabouts.enable }}
- name: install-whereabouts-config
image: "{{ .Values.image.repository }}:{{ default .Chart.AppVersion .Values.image.tag }}"
resources:
requests:
cpu: "100m"
command: ["install_whereabouts_config"]
volumeMounts:
- name: whereabouts-cni-conf
mountPath: /host/etc/cni/net.d/whereabouts.d
- name: whereabouts-secret
mountPath: /var/run/secrets/whereabouts
{{- end }}
{{- if eq .Values.trafficEncapMode "networkPolicyOnly" }}
containers:
{{- end }}
- name: install-cni
image: "{{ .Values.image.repository }}:{{ default .Chart.AppVersion .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
resources: {{- .Values.agent.installCNI.resources | toYaml | nindent 12 }}
{{- if eq .Values.trafficEncapMode "networkPolicyOnly" }}
command: ["install_cni_chaining"]
args: ["--monitor"]
{{- else }}
command: ["install_cni"]
{{- end }}
Expand Down Expand Up @@ -100,20 +117,9 @@ spec:
# For changing the default permissions of the run directory.
- name: host-var-run-antrea
mountPath: /var/run/antrea
{{- if .Values.whereabouts.enable }}
- name: install-whereabouts-config
image: "{{ .Values.image.repository }}:{{ default .Chart.AppVersion .Values.image.tag }}"
resources:
requests:
cpu: "100m"
command: ["install_whereabouts_config"]
volumeMounts:
- name: whereabouts-cni-conf
mountPath: /host/etc/cni/net.d/whereabouts.d
- name: whereabouts-secret
mountPath: /var/run/secrets/whereabouts
{{- end }}
{{- if ne .Values.trafficEncapMode "networkPolicyOnly" }}
containers:
{{- end }}
- name: antrea-agent
image: "{{ .Values.image.repository }}:{{ default .Chart.AppVersion .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
Expand Down
2 changes: 1 addition & 1 deletion build/images/base/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ USER root
# chmod in the RUN command below instead.
ADD https://raw.githubusercontent.com/kubernetes-sigs/iptables-wrappers/9e6ce59c864623ea71a6f7d59c35fcb13a919b87/iptables-wrapper-installer.sh /iptables-wrapper-installer.sh

RUN apt-get update && apt-get install -y --no-install-recommends ipset jq && \
RUN apt-get update && apt-get install -y --no-install-recommends ipset jq inotify-tools && \
rm -rf /var/lib/apt/lists/* && \
chmod +x /iptables-wrapper-installer.sh && \
/iptables-wrapper-installer.sh
Expand Down
130 changes: 117 additions & 13 deletions build/images/scripts/install_cni_chaining
Original file line number Diff line number Diff line change
Expand Up @@ -2,30 +2,134 @@

source logging

HOST_CNI_NET_DIR="/host/etc/cni/net.d"

run_monitor="false"

function usage {
echo "install_cni_chaining"
echo -e " -h|--help Print help message"
echo -e " --monitor Monitor the CNI conf file and re-apply changes every time it is overwritten"
}

while (( "$#" )); do
case "$1" in
-h|--help)
usage
exit 0
;;
--monitor)
run_monitor="true"
;;
-*|--*) # unsupported flags
echo "Error: unsupported flag $1" >&2
exit 1
;;
*) # standalone arguments are not supported
echo "Error: unsupported argument $1" >&2
exit 1
;;
esac
shift
done

# Find the cni conf file with lowest name
while true; do
cni_conf=$(ls /host/etc/cni/net.d | head -n1)
if [[ ! -z $cni_conf ]]; then
cni_conf_name=$(ls "$HOST_CNI_NET_DIR" | head -n1)
if [[ ! -z $cni_conf_name ]]; then
break
fi
log_info "CNI conf file not found. Retrying after 2 secs"
log_info "install_cni_chaining" "CNI conf file not found. Retrying after 2 secs"
sleep 2s
done
cni_conf="/host/etc/cni/net.d/$cni_conf"
cni_conf_path="$HOST_CNI_NET_DIR/$cni_conf_name"

if grep -sq "azure" $cni_conf; then
sed -i 's/"mode":"bridge",/"mode":"transparent",/g' $cni_conf
fi
function update_cni_conf {
log_info "install_cni_chaining" "updating CNI conf file $cni_conf_name"

cat $cni_conf | jq '.plugins[] | .type' | grep antrea > /dev/null 2>&1
if [[ $? != 0 ]]; then
content=$(cat $cni_conf | jq '.plugins += [{"type": "antrea"}]')
echo "$content" > $cni_conf
fi
# To limit the risk of issue because of race conditions, we use the
# following steps:
# 1. read the file once and store its contents in a variable
# 2. perform the necessary changes on the variable contents
# 3. compute the sha of the updated variable contents
# 4. write the variable contents back to the file
content=$(cat $cni_conf_path)

echo "$content" | grep -sq "azure"
if [[ $? == 0 ]]; then
# Note that in more recent AKS versions, transparent is the default:
# https://github.com/Azure/azure-container-networking/pull/709
content="$(echo "$content" | sed 's/"mode":"bridge",/"mode":"transparent",/g')"
fi

echo "$content" | jq '.plugins[] | .type' | grep -sq antrea
if [[ $? != 0 ]]; then
content="$(echo "$content" | jq '.plugins += [{"type": "antrea"}]')"
fi

cni_conf_sha="$(echo "$content" | sha256sum | while read -r s _; do echo "$s"; done)"
echo "$content" > $cni_conf_path
}
# monitor will start a watch on host's CNI config directory.
# when we detect a change to the CNI conf file, we call update_cni_conf.
function monitor {
inotifywait -m "$HOST_CNI_NET_DIR" -e create,close_write |
while read -r directory action filename; do
if [[ "$filename" == "$cni_conf_name" ]]; then
log_info "install_cni_chaining" "inotify event in $directory: $action $filename"
sha="$(sha256sum "$directory/$filename" | while read -r s _; do echo "$s"; done)"
if [[ "$sha" == "" ]]; then
log_warning "install_cni_chaining" "unable to compute sha, file may have been deleted"
continue
fi
if [[ "$sha" == "$cni_conf_sha" ]]; then
log_info "install_cni_chaining" "sha matches existing one, ignoring event"
continue
fi
update_cni_conf
fi
done
}
# Update conf file the first time
update_cni_conf
# Install Antrea binary file
install -m 755 /usr/local/bin/antrea-cni /host/opt/cni/bin/antrea
id
# Load the OVS kernel module
modprobe openvswitch || (echo "Failed to load the OVS kernel module from the container, try running 'modprobe openvswitch' on your Nodes"; exit 1)
modprobe openvswitch || { echo "Failed to load the OVS kernel module from the container, try running 'modprobe openvswitch' on your Nodes"; exit 1;}
if [[ "$run_monitor" == "false" ]]; then
exit 0
fi
SLEEP_PID=
MONITOR_PID=
function quit {
log_info "install_cni_chaining" "Exiting"
# terminate background monitor process
if [ "$MONITOR_PID" != "" ]; then kill $MONITOR_PID > /dev/null 2>&1 || true; fi
# terminate background sleep process
if [ "$SLEEP_PID" != "" ]; then kill $SLEEP_PID > /dev/null 2>&1 || true; fi
exit 0
}
# Do not trap EXIT as it would then ignore the "exit 0" statement in quit and
# exit with code 128 + SIGNAL
trap "quit" INT TERM HUP
log_info "install_cni_chaining" "Starting inotify monitor for $cni_conf_name"
monitor &
MONITOR_PID=$!
while true; do
# sleep so script never finishes
# we start sleep in bg so we can trap signals
sleep 3600 &
SLEEP_PID=$!
wait $SLEEP_PID
done
3 changes: 2 additions & 1 deletion build/yamls/antrea-aks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3711,13 +3711,15 @@ spec:
operator: Exists
serviceAccountName: antrea-agent
initContainers:
containers:
- name: install-cni
image: "projects.registry.vmware.com/antrea/antrea-ubuntu:latest"
imagePullPolicy: IfNotPresent
resources:
requests:
cpu: 100m
command: ["install_cni_chaining"]
args: ["--monitor"]
securityContext:
capabilities:
add:
Expand All @@ -3744,7 +3746,6 @@ spec:
# For changing the default permissions of the run directory.
- name: host-var-run-antrea
mountPath: /var/run/antrea
containers:
- name: antrea-agent
image: "projects.registry.vmware.com/antrea/antrea-ubuntu:latest"
imagePullPolicy: IfNotPresent
Expand Down
3 changes: 2 additions & 1 deletion build/yamls/antrea-eks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3711,13 +3711,15 @@ spec:
operator: Exists
serviceAccountName: antrea-agent
initContainers:
containers:
- name: install-cni
image: "projects.registry.vmware.com/antrea/antrea-ubuntu:latest"
imagePullPolicy: IfNotPresent
resources:
requests:
cpu: 100m
command: ["install_cni_chaining"]
args: ["--monitor"]
securityContext:
capabilities:
add:
Expand All @@ -3744,7 +3746,6 @@ spec:
# For changing the default permissions of the run directory.
- name: host-var-run-antrea
mountPath: /var/run/antrea
containers:
- name: antrea-agent
image: "projects.registry.vmware.com/antrea/antrea-ubuntu:latest"
imagePullPolicy: IfNotPresent
Expand Down

0 comments on commit 231b09d

Please sign in to comment.