feat: simulate latency with network chaos #9469

Merged: 10 commits, merged on Oct 28, 2024
2 changes: 2 additions & 0 deletions Dockerfile.fast
@@ -127,6 +127,8 @@ ENV ACVM_WORKING_DIRECTORY=/usr/src/acvm
ENV ACVM_BINARY_PATH=/usr/src/noir/noir-repo/target/release/acvm
ENV PORT=8080

RUN apt-get update && apt-get install -y --no-install-recommends ipset && rm -rf /var/lib/apt/lists/*

# Create necessary directories
RUN mkdir -p $BB_WORKING_DIRECTORY \
$ACVM_WORKING_DIRECTORY \
4 changes: 2 additions & 2 deletions spartan/chaos-mesh/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: v2
Review comment (Member, Author): Didn't want this chart to have the same name as the dependency.

-name: chaos-mesh
-description: A Helm chart for Kubernetes with chaos-mesh
+name: chaos
+description: A Helm chart for Kubernetes with chaos

type: application
version: 0.1.0
2 changes: 1 addition & 1 deletion spartan/chaos-mesh/install.sh
@@ -4,4 +4,4 @@
helm repo add chaos-mesh https://charts.chaos-mesh.org
helm dependency update

-helm upgrade chaos-mesh . -n chaos-mesh --install --create-namespace --atomic
+helm upgrade chaos . -n chaos-mesh --install --create-namespace --atomic
17 changes: 17 additions & 0 deletions spartan/chaos-mesh/values.yaml
@@ -9,4 +9,21 @@ chaos-mesh:
subPath: ""

# Disable security mode when running locally, DO NOT DEPLOY THIS CHART IN PRODUCTION
# We disable locally so the local dashboard does not require configuring authentication
securityMode: false

chaosDaemon:
privileged: true
runtime: "containerd"
socketPath: "/run/containerd/containerd.sock"
# capabilities:
# - SYS_PTRACE
# - NET_ADMIN
# - IPC_LOCK
# - SYS_ADMIN

dnsServer:
create: true

podSecurityPolicy:
enabled: false
6 changes: 6 additions & 0 deletions spartan/network-shaping/Chart.yaml
@@ -0,0 +1,6 @@
apiVersion: v2
name: network-shaping
description: Network shaping for spartan using chaos-mesh
type: application
version: 0.1.0
appVersion: "1.0.0"
3 changes: 3 additions & 0 deletions spartan/network-shaping/scripts/apply_network_shaping.sh
@@ -0,0 +1,3 @@
#!/bin/bash

helm upgrade chaos-mesh . -n chaos-mesh --install --atomic
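The apply script installs the chart with its defaults only. Applying one of the profiles under `values/` to a particular namespace needs extra `--values` and `--set` flags. Below is a minimal sketch, assuming it is run from the chart directory like the script above; `shaping_helm_args` is a hypothetical helper, not part of this PR. It sets both `global.namespace` (which the templates use in the experiment names) and `global.targetNamespace` (which they use in the pod selector) so the two stay in sync.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): prints the helm arguments, one per
# line, for applying a network-shaping profile to a target namespace.
# Usage: shaping_helm_args [profile] [namespace]
shaping_helm_args() {
  local profile=${1:-} namespace=${2:-}
  local args=(upgrade chaos-mesh . -n chaos-mesh --install --atomic)
  if [[ -n $profile ]]; then
    # Layer a profile from values/ (e.g. mild, hard) over the chart defaults
    args+=(--values "values/${profile}.yaml")
  fi
  if [[ -n $namespace ]]; then
    # global.namespace names the experiments; global.targetNamespace selects the pods
    args+=(--set "global.namespace=${namespace}" --set "global.targetNamespace=${namespace}")
  fi
  printf '%s\n' "${args[@]}"
}
```

Usage: `mapfile -t args < <(shaping_helm_args hard my-namespace) && helm "${args[@]}"`. Emitting one argument per line and reading it back with `mapfile` keeps values containing spaces intact, which plain `$(...)` word splitting would not.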
4 changes: 4 additions & 0 deletions spartan/network-shaping/scripts/stop_experiments.sh
@@ -0,0 +1,4 @@
#!/bin/bash

# Delete all actively running chaos experiments
kubectl delete networkchaos,podchaos,iochaos,httpchaos --all --all-namespaces
59 changes: 59 additions & 0 deletions spartan/network-shaping/scripts/troubleshoot.sh
@@ -0,0 +1,59 @@
#!/bin/bash

# Script to troubleshoot chaos network experiment failures
# Specifically for network chaos (latency) on StatefulSet pods

# Check pod status
check_pod_status() {
  local namespace=$1
  local pod_name=$2
  echo "=== Checking Pod Status ==="
  kubectl get pod -n "$namespace" "$pod_name" -o wide
  kubectl describe pod -n "$namespace" "$pod_name"
}

# Check network policies
check_network_policies() {
  local namespace=$1
  echo "=== Checking Network Policies ==="
  kubectl get networkpolicies -n "$namespace"
}

# Check chaos experiment status
# ChaosEngine/ChaosResult are LitmusChaos resources; Chaos Mesh uses NetworkChaos,
# which the network-shaping chart creates in the chaos-mesh namespace, so search all namespaces
check_chaos_status() {
  echo "=== Checking Chaos Experiment Status ==="
  kubectl get networkchaos --all-namespaces
  kubectl describe networkchaos --all-namespaces
}

# Check iptables rules on the node
check_iptables() {
  local node=$1
  echo "=== Checking iptables rules ==="
  kubectl debug "node/$node" -it --image=ubuntu -- bash -c "apt-get update && apt-get install -y iptables && iptables -L"
}

# Check privilege settings
check_privileges() {
  local namespace=$1
  local pod_name=$2
  echo "=== Checking Pod Security Context ==="
  kubectl get pod "$pod_name" -n "$namespace" -o jsonpath='{.spec.securityContext}'
}

# Main execution
# Usage: troubleshoot.sh [namespace] [pod_name]
main() {
  local namespace="${1:-smoke}"
  local pod_name="${2:-spartan-aztec-network-validator-0}"
  local node
  node=$(kubectl get pod -n "$namespace" "$pod_name" -o jsonpath='{.spec.nodeName}')

  check_pod_status "$namespace" "$pod_name"
  check_network_policies "$namespace"
  check_chaos_status
  check_iptables "$node"
  check_privileges "$namespace" "$pod_name"
}

# Run script
main "$@"
33 changes: 33 additions & 0 deletions spartan/network-shaping/templates/_helpers.tpl
@@ -0,0 +1,33 @@
{{/*
Create a default fully qualified app name.
*/}}
{{- define "network-shaping.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Selector labels
*/}}
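{{- define "network-shaping.selectorLabels" -}}
app.kubernetes.io/name: {{ include "network-shaping.fullname" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}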

{{/*
Common labels
*/}}
{{- define "network-shaping.labels" -}}
app.kubernetes.io/name: {{ include "network-shaping.fullname" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
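The `network-shaping.fullname` helper determines the `app.kubernetes.io/name` label value. To make its branches easy to reason about, here is a sketch that mirrors the same logic in bash; `fullname` is a hypothetical function for checking label values, not something the chart runs. A `fullnameOverride` wins; otherwise the release name is used on its own if it already contains the chart name, else `<release>-<chart>`; the result is truncated to 63 characters and one trailing `-` is trimmed.

```shell
#!/bin/bash
# Hypothetical bash mirror of the "network-shaping.fullname" template helper.
# Usage: fullname RELEASE CHART [nameOverride] [fullnameOverride]
fullname() {
  local release=$1 chart=$2 name_override=${3:-} fullname_override=${4:-}
  local name=${name_override:-$chart}    # default .Chart.Name .Values.nameOverride
  local out
  if [[ -n $fullname_override ]]; then
    out=$fullname_override
  elif [[ $release == *"$name"* ]]; then  # contains $name .Release.Name
    out=$release
  else
    out="${release}-${name}"
  fi
  out=${out:0:63}                         # trunc 63
  printf '%s\n' "${out%-}"                # trimSuffix "-" (removes at most one)
}
```

For example, installing this chart under the release name `chaos-mesh` (as `apply_network_shaping.sh` does) yields `chaos-mesh-network-shaping` as the name label.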
74 changes: 74 additions & 0 deletions spartan/network-shaping/templates/network-chaos.yaml
@@ -0,0 +1,74 @@
{{- if .Values.networkShaping.enabled }}
{{- if .Values.networkShaping.conditions.latency.enabled }}
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: {{ .Values.global.namespace | default .Values.global.targetNamespace }}-latency
  namespace: {{ .Values.global.chaosMeshNamespace }}
  labels:
    {{- include "network-shaping.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/resource-policy": keep
spec:
  action: delay
  mode: all
  selector:
    namespaces:
      - {{ .Values.global.targetNamespace }}
  delay:
    {{- toYaml .Values.networkShaping.conditions.latency.delay | nindent 4 }}
  duration: 8760h # 1 year
{{- end }}

{{- if .Values.networkShaping.conditions.bandwidth.enabled }}
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: {{ .Values.global.namespace | default .Values.global.targetNamespace }}-bandwidth
  namespace: {{ .Values.global.chaosMeshNamespace }}
  labels:
    {{- include "network-shaping.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/resource-policy": keep
spec:
  action: bandwidth
  mode: all
  selector:
    namespaces:
      - {{ .Values.global.targetNamespace }}
  bandwidth:
    rate: {{ .Values.networkShaping.conditions.bandwidth.rate }}
    limit: {{ .Values.networkShaping.conditions.bandwidth.limit }}
    buffer: {{ .Values.networkShaping.conditions.bandwidth.buffer }}
  duration: 8760h
{{- end }}

{{- if .Values.networkShaping.conditions.packetLoss.enabled }}
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: {{ .Values.global.namespace | default .Values.global.targetNamespace }}-packet-loss
  namespace: {{ .Values.global.chaosMeshNamespace }}
  labels:
    {{- include "network-shaping.labels" . | nindent 4 }}
  annotations:
    "helm.sh/resource-policy": keep
spec:
  action: loss
  mode: all
  selector:
    namespaces:
      - {{ .Values.global.targetNamespace }}
  loss:
    loss: {{ .Values.networkShaping.conditions.packetLoss.loss | quote }}
    correlation: {{ .Values.networkShaping.conditions.packetLoss.correlation | quote }}
  duration: 8760h
{{- end }}
{{- end }}
102 changes: 102 additions & 0 deletions spartan/network-shaping/values.yaml
@@ -0,0 +1,102 @@
global:
  # Namespace whose pods the chaos experiments target: every pod in this namespace is affected.
  # Override it with the namespace spartan deploys to, e.g. --set global.targetNamespace=your-namespace
  targetNamespace: smoke
  chaosMeshNamespace: chaos-mesh

# Network shaping configuration
networkShaping:
  # Master switch to enable network shaping
  enabled: true

  # Default settings
  defaultSettings:
    mode: all
    # Set duration to 1 year so the experiment runs indefinitely unless overridden
    duration: 8760h

  # Network conditions to apply
  conditions:
    # Latency simulation
    latency:
      # Enable / disable latency configuration
      enabled: false
      delay:
        # Base latency added to all network traffic
        # Can be given in ms or s
        latency: "100ms"

        # Random variation in latency
        # The actual delay varies within latency ± jitter
        jitter: "50ms"

        # Correlation (percent)
        # How strongly each delay depends on the previous one
        # E.g. 75 means the current delay is 75% influenced by the previous delay
        correlation: "75"

    packetLoss:
      # Enable / disable packet loss configuration
      enabled: false
      # Packet drop percentage
      # 2 = 2% of packets are dropped
      loss: "2"

      # Correlation (percent)
      # Higher values mean packet losses happen in bursts
      # 25 = each loss decision is 25% influenced by the previous one
      correlation: "25"

    bandwidth:
      # Enable / disable bandwidth configuration
      enabled: false

      # Target bandwidth rate, in tc units:
      # kbit / mbit / gbit are bits per second; kbps / mbps / gbps are BYTES per second
      rate: "1024kbps"

      # Number of bytes that can be queued waiting for tokens before packets are dropped
      # Must be an integer, not a string!
      limit: 20971520

      # Maximum number of bytes that can be sent instantaneously (the token-bucket burst size)
      buffer: 1000

## Example configurations for different use cases (generated by Claude):

# High latency network simulation (e.g., satellite)
# latency:
#   enabled: true
#   delay:
#     latency: 500ms
#     jitter: 50ms
#     correlation: "75"

# Mobile network simulation (3G)
# bandwidth:
#   enabled: true
#   rate: 1500kbps
#   limit: 1500000   # ~1500 KB, in bytes; must be an integer
#   buffer: 1000
# latency:
#   enabled: true
#   delay:
#     latency: 100ms
#     jitter: 40ms
#     correlation: "75"

# Unreliable network simulation
# packetLoss:
#   enabled: true
#   loss: "5"
#   correlation: "75"
# latency:
#   enabled: true
#   delay:
#     latency: 150ms
#     jitter: 30ms
#     correlation: "75"
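The `latency`, `jitter` and `correlation` comments describe how each packet's delay is drawn. The sketch below models that description under a simplifying assumption: each new uniform sample is blended with the previous one, weighted by `correlation`, and the blend is then mapped onto `latency ± jitter`. This approximates the weighted-average idea behind tc netem's correlation rather than reproducing its kernel code; `correlated_delays` is a hypothetical helper, and the caller supplies the samples so the output is deterministic.

```shell
#!/bin/bash
# Hypothetical, simplified model of correlated jitter (not netem's exact algorithm).
# Usage: correlated_delays BASE_MS JITTER_MS CORR_PERCENT U1 [U2 ...]
# Each U is a uniform sample in [0,1]; prints one delay (ms) per sample.
correlated_delays() {
  local base=$1 jitter=$2 corr=$3
  shift 3
  [[ $# -gt 0 ]] || { echo "need at least one sample" >&2; return 1; }
  printf '%s\n' "$@" | awk -v base="$base" -v jitter="$jitter" -v corr="$corr" '
    BEGIN { rho = corr / 100; prev = 0.5 }   # start at 0.5, i.e. zero offset
    {
      u = (1 - rho) * $1 + rho * prev        # blend new sample with previous value
      prev = u
      printf "%g\n", base + (2 * u - 1) * jitter   # map [0,1] onto base ± jitter
    }'
}
```

With `correlation: "0"`, a maximal sample jumps straight to the upper bound (`correlated_delays 100 50 0 1` prints `150`). With `"75"`, a run of maximal samples makes the delay creep upward instead: `correlated_delays 100 50 75 1 1 1` prints `112.5`, `121.875`, `128.906`.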
25 changes: 25 additions & 0 deletions spartan/network-shaping/values/hard.yaml
@@ -0,0 +1,25 @@
# Simulates global network conditions
# High latency, moderate bandwidth
global:
  namespace: "smoke"

networkShaping:
  enabled: true
  conditions:
    latency:
      enabled: true
      delay:
        # Global network latency (e.g., intercontinental)
        latency: 200ms
        jitter: 40ms
        correlation: "75"
    bandwidth:
      enabled: true
      # 20mbps = 20 megabytes/s in tc units (160 Mbit/s); use 20mbit for 20 megabits/s
      rate: 20mbps
      limit: 10000000 # 10 MB
      buffer: 4000
    packetLoss:
      enabled: true
      loss: "1"
      correlation: "70"
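The bandwidth `rate` follows tc's units, where `kbit`/`mbit`/`gbit` are bits per second but `kbps`/`mbps`/`gbps` are bytes per second, so the `20mbps` above is 160 Mbit/s rather than 20 Mbit/s. The hypothetical helper below converts a rate string to bits per second so such values are easy to sanity-check. It is a sketch that covers only the units listed and, as in tc, treats the k/m/g prefixes as powers of 1000.

```shell
#!/bin/bash
# Hypothetical helper: convert a tc-style rate string to bits per second.
# k/m/g prefixes are SI (x1000); "bit" units are bits/s, "bps" units are BYTES/s.
rate_to_bits_per_sec() {
  local rate=$1
  local num=${rate%%[!0-9]*}   # leading integer part, e.g. "20" from "20mbps"
  local unit=${rate#"$num"}    # remaining unit suffix, e.g. "mbps"
  [[ -n $num ]] || { echo "invalid rate: $rate" >&2; return 1; }
  num=$((10#$num))             # force base 10 so "08" is not read as octal
  case $unit in
    ""|bit) echo $(( num )) ;;
    kbit)   echo $(( num * 1000 )) ;;
    mbit)   echo $(( num * 1000000 )) ;;
    gbit)   echo $(( num * 1000000000 )) ;;
    bps)    echo $(( num * 8 )) ;;
    kbps)   echo $(( num * 8 * 1000 )) ;;
    mbps)   echo $(( num * 8 * 1000000 )) ;;
    gbps)   echo $(( num * 8 * 1000000000 )) ;;
    *) echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
}
```

For example, `rate_to_bits_per_sec 20mbps` prints `160000000`, while `rate_to_bits_per_sec 20mbit` prints `20000000`.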
25 changes: 25 additions & 0 deletions spartan/network-shaping/values/mild.yaml
@@ -0,0 +1,25 @@

# Moderate latency, very high bandwidth
global:
  namespace: "smoke"

networkShaping:
  enabled: true
  conditions:
    latency:
      enabled: true
      delay:
        # Typical datacenter-to-datacenter latency
        latency: 50ms
        jitter: 10ms
        correlation: "75"
    bandwidth:
      enabled: true
      # 100mbps = 100 megabytes/s in tc units (800 Mbit/s); use 100mbit for 100 megabits/s
      rate: 100mbps
      limit: 50000000 # 50 MB
      buffer: 8000
    packetLoss:
      enabled: true
      loss: "0.1"
      correlation: "50"
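Because `limit` is the number of bytes that may queue behind the token bucket, `limit / rate` roughly bounds the extra queueing delay the bandwidth rule adds once that queue is full, on top of the configured latency. A quick sketch of the arithmetic follows. It assumes `rate` has already been converted to bytes per second (for example, `100mbps` in tc units is 100,000,000 bytes/s), and `max_queue_delay_ms` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: approximate worst-case queueing delay (ms) added by a
# token-bucket bandwidth limit when a queue of LIMIT_BYTES is full.
# Usage: max_queue_delay_ms LIMIT_BYTES RATE_BYTES_PER_SEC
max_queue_delay_ms() {
  local limit_bytes=$1 rate_bytes_per_sec=$2
  if (( rate_bytes_per_sec <= 0 )); then
    echo "rate must be positive" >&2
    return 1
  fi
  # Integer milliseconds, rounded down
  echo $(( limit_bytes * 1000 / rate_bytes_per_sec ))
}
```

Both profiles come out at about 500 ms: `max_queue_delay_ms 50000000 100000000` (mild) and `max_queue_delay_ms 10000000 20000000` (hard) each print `500`. The chart default (`limit: 20971520` at `1024kbps`, i.e. 1,024,000 bytes/s) allows about 20 s of queueing.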