Add longevity test plan and results #1113

Merged on Oct 11, 2023 (9 commits)
1 change: 1 addition & 0 deletions .yamllint.yaml
Expand Up @@ -41,6 +41,7 @@ rules:
.github/workflows/
deploy/manifests/nginx-gateway.yaml
deploy/manifests/crds
tests/longevity/manifests/cronjob.yaml
new-line-at-end-of-file: enable
new-lines: enable
octal-values: disable
149 changes: 149 additions & 0 deletions tests/longevity/longevity.md
@@ -0,0 +1,149 @@
# Longevity Test

This document describes how we test NGF for longevity.

<!-- TOC -->

- [Longevity Test](#longevity-test)
  - [Goals](#goals)
  - [Test Environment](#test-environment)
  - [Steps](#steps)
    - [Start](#start)
    - [Check the Test is Running Correctly](#check-the-test-is-running-correctly)
    - [End](#end)
  - [Analyze](#analyze)
  - [Results](#results)

<!-- TOC -->

## Goals

- Ensure that NGF successfully processes both control plane and data plane transactions over a period of time much
greater than in our other tests.
- Catch bugs that could only appear over a period of time (like resource leaks).

## Test Environment

- A Kubernetes cluster with 3 nodes on GKE
  - Node: e2-medium (2 vCPU, 4GB memory)
  - Enabled GKE logging.
  - Enabled GKE Cloud monitoring with managed Prometheus service, with enabled:
    - system.
    - kube state - pods, deployments.
- Tester VMs:
  - Configuration:
    - Debian
    - Install packages: tmux, wrk
  - Location - same zone as the Kubernetes cluster.
  - First VM - for sending HTTP traffic
  - Second VM - for sending HTTPS traffic
- NGF
  - Deployment with 1 replica
  - Exposed via a Service with type LoadBalancer, private IP
  - Gateway, two listeners - HTTP and HTTPS
  - Two apps:
    - Coffee - 3 replicas
    - Tea - 3 replicas
  - Two HTTPRoutes
    - Coffee (HTTP)
    - Tea (HTTPS)

## Steps

### Start

Test duration - 4 days.

1. Create a Kubernetes cluster on GKE.
2. Deploy NGF.
3. Expose NGF via a LoadBalancer Service with the `"networking.gke.io/load-balancer-type":"Internal"` annotation to
   allocate an internal load balancer.
4. Apply the manifests which will:
   1. Deploy the coffee and tea backends.
   2. Configure HTTP and HTTPS listeners on the Gateway.
   3. Expose coffee via HTTP listener and tea via HTTPS listener.
   4. Create two CronJobs to re-rollout backends:
      1. Coffee - every minute for an hour every 6 hours
      2. Tea - every minute for an hour every 6 hours, 3 hours apart from coffee.
   5. Configure Prometheus on GKE to pick up NGF metrics.

```shell
kubectl apply -f files
```

5. In Tester VMs, update `/etc/hosts` to have an entry with the External IP of the NGF Service (`10.128.0.10` in this
case):

```text
10.128.0.10 cafe.example.com
```

6. In Tester VMs, start a tmux session (this is needed so that any launched command keeps running even if you
   disconnect from the VM):

```shell
tmux
```

7. In First VM, start wrk for 4 days for coffee via HTTP:

```shell
wrk -t2 -c100 -d96h http://cafe.example.com/coffee
```

8. In Second VM, start wrk for 4 days for tea via HTTPS:

```shell
wrk -t2 -c100 -d96h https://cafe.example.com/tea
```
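
The re-rollout windows from step 4 can be sanity-checked with a small sketch like the one below. It mirrors only the hour fields of the two CronJob schedules (`*/6` for coffee, `3,9,15,21` for tea); within an active hour the CronJob fires every minute. The function names `rollout_target` and `print_schedule` are hypothetical helpers, not part of the test tooling, and the hours are in whatever time zone the CronJob controller uses (commonly UTC on managed clusters, but treat that as an assumption).

```shell
#!/bin/sh
# Sketch: report which backend is being re-rolled during a given hour (0-23).
# Pass the hour as a plain integer without a leading zero (8, not 08).
rollout_target() {
  hour=$1
  case $((hour % 6)) in
    0) echo coffee ;;  # hours 0, 6, 12, 18  -> schedule "* */6 * * *"
    3) echo tea ;;     # hours 3, 9, 15, 21  -> schedule "* 3,9,15,21 * * *"
    *) echo none ;;    # no re-rollouts scheduled in this hour
  esac
}

# Print the whole day, one line per hour.
print_schedule() {
  h=0
  while [ "$h" -lt 24 ]; do
    printf '%02d:00 %s\n' "$h" "$(rollout_target "$h")"
    h=$((h + 1))
  done
}
```

Calling `print_schedule` lists all 24 hours, which makes it easy to see that the coffee and tea windows never overlap and that each backend is idle for five hours between windows.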

Notes:

- The updated coffee and tea backends in cafe.yaml include extra configuration for zero-downtime upgrades, so that
  wrk in Tester VMs doesn't get 502 responses from NGF. Based on https://learnk8s.io/graceful-shutdown

### Check the Test is Running Correctly

Check that you don't see any errors:

1. Traffic is flowing - look at the access logs of NGINX.
2. Check that the CronJobs can run.

```shell
kubectl create job --from=cronjob/coffee-rollout-mgr coffee-test
kubectl create job --from=cronjob/tea-rollout-mgr tea-test
```

3. Check that GKE exports logs and Prometheus metrics.

In case of errors, double-check that you prepared the environment and launched the test correctly.
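
One way to do the access-log check from step 1 is to count responses per status code. The sketch below is a hypothetical helper (`status_summary` is not part of the repository). It assumes the status code is the 9th whitespace-separated field of each log line, as in NGINX's default `combined` format; adjust the field number if the access log format differs.

```shell
#!/bin/sh
# Sketch: read NGINX access log lines on stdin and print "<status> <count>"
# for every status code seen, sorted by status code.
# Assumption: the status code is field 9 (true for the default "combined" format).
status_summary() {
  awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' | sort
}
```

The function can be fed from a saved log file or from `kubectl logs` output for the NGINX container. A healthy run should show only 2xx codes; any 502 lines are worth investigating.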

### End

- Remove CronJobs.
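
Removing the CronJobs could look like the sketch below. The CronJob names and namespace are taken from `manifests/cronjob.yaml`; the `remove_cronjobs` function and the `KUBECTL` override variable are hypothetical conveniences that allow the command to be previewed before it is run against the cluster.

```shell
#!/bin/sh
# Sketch: delete the two re-rollout CronJobs created for the test.
# KUBECTL is an assumed override hook: set KUBECTL=echo to print the
# arguments instead of running kubectl.
KUBECTL=${KUBECTL:-kubectl}

remove_cronjobs() {
  "$KUBECTL" delete cronjob coffee-rollout-mgr tea-rollout-mgr --namespace default
}
```

Running `remove_cronjobs` with `KUBECTL=echo` set prints the arguments that would be passed to kubectl, without touching the cluster.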

## Analyze

- Traffic
  - Tester VMs (clients)
    - When wrk finishes, it prints its results. To attach to the tmux session running wrk,
      run `tmux attach -t 0`
    - Check for errors, latency, RPS
- Logs
  - Check the logs for errors in Google Cloud Operations Logging.
    - NGF
    - NGINX
- Check metrics in Google Cloud Monitoring.
  - NGF
    - CPU usage
      - NGINX
      - NGF
    - Memory usage
      - NGINX
      - NGF
    - NGINX metrics
      - Reloads
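
For the traffic check, the headline numbers can be pulled out of a saved copy of the wrk report with a sketch like the following. `wrk_summary` is a hypothetical helper. It assumes the report was captured to a file (for example by copying it out of the tmux session) and relies on the `Latency`, `Socket errors:`, `Non-2xx or 3xx responses:` and `Requests/sec:` lines that wrk prints. wrk omits the two error lines when there were no errors, so the sketch falls back to `0` and `none`.

```shell
#!/bin/sh
# Sketch: summarize a wrk report read from stdin.
# Prints the requests per second, the average latency, and the error counters.
wrk_summary() {
  awk '
    /Requests\/sec:/            { rps = $2 }
    /Non-2xx or 3xx responses:/ { non2xx = $NF }
    /Socket errors:/            { sock = $0; sub(/^ *Socket errors: */, "", sock) }
    $1 == "Latency"             { lat = $2 }   # second column is the average
    END {
      print "rps=" rps
      print "avg_latency=" lat
      print "non_2xx_3xx=" (non2xx == "" ? 0 : non2xx)
      print "socket_errors=" (sock == "" ? "none" : sock)
    }'
}
```

Any non-zero `non_2xx_3xx` value or socket error counter is a starting point for correlating with the NGF and NGINX logs for the same time window.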

## Results

- [1.0.0](results/1.0.0.md)
37 changes: 37 additions & 0 deletions tests/longevity/manifests/cafe-routes.yaml
@@ -0,0 +1,37 @@
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: coffee
spec:
  parentRefs:
  - name: gateway
    sectionName: http
  hostnames:
  - "cafe.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /coffee
    backendRefs:
    - name: coffee
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: tea
spec:
  parentRefs:
  - name: gateway
    sectionName: https
  hostnames:
  - "cafe.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /tea
    backendRefs:
    - name: tea
      port: 80
8 changes: 8 additions & 0 deletions tests/longevity/manifests/cafe-secret.yaml
@@ -0,0 +1,8 @@
apiVersion: v1
kind: Secret
metadata:
name: cafe-secret
type: kubernetes.io/tls
data:
tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNzakNDQVpvQ0NRQzdCdVdXdWRtRkNEQU5CZ2txaGtpRzl3MEJBUXNGQURBYk1Sa3dGd1lEVlFRRERCQmoKWVdabExtVjRZVzF3YkdVdVkyOXRNQjRYRFRJeU1EY3hOREl4TlRJek9Wb1hEVEl6TURjeE5ESXhOVEl6T1ZvdwpHekVaTUJjR0ExVUVBd3dRWTJGbVpTNWxlR0Z0Y0d4bExtTnZiVENDQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFECmdnRVBBRENDQVFvQ2dnRUJBTHFZMnRHNFc5aStFYzJhdnV4Q2prb2tnUUx1ek10U1Rnc1RNaEhuK3ZRUmxIam8KVzFLRnMvQVdlS25UUStyTWVKVWNseis4M3QwRGtyRThwUisxR2NKSE50WlNMb0NEYUlRN0Nhck5nY1daS0o4Qgo1WDNnVS9YeVJHZjI2c1REd2xzU3NkSEQ1U2U3K2Vab3NPcTdHTVF3K25HR2NVZ0VtL1Q1UEMvY05PWE0zZWxGClRPL051MStoMzROVG9BbDNQdTF2QlpMcDNQVERtQ0thaEROV0NWbUJQUWpNNFI4VERsbFhhMHQ5Z1o1MTRSRzUKWHlZWTNtdzZpUzIrR1dYVXllMjFuWVV4UEhZbDV4RHY0c0FXaGRXbElweHlZQlNCRURjczN6QlI2bFF1OWkxZAp0R1k4dGJ3blVmcUVUR3NZdWxzc05qcU95V1VEcFdJelhibHhJZVVDQXdFQUFUQU5CZ2txaGtpRzl3MEJBUXNGCkFBT0NBUUVBcjkrZWJ0U1dzSnhLTGtLZlRkek1ISFhOd2Y5ZXFVbHNtTXZmMGdBdWVKTUpUR215dG1iWjlpbXQKL2RnWlpYVE9hTElHUG9oZ3BpS0l5eVVRZVdGQ2F0NHRxWkNPVWRhbUloOGk0Q1h6QVJYVHNvcUNOenNNLzZMRQphM25XbFZyS2lmZHYrWkxyRi8vblc0VVNvOEoxaCtQeDljY0tpRDZZU0RVUERDRGh1RUtFWXcvbHpoUDJVOXNmCnl6cEJKVGQ4enFyM3paTjNGWWlITmgzYlRhQS82di9jU2lyamNTK1EwQXg4RWpzQzYxRjRVMTc4QzdWNWRCKzQKcmtPTy9QNlA0UFlWNTRZZHMvRjE2WkZJTHFBNENCYnExRExuYWRxamxyN3NPbzl2ZzNnWFNMYXBVVkdtZ2todAp6VlZPWG1mU0Z4OS90MDBHUi95bUdPbERJbWlXMGc9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQzZtTnJSdUZ2WXZoSE4KbXI3c1FvNUtKSUVDN3N6TFVrNExFeklSNS9yMEVaUjQ2RnRTaGJQd0ZuaXAwMFBxekhpVkhKYy92TjdkQTVLeApQS1VmdFJuQ1J6YldVaTZBZzJpRU93bXF6WUhGbVNpZkFlVjk0RlAxOGtSbjl1ckV3OEpiRXJIUncrVW51L25tCmFMRHF1eGpFTVBweGhuRklCSnYwK1R3djNEVGx6TjNwUlV6dnpidGZvZCtEVTZBSmR6N3Rid1dTNmR6MHc1Z2kKbW9RelZnbFpnVDBJek9FZkV3NVpWMnRMZllHZWRlRVJ1VjhtR041c09va3R2aGxsMU1udHRaMkZNVHgySmVjUQo3K0xBRm9YVnBTS2NjbUFVZ1JBM0xOOHdVZXBVTHZZdFhiUm1QTFc4SjFINmhFeHJHTHBiTERZNmpzbGxBNlZpCk0xMjVjU0hsQWdNQkFBRUNnZ0VBQnpaRE50bmVTdWxGdk9HZlFYaHRFWGFKdWZoSzJBenRVVVpEcUNlRUxvekQKWlV6dHdxbkNRNlJLczUyandWNTN4cU9kUU94bTNMbjNvSHdNa2NZcEliWW82MjJ2dUczYnkwaVEzaFlsVHVMVgpqQmZCcS9UUXFlL2NMdngvSkczQWhFNmJxdFRjZFlXeGFmTmY2eUtpR1dzZk11WVVXTWs4MGVJVUxuRmZaZ1pOCklYNTlSOHlqdE9CVm9Sa3hjYTVoMW1ZTDFsSlJNM3ZqVHNHTHFybmpOTjNBdWZ3ZGRpK1VDbGZVL2l0K1EvZkUKV216aFFoTlRpNVFkRWJLVStOTnYvNnYvb2JvandNb25HVVBCdEFTUE05cmxFemIralQ1WHdWQjgvLzRGY3VoSwoyVzNpcjhtNHVlQ1JHSVlrbGxlLzhuQmZ0eVhiVkNocVRyZFBlaGlPM1FLQmdRRGlrR3JTOTc3cjg3Y1JPOCtQClpoeXltNXo4NVIzTHVVbFNTazJiOTI1QlhvakpZL2RRZDVTdFVsSWE4OUZKZnNWc1JRcEhHaTFCYzBMaTY1YjIKazR0cE5xcVFoUmZ1UVh0UG9GYXRuQzlPRnJVTXJXbDVJN0ZFejZnNkNQMVBXMEg5d2hPemFKZUdpZVpNYjlYTQoybDdSSFZOcC9jTDlYbmhNMnN0Q1lua2Iwd0tCZ1FEUzF4K0crakEyUVNtRVFWNXA1RnRONGcyamsyZEFjMEhNClRIQ2tTazFDRjhkR0Z2UWtsWm5ZbUt0dXFYeXNtekJGcnZKdmt2eUhqbUNYYTducXlpajBEdDZtODViN3BGcVAKQWxtajdtbXI3Z1pUeG1ZMXBhRWFLMXY4SDNINGtRNVl3MWdrTWRybVJHcVAvaTBGaDVpaGtSZS9DOUtGTFVkSQpDcnJjTzhkUVp3S0JnSHA1MzRXVWNCMVZibzFlYStIMUxXWlFRUmxsTWlwRFM2TzBqeWZWSmtFb1BZSEJESnp2ClIrdzZLREJ4eFoyWmJsZ05LblV0YlhHSVFZd3lGelhNcFB5SGxNVHpiZkJhYmJLcDFyR2JVT2RCMXpXM09PRkgKcmppb21TUm1YNmxhaDk0SjRHU0lFZ0drNGw1SHhxZ3JGRDZ2UDd4NGRjUktJWFpLZ0w2dVJSSUpBb0dCQU1CVApaL2p5WStRNTBLdEtEZHUrYU9ORW4zaGxUN3hrNXRKN3NBek5rbWdGMU10RXlQUk9Xd1pQVGFJbWpRbk9qbHdpCldCZ2JGcXg0M2ZlQ1Z4ZXJ6V3ZEM0txaWJVbWpCTkNMTGtYeGh3ZEVteFQwVit2NzZGYzgwaTNNYVdSNnZZR08KditwVVovL0F6UXdJcWZ6dlVmV2ZxdStrMHlhVXhQOGNlcFB
IRyt0bEFvR0FmQUtVVWhqeFU0Ym5vVzVwVUhKegpwWWZXZXZ5TW54NWZyT2VsSmRmNzlvNGMvMHhVSjh1eFBFWDFkRmNrZW96dHNpaVFTNkN6MENRY09XVWxtSkRwCnVrdERvVzM3VmNSQU1BVjY3NlgxQVZlM0UwNm5aL2g2Tkd4Z28rT042Q3pwL0lkMkJPUm9IMFAxa2RjY1NLT3kKMUtFZlNnb1B0c1N1eEpBZXdUZmxDMXc9Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K
81 changes: 81 additions & 0 deletions tests/longevity/manifests/cafe.yaml
@@ -0,0 +1,81 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coffee
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coffee
  template:
    metadata:
      labels:
        app: coffee
    spec:
      containers:
      - name: coffee
        image: nginxdemos/nginx-hello:plain-text
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /
            port: 8080
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sleep", "15"]
---
apiVersion: v1
kind: Service
metadata:
  name: coffee
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: coffee
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tea
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tea
  template:
    metadata:
      labels:
        app: tea
    spec:
      containers:
      - name: tea
        image: nginxdemos/nginx-hello:plain-text
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /
            port: 8080
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sleep", "15"]
---
apiVersion: v1
kind: Service
metadata:
  name: tea
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: tea
92 changes: 92 additions & 0 deletions tests/longevity/manifests/cronjob.yaml
@@ -0,0 +1,92 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rollout-mgr
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rollout-mgr
  namespace: default
rules:
- apiGroups:
  - "apps"
  resources:
  - deployments
  verbs:
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rollout-mgr
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rollout-mgr
subjects:
- kind: ServiceAccount
  name: rollout-mgr
  namespace: default
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: coffee-rollout-mgr
  namespace: default
spec:
  schedule: "* */6 * * *" # every minute during hours 0, 6, 12 and 18
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: rollout-mgr
          containers:
          - name: coffee-rollout-mgr
            image: curlimages/curl:8.3.0
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            args:
            - |
              # Same effect as "kubectl rollout restart": patch the restartedAt annotation.
              TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
              RESTARTED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
              curl -X PATCH -s -k -v \
                -H "Authorization: Bearer $TOKEN" \
                -H "Content-type: application/merge-patch+json" \
                --data-raw "{\"spec\": {\"template\": {\"metadata\": {\"annotations\": {\"kubectl.kubernetes.io/restartedAt\": \"$RESTARTED_AT\"}}}}}" \
                "https://kubernetes/apis/apps/v1/namespaces/default/deployments/coffee?fieldManager=kubectl-rollout" 2>&1
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: tea-rollout-mgr
  namespace: default
spec:
  schedule: "* 3,9,15,21 * * *" # every minute during hours 3, 9, 15 and 21 (3 hours apart from coffee)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: rollout-mgr
          containers:
          - name: tea-rollout-mgr
            image: curlimages/curl:8.3.0
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            args:
            - |
              # Same effect as "kubectl rollout restart": patch the restartedAt annotation.
              TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
              RESTARTED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
              curl -X PATCH -s -k -v \
                -H "Authorization: Bearer $TOKEN" \
                -H "Content-type: application/merge-patch+json" \
                --data-raw "{\"spec\": {\"template\": {\"metadata\": {\"annotations\": {\"kubectl.kubernetes.io/restartedAt\": \"$RESTARTED_AT\"}}}}}" \
                "https://kubernetes/apis/apps/v1/namespaces/default/deployments/tea?fieldManager=kubectl-rollout" 2>&1
          restartPolicy: OnFailure