Merge pull request #70 from stefanprodan/append-headers
Allow headers to be appended to HTTP requests
stefanprodan authored Mar 4, 2019
2 parents 25fbe7e + 3411a6a commit 535a92e
Showing 10 changed files with 271 additions and 38 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,12 @@

All notable changes to this project are documented in this file.

## Unreleased

#### Features

- Allow headers to be appended to HTTP requests [#70](https://github.com/stefanprodan/flagger/pull/70)

## 0.7.0 (2019-02-28)

Adds support for custom metric checks, HTTP timeouts and HTTP retries
10 changes: 5 additions & 5 deletions README.md
@@ -106,11 +106,11 @@ spec:
# HTTP rewrite (optional)
rewrite:
uri: /
# timeout for HTTP requests (optional)
timeout: 5s
# retry policy when a HTTP request fails (optional)
retries:
attempts: 3
# Envoy timeout and retry policy (optional)
appendHeaders:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
# promote the canary without analysing it (default false)
skipAnalysis: false
# define the canary analysis timing and KPIs
12 changes: 8 additions & 4 deletions artifacts/canaries/canary.yaml
@@ -26,15 +26,19 @@ spec:
# Istio virtual service host names (optional)
hosts:
- app.istio.weavedx.com
# Istio virtual service HTTP match conditions (optional)
# HTTP match conditions (optional)
match:
- uri:
prefix: /
# Istio virtual service HTTP rewrite (optional)
# HTTP rewrite (optional)
rewrite:
uri: /
# for emergency cases when you want to ship changes
# in production without analysing the canary
# Envoy timeout and retry policy (optional)
appendHeaders:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
# promote the canary without analysing it (default false)
skipAnalysis: false
canaryAnalysis:
# schedule interval (default 60s)
1 change: 1 addition & 0 deletions docs/gitbook/SUMMARY.md
@@ -17,3 +17,4 @@
## Tutorials

* [Canaries with Helm charts and GitOps](tutorials/canary-helm-gitops.md)
* [Zero downtime deployments](tutorials/zero-downtime-deployments.md)
26 changes: 14 additions & 12 deletions docs/gitbook/how-it-works.md
@@ -46,12 +46,11 @@ spec:
# HTTP rewrite (optional)
rewrite:
uri: /
# timeout for HTTP requests (optional)
timeout: 5s
# retry policy when a HTTP request fails (optional)
retries:
attempts: 3
perTryTimeout: 3s
# Envoy timeout and retry policy (optional)
appendHeaders:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
# promote the canary without analysing it (default false)
skipAnalysis: false
# define the canary analysis timing and KPIs
@@ -138,8 +137,11 @@ metadata:
# HTTP rewrite (optional)
rewrite:
uri: /
# timeout for HTTP requests (optional)
timeout: 5s
# Envoy timeout and retry policy (optional)
appendHeaders:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
# retry policy when a HTTP request fails (optional)
retries:
attempts: 3
@@ -174,10 +176,10 @@ spec:
prefix: /
rewrite:
uri: /
timeout: 5s
retries:
attempts: 3
perTryTimeout: 3s
appendHeaders:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
route:
- destination:
host: frontend-primary
206 changes: 206 additions & 0 deletions docs/gitbook/tutorials/zero-downtime-deployments.md
@@ -0,0 +1,206 @@
# Zero downtime deployments

This is a list of things to consider when running a high-traffic production environment, if you want to
minimise the impact of rolling updates and downscaling.

### Deployment strategy

Limit the number of unavailable pods during a rolling update:

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  progressDeadlineSeconds: 120
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
```

The default progress deadline for a deployment is ten minutes.
You should consider adjusting this value to make the deployment process fail faster.
### Liveness health check

Your application should expose an HTTP endpoint that Kubernetes can call to determine if
your app has transitioned to a broken state from which it can't recover and needs to be restarted.

```yaml
livenessProbe:
  exec:
    command:
    - wget
    - --quiet
    - --tries=1
    - --timeout=4
    - --spider
    - http://localhost:8080/healthz
  timeoutSeconds: 5
  initialDelaySeconds: 5
```

If you've enabled mTLS, you'll have to use `exec` for liveness and readiness checks since
the kubelet is not part of the service mesh and doesn't have access to the TLS cert.

### Readiness health check

Your application should expose an HTTP endpoint that Kubernetes can call to determine if
your app is ready to receive traffic.

```yaml
readinessProbe:
  exec:
    command:
    - wget
    - --quiet
    - --tries=1
    - --timeout=4
    - --spider
    - http://localhost:8080/readyz
  timeoutSeconds: 5
  initialDelaySeconds: 5
  periodSeconds: 5
```

If your app depends on external services, you should check that those services are available before allowing Kubernetes
to route traffic to an app instance. Keep in mind that the Envoy sidecar can have a slower startup than your app.
This means that, on application start, you should retry any external connection for at least a couple of seconds.

### Graceful shutdown

Before a pod gets terminated, Kubernetes sends a `SIGTERM` signal to every container and waits for a period of
time (30s by default) for all containers to exit gracefully. If your app doesn't handle the `SIGTERM` signal or if it
doesn't exit within the grace period, Kubernetes will kill the container and any in-flight requests that your app is
processing will fail.

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        lifecycle:
          preStop:
            exec:
              command:
              - sleep
              - "10"
```

Your app container should have a `preStop` hook that delays the container shutdown.
This will allow the service mesh to drain the traffic and remove this pod from all other Envoy sidecars before your app
becomes unavailable.

### Delay Envoy shutdown

Even if your app reacts to `SIGTERM` and tries to complete the in-flight requests before shutdown, that
doesn't mean that the response will make it back to the caller. If the Envoy sidecar shuts down before your app, then
the caller will receive a 503 error.

To mitigate this issue you can add a `preStop` hook to the Istio proxy that waits for the main app to exit before Envoy exits.

```bash
#!/bin/bash
set -e
if ! pidof envoy &>/dev/null; then
  exit 0
fi
if ! pidof pilot-agent &>/dev/null; then
  exit 0
fi
while [ $(netstat -plunt | grep tcp | grep -v envoy | wc -l | xargs) -ne 0 ]; do
  sleep 1;
done
exit 0
```

You'll have to build your own Envoy Docker image with the above script and
modify the Istio injection webhook with the `preStop` directive.
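For orientation, the resulting proxy container in the injection template might look something like this (a hypothetical sketch: the image name and script path are assumptions, not Istio defaults):

```yaml
# hypothetical fragment of the istio-sidecar-injector template
- name: istio-proxy
  image: your-registry/proxyv2:custom      # rebuilt with the drain script baked in
  lifecycle:
    preStop:
      exec:
        command:
        - /usr/local/bin/wait-for-drain.sh # the script shown above
```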

Thanks to Stono for his excellent [tips](https://github.com/istio/istio/issues/12183) on minimising 503s.

### Resource requests and limits

Setting CPU and memory requests/limits for all workloads is a mandatory step if you're running a production system.
Without limits, your nodes could run out of memory or become unresponsive due to CPU exhaustion.
Without CPU and memory requests, the Kubernetes scheduler can't make informed decisions about which nodes to place pods on.

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: app
        resources:
          limits:
            cpu: 1000m
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 128Mi
```

Note that without resource requests the horizontal pod autoscaler can't determine when to scale your app.

### Autoscaling

A production environment should be able to handle traffic bursts without impacting the quality of service.
This can be achieved with Kubernetes autoscaling capabilities.
Autoscaling in Kubernetes has two dimensions: the Cluster Autoscaler that deals with node scaling operations and
the Horizontal Pod Autoscaler that automatically scales the number of pods in a deployment.

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageValue: 900m
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 768Mi
```

The above HPA ensures your app will be scaled up before the pods reach the CPU or memory limits.

### Ingress retries

To minimise the impact of downscaling operations you can make use of Envoy retry capabilities.

```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
spec:
  service:
    port: 9898
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    hosts:
    - app.example.com
    appendHeaders:
      x-envoy-upstream-rq-timeout-ms: "15000"
      x-envoy-max-retries: "10"
      x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
```

When the HPA scales down your app, your users could run into 503 errors.
The above configuration will make Envoy retry the HTTP requests that failed due to gateway errors.
15 changes: 8 additions & 7 deletions pkg/apis/flagger/v1alpha3/types.go
@@ -109,13 +109,14 @@ type CanaryStatus struct {
// CanaryService is used to create ClusterIP services
// and Istio Virtual Service
type CanaryService struct {
Port int32 `json:"port"`
Gateways []string `json:"gateways"`
Hosts []string `json:"hosts"`
Match []istiov1alpha3.HTTPMatchRequest `json:"match,omitempty"`
Rewrite *istiov1alpha3.HTTPRewrite `json:"rewrite,omitempty"`
Timeout string `json:"timeout,omitempty"`
Retries *istiov1alpha3.HTTPRetry `json:"retries,omitempty"`
Port int32 `json:"port"`
Gateways []string `json:"gateways"`
Hosts []string `json:"hosts"`
Match []istiov1alpha3.HTTPMatchRequest `json:"match,omitempty"`
Rewrite *istiov1alpha3.HTTPRewrite `json:"rewrite,omitempty"`
Timeout string `json:"timeout,omitempty"`
Retries *istiov1alpha3.HTTPRetry `json:"retries,omitempty"`
AppendHeaders map[string]string `json:"appendHeaders,omitempty"`
}

// CanaryAnalysis is used to describe how the analysis should be done
7 changes: 7 additions & 0 deletions pkg/apis/flagger/v1alpha3/zz_generated.deepcopy.go


22 changes: 12 additions & 10 deletions pkg/controller/router.go
@@ -203,11 +203,12 @@ func (c *CanaryRouter) syncVirtualService(cd *flaggerv1.Canary) error {
Gateways: gateways,
Http: []istiov1alpha3.HTTPRoute{
{
Match: cd.Spec.Service.Match,
Rewrite: cd.Spec.Service.Rewrite,
Timeout: cd.Spec.Service.Timeout,
Retries: cd.Spec.Service.Retries,
Route: route,
Match: cd.Spec.Service.Match,
Rewrite: cd.Spec.Service.Rewrite,
Timeout: cd.Spec.Service.Timeout,
Retries: cd.Spec.Service.Retries,
AppendHeaders: cd.Spec.Service.AppendHeaders,
Route: route,
},
},
}
@@ -319,11 +320,12 @@ func (c *CanaryRouter) SetRoutes(
vsCopy := vs.DeepCopy()
vsCopy.Spec.Http = []istiov1alpha3.HTTPRoute{
{
Match: cd.Spec.Service.Match,
Rewrite: cd.Spec.Service.Rewrite,
Timeout: cd.Spec.Service.Timeout,
Retries: cd.Spec.Service.Retries,
Route: []istiov1alpha3.DestinationWeight{primary, canary},
Match: cd.Spec.Service.Match,
Rewrite: cd.Spec.Service.Rewrite,
Timeout: cd.Spec.Service.Timeout,
Retries: cd.Spec.Service.Retries,
AppendHeaders: cd.Spec.Service.AppendHeaders,
Route: []istiov1alpha3.DestinationWeight{primary, canary},
},
}

4 changes: 4 additions & 0 deletions test/e2e-tests.sh
@@ -33,6 +33,10 @@ spec:
progressDeadlineSeconds: 60
service:
port: 9898
appendHeaders:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
canaryAnalysis:
interval: 15s
threshold: 15