Add pre and post rollout hooks #147

Merged
merged 6 commits into from
Apr 14, 2019
2 changes: 1 addition & 1 deletion .circleci/config.yml
Expand Up @@ -17,6 +17,6 @@ workflows:
filters:
branches:
ignore:
- gh-pages
- /gh-pages.*/
- /docs-.*/
- /release-.*/
11 changes: 5 additions & 6 deletions .travis.yml
@@ -1,6 +1,11 @@
sudo: required
language: go

branches:
except:
- /gh-pages.*/
- /docs-.*/

go:
- 1.12.x

Expand All @@ -12,13 +17,7 @@ addons:
packages:
- docker-ce

#before_script:
# - go get -u sigs.k8s.io/kind
# - curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
# - curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && chmod +x kubectl && sudo mv kubectl /usr/local/bin/

script:
- set -e
- make test-fmt
- make test-codegen
- go test -race -coverprofile=coverage.txt -covermode=atomic $(go list ./pkg/...)
Expand Down
35 changes: 32 additions & 3 deletions artifacts/flagger/crd.yaml
Expand Up @@ -2,6 +2,8 @@ apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: canaries.flagger.app
annotations:
helm.sh/resource-policy: keep
spec:
group: flagger.app
version: v1alpha3
Expand Down Expand Up @@ -39,9 +41,9 @@ spec:
properties:
spec:
required:
- targetRef
- service
- canaryAnalysis
- targetRef
- service
- canaryAnalysis
properties:
progressDeadlineSeconds:
type: number
Expand Down Expand Up @@ -119,9 +121,36 @@ spec:
properties:
name:
type: string
type:
type: string
enum:
- ""
- pre-rollout
- rollout
- post-rollout
url:
type: string
format: url
timeout:
type: string
pattern: "^[0-9]+(m|s)"
status:
properties:
phase:
type: string
enum:
- ""
- Initialized
- Progressing
- Succeeded
- Failed
canaryWeight:
type: number
failedChecks:
type: number
iterations:
type: number
lastAppliedSpec:
type: string
lastTransitionTime:
type: string
27 changes: 27 additions & 0 deletions charts/flagger/templates/crd.yaml
Expand Up @@ -122,10 +122,37 @@ spec:
properties:
name:
type: string
type:
type: string
enum:
- ""
- pre-rollout
- rollout
- post-rollout
url:
type: string
format: url
timeout:
type: string
pattern: "^[0-9]+(m|s)"
status:
properties:
phase:
type: string
enum:
- ""
- Initialized
- Progressing
- Succeeded
- Failed
canaryWeight:
type: number
failedChecks:
type: number
iterations:
type: number
lastAppliedSpec:
type: string
lastTransitionTime:
type: string
{{- end }}
54 changes: 39 additions & 15 deletions docs/gitbook/how-it-works.md
Expand Up @@ -2,7 +2,7 @@

[Flagger](https://github.com/weaveworks/flagger) takes a Kubernetes deployment and optionally
a horizontal pod autoscaler \(HPA\) and creates a series of objects
\(Kubernetes deployments, ClusterIP services and Istio virtual services\) to drive the canary analysis and promotion.
\(Kubernetes deployments, ClusterIP services and Istio or App Mesh virtual services\) to drive the canary analysis and promotion.

![Flagger Canary Process](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-canary-hpa.png)

Expand Down Expand Up @@ -268,16 +268,22 @@ Gated canary promotion stages:
* check primary and canary deployments status
* halt advancement if a rolling update is underway
* halt advancement if pods are unhealthy
* call pre-rollout webhooks and check results
* halt advancement if any hook returned a non HTTP 2xx result
* increment the failed checks counter
* increase canary traffic weight percentage from 0% to 5% (step weight)
* call webhooks and check results
* call rollout webhooks and check results
* check canary HTTP request success rate and latency
* halt advancement if any metric is under the specified threshold
* increment the failed checks counter
* check if the number of failed checks reached the threshold
* route all traffic to primary
* scale to zero the canary deployment and mark it as failed
* call post-rollout webhooks
* post the analysis result to Slack
* wait for the canary deployment to be updated and start over
* increase canary traffic weight by 5% (step weight) till it reaches 50% (max weight)
* halt advancement if any webhook call fails
* halt advancement while canary request success rate is under the threshold
* halt advancement while canary request duration P99 is over the threshold
* halt advancement if the primary or canary deployment becomes unhealthy
Expand All @@ -290,6 +296,8 @@ Gated canary promotion stages:
* route all traffic to primary
* scale to zero the canary deployment
* mark rollout as finished
* call post-rollout webhooks
* post the analysis result to Slack
* wait for the canary deployment to be updated and start over

### Canary Analysis
Expand Down Expand Up @@ -524,39 +532,55 @@ rate reaches the 5% threshold, then the canary fails.
When specifying a query, Flagger will run the promql query and convert the result to float64.
Then it compares the query result value with the metric threshold value.


### Webhooks

The canary analysis can be extended with webhooks.
Flagger will call each webhook URL and determine from the response status code (HTTP 2xx) if the canary is failing or not.
The canary analysis can be extended with webhooks. Flagger will call each webhook URL and
determine from the response status code (HTTP 2xx) if the canary is failing or not.

There are three types of hooks:
* Pre-rollout hooks are executed before routing traffic to canary.
The canary advancement is paused if a pre-rollout hook fails and if the number of failures reaches the
threshold the canary will be rolled back.
* Rollout hooks are executed during the analysis on each iteration before the metric checks.
If a rollout hook call fails the canary advancement is paused and eventually rolled back.
* Post-rollout hooks are executed after the canary has been promoted or rolled back.
If a post-rollout hook fails the error is logged.

Spec:

```yaml
canaryAnalysis:
webhooks:
- name: integration-test
url: http://int-runner.test:8080/
timeout: 30s
metadata:
test: "all"
token: "16688eb5e9f289f1991c"
- name: db-test
- name: "smoke test"
type: pre-rollout
url: http://migration-check.db/query
timeout: 30s
metadata:
key1: "val1"
key2: "val2"
- name: "load test"
type: rollout
url: http://flagger-loadtester.test/
timeout: 15s
metadata:
cmd: "hey -z 1m -q 5 -c 2 http://podinfo-canary.test:9898/"
- name: "notify"
type: post-rollout
url: http://telegram.bot:8080/
timeout: 5s
metadata:
some: "message"
```

> **Note** that the sum of all webhooks timeouts should be lower than the control loop interval.
> **Note** that the sum of all rollout webhook timeouts should be lower than the analysis interval.
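
The constraint in the note can be checked ahead of time. The sketch below sums a list of timeout strings and compares the total with an interval; `rolloutTimeoutsFit` is a hypothetical helper, and the `1m` interval is an assumed value used only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// rolloutTimeoutsFit (hypothetical helper) reports whether the sum of the
// given timeouts is strictly lower than the interval.
func rolloutTimeoutsFit(timeouts []string, interval string) (bool, error) {
	limit, err := time.ParseDuration(interval)
	if err != nil {
		return false, fmt.Errorf("invalid interval %q: %w", interval, err)
	}
	var total time.Duration
	for _, t := range timeouts {
		d, err := time.ParseDuration(t)
		if err != nil {
			return false, fmt.Errorf("invalid timeout %q: %w", t, err)
		}
		total += d
	}
	return total < limit, nil
}

func main() {
	// Two rollout hooks with 15s and 30s timeouts, assumed 1m interval.
	ok, err := rolloutTimeoutsFit([]string{"15s", "30s"}, "1m")
	fmt.Println(ok, err)
}
```

Only hooks of type `rollout` would be passed in, since the note applies to the hooks that run on each analysis iteration.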

Webhook payload (HTTP POST):

```json
{
"name": "podinfo",
"namespace": "test",
"namespace": "test",
"phase": "Progressing",
"metadata": {
"test": "all",
"token": "16688eb5e9f289f1991c"
Expand Down Expand Up @@ -676,4 +700,4 @@ webhooks:
```
When the canary analysis starts, the load tester will initiate a [clone_and_start request](https://github.com/naver/ngrinder/wiki/REST-API-PerfTest)
to the nGrinder server and start a new performance test. The load tester will periodically poll the nGrinder server
for the status of the test, and prevent duplicate requests from being sent in subsequent analysis loops.
for the status of the test, and prevent duplicate requests from being sent in subsequent analysis loops.
10 changes: 6 additions & 4 deletions docs/gitbook/install/flagger-install-on-google-cloud.md
Expand Up @@ -186,7 +186,7 @@ Install cert-manager's CRDs:
```bash
CERT_REPO=https://raw.githubusercontent.com/jetstack/cert-manager

kubectl apply -f ${CERT_REPO}/release-0.6/deploy/manifests/00-crds.yaml
kubectl apply -f ${CERT_REPO}/release-0.7/deploy/manifests/00-crds.yaml
```

Create the cert-manager namespace and disable resource validation:
Expand All @@ -200,10 +200,12 @@ kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
Install cert-manager with Helm:

```bash
helm repo update && helm upgrade -i cert-manager \
helm repo add jetstack https://charts.jetstack.io && \
helm repo update && \
helm upgrade -i cert-manager \
--namespace cert-manager \
--version v0.6.0 \
stable/cert-manager
--version v0.7.0 \
jetstack/cert-manager
```

### Istio Gateway TLS setup
Expand Down
20 changes: 17 additions & 3 deletions pkg/apis/flagger/v1alpha3/types.go
Expand Up @@ -148,11 +148,24 @@ type CanaryMetric struct {
Query string `json:"query,omitempty"`
}

// HookType can be pre, post or during rollout
type HookType string

const (
// RolloutHook executes the webhook during the canary analysis
RolloutHook HookType = "rollout"
// PreRolloutHook executes the webhook before routing traffic to canary
PreRolloutHook HookType = "pre-rollout"
// PostRolloutHook executes the webhook after the canary analysis
PostRolloutHook HookType = "post-rollout"
)

// CanaryWebhook holds the reference to external checks used for canary analysis
type CanaryWebhook struct {
Name string `json:"name"`
URL string `json:"url"`
Timeout string `json:"timeout"`
Type HookType `json:"type"`
Name string `json:"name"`
URL string `json:"url"`
Timeout string `json:"timeout"`
// +optional
Metadata *map[string]string `json:"metadata,omitempty"`
}
Expand All @@ -161,6 +174,7 @@ type CanaryWebhook struct {
type CanaryWebhookPayload struct {
Name string `json:"name"`
Namespace string `json:"namespace"`
Phase CanaryPhase `json:"phase"`
Metadata map[string]string `json:"metadata,omitempty"`
}

Expand Down