Stabilization and move toward 1.0 #197

joshrosso · 2017-09-07T20:53:26Z

This PR represents our move to 1.0 of the alb-ingress-controller. Specifically addressing clean up and stabilization.

Docs updated to reflect HBR
Walkthrough example of new alb-ingress-controller
Address all panic-related issues
Exclude master nodes from ALB targets
Include AWS background sync (Add configurable upstream ALB background sync #185)
Include host based routing

Issue burn-down list

- Introduced reconciled bool that flags whether the ALBIngress resource was synced successfully. Where successful is qualified by all AWS resource being provisioned or modified as expected. - The primary use of this bool is to ensure that UpdateIngressStatus only operates on ingresses that experienced successful reconciliation. Thus when projects like kubernetes-incubator/external-dns operate on the ingress resources, they aren't attempting to setup DNS for a broken ALBIngress. - Resolves #191

coveralls · 2017-09-07T21:17:02Z

Coverage decreased (-0.07%) to 30.927% when pulling fd79f6a on continued_stabilization into 98ee1d7 on master.

- Excludes all master nodes from node list returned by the lister. This prevents masters from being added as part of an ALB's targetgroup. - The mechanism to determine whether a node is a master is based on the presence of the label "node-role.kubernetes.io/master". The use of this label is based on the discussion at kubernetes/kubernetes#41835. - Resolves #92

antoniobeyah · 2017-09-08T18:25:12Z

Does this work include getting the chart on quay.io updated?

joshrosso · 2017-09-13T15:15:02Z

@antoniobeyah it does now, thanks.

joshrosso · 2017-09-13T16:26:58Z

Checking off ignore master nodes, as only workers are now being picked up.

1 worker 2 master cluster:

targetgroups on ALB only contain worker:

Again, masters are qualified by the node-role.kubernetes.io/master label.

$ kubectl get nodes -L=node-role.kubernetes.io/master
NAME                                        STATUS    AGE       VERSION           MASTER
ip-10-0-29-222.us-east-2.compute.internal   Ready     1h        v1.7.2+coreos.0   
ip-10-0-66-11.us-east-2.compute.internal    Ready     1h        v1.7.2+coreos.0   <none>
ip-10-0-7-65.us-east-2.compute.internal     Ready     1h        v1.7.2+coreos.0

- When a new ingress assembly fails, it could be an existing ingress resource that was previously reconciled. We use the Reconciled var to signify whether the ingress resource has been reconciled correctly. - This commit ensures that if the assembly fails we mark reconcile as false to signify the reconcile will fail (although in reality it's never attempted). - This will also ensure an ingress status update is not attempted. - Relates to #191

joshrosso · 2017-09-13T17:04:46Z

Status updates no longer occur when reconcile fails, but will continue once it succeeds again.

Example in failure:

E0913 11:04:09.946830   22408 albingress.go:158] [ALB-INGRESS] [echoserver/echoserver] [ERROR]: Error instantiating target groups: Unable to find the echoserver/echoserver2 service
I0913 11:04:09.946901   22408 controller.go:432] ingress backend successfully reloaded...
W0913 11:04:13.278950   22408 controller.go:869] error obtaining service endpoints: service echoserver/echoserver2 does not exist
I0913 11:04:13.279646   22408 controller.go:423] backend reload required

Once issue is resolved in ingress resource:

I0913 11:05:34.215193   22408 rule.go:127] [ALB-INGRESS] [echoserver/echoserver] [INFO]: Start Rule modification.
I0913 11:05:34.318401   22408 rule.go:134] [ALB-INGRESS] [echoserver/echoserver] [INFO]: Completed Rule modification. Rule Priority: "1" | Condition: [{    Field: "host-header",    Values: ["echo.josh-test-dns.com"]  },{    Field: "path-pattern",    Values: ["/"]  }]
I0913 11:05:34.318546   22408 controller.go:432] ingress backend successfully reloaded...
I0913 11:05:51.757911   22408 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"echoserver", Name:"echoserver", UID:"bee86050-989a-11e7-86eb-06dbb374bd68", APIVersion:"extensions", ResourceVersion:"22530", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress echoserver/echoserver
I0913 11:05:54.301478   22408 status.go:347] updating Ingress echoserver/echoserver status to [{ test-echoserver-echoserver-2ad7-1990379304.us-east-2.elb.amazonaws.com}]
I0913 11:05:54.458244   22408 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"echoserver", Name:"echoserver", UID:"bee86050-989a-11e7-86eb-06dbb374bd68", APIVersion:"extensions", ResourceVersion:"22545", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress echoserver/echoserver

coveralls · 2017-09-13T17:18:40Z

Coverage decreased (-0.1%) to 30.862% when pulling 98ea294 on continued_stabilization into 98ee1d7 on master.

- DEBUG is a higher int than INFO thus the evaluation of whether DEBUG should print should be based on whether LOG_LEVEL is greater than INFO.

coveralls · 2017-09-13T23:01:03Z

Coverage decreased (-0.1%) to 30.862% when pulling fcc4c06 on continued_stabilization into 98ee1d7 on master.

- Forcing an OnUpdate to occur ensures things like target groups and update and generally will be another check that desired and current are as expected in each ingress. - Resolves #204

joshrosso · 2017-09-14T03:12:09Z

Targetgroups (and other AWS checks) will now update on at least a 3 minute interval. We do this by ensure that if the OnUpdate callback hasn't fired in 180 seconds, it is called from the controller in a separate go routine. Logs are as follows.

[ALB-INGRESS] [controller] [DEBUG]: Forcing ingress update as update hasn't been seen in 3 minutes.

coveralls · 2017-09-14T03:22:26Z

Coverage decreased (-0.1%) to 30.862% when pulling 53e2f8b on continued_stabilization into 98ee1d7 on master.

- Due to issue acquiring lock with multiple controller, we need to update the ingress library. With this upgrade comes a new interface method GetEndpoints(). Since we do not utilize a default endpoint, it just returns an empty Endpoint struct. - Resolves #203

coveralls · 2017-09-14T03:51:50Z

Coverage decreased (-0.1%) to 30.862% when pulling b30d8d2 on continued_stabilization into 98ee1d7 on master.

antoniobeyah · 2017-09-14T15:09:21Z

@joshrosso Excellent work here. Two questions:

Are PRs still currently being accepted? Some of the items listed below I think I can tackle but I want to make sure it is worth the effort.
Do you have an idea of a when you would expect 1.0 to be ready?

Things I would like to see as an MVP for 1.0 as these are essentially showstoppers from it being used:

Ability to configure access logging [Enable access logging for ALBs created with alb-ingress-controller #90] - This is a basic security requirement
Allow dash in cluster name or explain why its not allowed given cluster name is determined before the controller has been installed [cluster name cannot contain dash (-) #198]
Helm chart on quay.io out of date [quay.io/coreos/alb-ingress-controller-helm seems to be out of date #193] - we performed a workaround and published the chart to our private repository

Also, I ran into a bug where the controller did not detect changes to annotations and it required a controller restart in order to get it to correct. I haven't tested against these latest commits but will give that a shot and file an issue if it still remains.

joshrosso · 2017-09-14T15:47:54Z

Are PRs still currently being accepted? Some of the items listed below I think I can tackle but I want to make sure it is worth the effort.

Definitely. I've pinged you in k8s slack. Let's discuss more granular when you have a moment.

Do you have an idea of a when you would expect 1.0 to be ready?

Before 09/26 is ideal for me as I'll be on vacation after that :)

All issues you've mentioned are reasonable, but I'd ask that you take on #90 and possible #198 if possible.

joshrosso · 2017-09-15T14:55:22Z

Automated subnet detect for qualified subnets (based on tags) is now implemented in 4f00ac6.

- When we assemble for AWS, we should mark the reconciled flag as true representing that the ingress resource was previously reconciled. If it fails in a future reconcile, the value will change to false. - Resolves: #220

- Previously, resources would be deleted from AWS but there was no guarantee the ALBIngress would be removed from the list of ALBIngresses. Eventually it would be removed when an AWS sync occured. - This commit checks after reconcile for any LoadBalancers that have the deleted flag set to true. When this is the case it's removed from the ALBIngress list.

coveralls · 2017-09-29T17:10:56Z

Coverage decreased (-9.9%) to 21.085% when pulling 2f7fd30 on continued_stabilization into 98ee1d7 on master.

…ception

coveralls · 2017-09-29T17:55:31Z

Coverage decreased (-9.9%) to 21.085% when pulling 606a4ba on continued_stabilization into 98ee1d7 on master.

joshrosso · 2017-10-04T15:43:46Z

I want to get the rest of the issue burn down list covered then merge into master cutting an 1.0-alpha.3 build.

Bugfix for #225: iterate over all instances in all reservations.

- Move the security group retry logic into one location. - Do proper error checking to ensure an SG modification should be retried. Currently this is the case when a DependencyViolation is returned.

coveralls · 2017-10-04T20:13:51Z

Coverage decreased (-9.9%) to 21.065% when pulling db01a79 on continued_stabilization into 98ee1d7 on master.

coveralls · 2017-10-04T20:29:37Z

Coverage decreased (-10.0%) to 20.996% when pulling db01a79 on continued_stabilization into 98ee1d7 on master.

- Reference to the extensions.ingress object is passed when creating a new ALBIngress object. This ensures it's attached and, as a side-effect, ensures events are not attached to the appropriate ingress resource. - Resolves #230 and #216

coveralls · 2017-10-04T21:30:05Z

Coverage decreased (-10.006%) to 20.987% when pulling 0047294 on continued_stabilization into 98ee1d7 on master.

- Previously we looked for the presence of a certificate ARN to determine whether the protocol should be set to HTTP or HTTPS. Now we look at the protocol set in the scheme annotation to determine which it should be. - Resolves #194

joshrosso · 2017-10-04T22:43:34Z

Bringing this PR into master now assuming tests pass. As noted at the top of the README, we'll cut an alpha slice and would all issues should note the version being used.

coveralls · 2017-10-05T00:00:17Z

Coverage decreased (-10.0%) to 21.04% when pulling 68c3335 on continued_stabilization into 98ee1d7 on master.

- Now look for the presence of the security-group annotation key to see whether it's present. Previously we were getting false negatives and the controller always believed it needed to manage the SGs. - Resolves: #232

coveralls · 2017-10-05T00:55:53Z

Coverage decreased (-10.0%) to 21.04% when pulling 287a772 on continued_stabilization into 98ee1d7 on master.

Stabilization and move toward 1.0

matt-deboer and others added 4 commits August 21, 2017 11:10

Add configurable upstream ALB background sync

a9c6d2c

apply logger changes

2a04401

Remove unneeded blank identifier in ALBController impl

3129676

joshrosso mentioned this pull request Sep 13, 2017

Route53 entries #202

Closed

Fix LOG_LEVEL check for DEBUG

fcc4c06

- DEBUG is a higher int than INFO thus the evaluation of whether DEBUG should print should be based on whether LOG_LEVEL is greater than INFO.

Force OnUpdate when it hasn't occurred in 3 mins

53e2f8b

- Forcing an OnUpdate to occur ensures things like target groups and update and generally will be another check that desired and current are as expected in each ingress. - Resolves #204

joshrosso mentioned this pull request Sep 14, 2017

Controller invalid memory address or nil pointer dereference #206

Closed

joshrosso mentioned this pull request Sep 21, 2017

AWS Background Sync causes ingress statuses to flush #220

Closed

joshrosso and others added 5 commits September 21, 2017 22:07

Mark ingress as Reconcile=true after assembly from AWS

92ca0a7

- When we assemble for AWS, we should mark the reconciled flag as true representing that the ingress resource was previously reconciled. If it fails in a future reconcile, the value will change to false. - Resolves: #220

Add sleep to delete SG requests

fb3cfab

Fixing loadbalancer tests

ea47432

Ignore goveralls update errors

2f7fd30

Sometimes the loadbalancer is undefined, get rid of a null pointer ex…

606a4ba

…ception

Bugfix: iterate over all instances in all reservations.

57d3fce

joshrosso mentioned this pull request Oct 4, 2017

runtime error possibly caused by 'Unable to find a port defined in the service' #227

Closed

joshrosso mentioned this pull request Oct 4, 2017

Bug: controller fails to start if CLUSTER_NAME > 11 chars #223

Closed

joshrosso added 2 commits October 4, 2017 13:58

Merge pull request #229 from olitheolix/issue-225

39ed37f

Bugfix for #225: iterate over all instances in all reservations.

Clean up SG modification retry logic

db01a79

- Move the security group retry logic into one location. - Do proper error checking to ensure an SG modification should be retried. Currently this is the case when a DependencyViolation is returned.

joshrosso mentioned this pull request Oct 4, 2017

Removing paths fails to reconcile, due to Targetgroup in use #106

Closed

joshrosso added 2 commits October 4, 2017 16:31

Respect protocol set in scheme annotation

facf541

- Previously we looked for the presence of a certificate ARN to determine whether the protocol should be set to HTTP or HTTPS. Now we look at the protocol set in the scheme annotation to determine which it should be. - Resolves #194

README, docs, and Makefile to use 1.0-alpha.3

d4e4e95

Fix port-based tests

68c3335

joshrosso changed the title ~~WIP: Stabilization and move toward 1.0~~ Stabilization and move toward 1.0 Oct 4, 2017

Respect security-groups annotation

287a772

- Now look for the presence of the security-group annotation key to see whether it's present. Previously we were getting false negatives and the controller always believed it needed to manage the SGs. - Resolves: #232

joshrosso merged commit 3235087 into master Oct 5, 2017

bigkraig pushed a commit that referenced this pull request Oct 1, 2018

Merge pull request #197 from coreos/continued_stabilization

9840149

Stabilization and move toward 1.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stabilization and move toward 1.0 #197

Stabilization and move toward 1.0 #197

joshrosso commented Sep 7, 2017 •

edited

Loading

coveralls commented Sep 7, 2017

antoniobeyah commented Sep 8, 2017

joshrosso commented Sep 13, 2017

joshrosso commented Sep 13, 2017

joshrosso commented Sep 13, 2017

coveralls commented Sep 13, 2017

coveralls commented Sep 13, 2017

joshrosso commented Sep 14, 2017

coveralls commented Sep 14, 2017

coveralls commented Sep 14, 2017

antoniobeyah commented Sep 14, 2017

joshrosso commented Sep 14, 2017

joshrosso commented Sep 15, 2017

coveralls commented Sep 29, 2017

coveralls commented Sep 29, 2017

joshrosso commented Oct 4, 2017

coveralls commented Oct 4, 2017

coveralls commented Oct 4, 2017

coveralls commented Oct 4, 2017

joshrosso commented Oct 4, 2017

coveralls commented Oct 5, 2017

coveralls commented Oct 5, 2017

Stabilization and move toward 1.0 #197

Stabilization and move toward 1.0 #197

Conversation

joshrosso commented Sep 7, 2017 • edited Loading

Issue burn-down list

coveralls commented Sep 7, 2017

antoniobeyah commented Sep 8, 2017

joshrosso commented Sep 13, 2017

joshrosso commented Sep 13, 2017

joshrosso commented Sep 13, 2017

coveralls commented Sep 13, 2017

coveralls commented Sep 13, 2017

joshrosso commented Sep 14, 2017

coveralls commented Sep 14, 2017

coveralls commented Sep 14, 2017

antoniobeyah commented Sep 14, 2017

joshrosso commented Sep 14, 2017

joshrosso commented Sep 15, 2017

coveralls commented Sep 29, 2017

coveralls commented Sep 29, 2017

joshrosso commented Oct 4, 2017

coveralls commented Oct 4, 2017

coveralls commented Oct 4, 2017

coveralls commented Oct 4, 2017

joshrosso commented Oct 4, 2017

coveralls commented Oct 5, 2017

coveralls commented Oct 5, 2017

joshrosso commented Sep 7, 2017 •

edited

Loading