Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilization and move toward 1.0 #197

Merged
merged 40 commits into from
Oct 5, 2017
Merged

Conversation

joshrosso
Copy link

@joshrosso joshrosso commented Sep 7, 2017

This PR represents our move to 1.0 of the alb-ingress-controller. Specifically addressing clean up and stabilization.

  • Docs updated to reflect HBR
  • Walkthrough example of new alb-ingress-controller
  • Address all panic-related issues
  • Exclude master nodes from ALB targets
  • Include AWS background sync (Add configurable upstream ALB background sync #185)
  • Include host based routing

Issue burn-down list

matt-deboer and others added 4 commits August 21, 2017 11:10
- Introduced reconciled bool that flags whether the ALBIngress
resource was synced successfully. Where successful is qualified by all
AWS resource being provisioned or modified as expected.

- The primary use of this bool is to ensure that UpdateIngressStatus
only operates on ingresses that experienced successful reconciliation.
Thus when projects like kubernetes-incubator/external-dns operate on
the ingress resources, they aren't attempting to setup DNS for a broken
ALBIngress.

- Resolves #191
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.07%) to 30.927% when pulling fd79f6a on continued_stabilization into 98ee1d7 on master.

- Excludes all master nodes from node list returned by the lister. This
prevents masters from being added as part of an ALB's targetgroup.

- The mechanism to determine whether a node is a master is based on the
presence of the label "node-role.kubernetes.io/master". The use of this
label is based on the discussion at
kubernetes/kubernetes#41835.

- Resolves #92
@antoniobeyah
Copy link

Does this work include getting the chart on quay.io updated?

@joshrosso joshrosso mentioned this pull request Sep 13, 2017
@joshrosso
Copy link
Author

@antoniobeyah it does now, thanks.

@joshrosso
Copy link
Author

Checking off ignore master nodes, as only workers are now being picked up.

1 worker 2 master cluster:
image

targetgroups on ALB only contain worker:
image

Again, masters are qualified by the node-role.kubernetes.io/master label.

$ kubectl get nodes -L=node-role.kubernetes.io/master
NAME                                        STATUS    AGE       VERSION           MASTER
ip-10-0-29-222.us-east-2.compute.internal   Ready     1h        v1.7.2+coreos.0   
ip-10-0-66-11.us-east-2.compute.internal    Ready     1h        v1.7.2+coreos.0   <none>
ip-10-0-7-65.us-east-2.compute.internal     Ready     1h        v1.7.2+coreos.0   

- When a new ingress assembly fails, it could be an existing ingress
resource that was previously reconciled. We use the Reconciled var to
signify whether the ingress resource has been reconciled correctly.

- This commit ensures that if the assembly fails we mark reconcile as
false to signify the reconcile will fail (although in reality it's
never attempted).

- This will also ensure an ingress status update is not attempted.

- Relates to #191
@joshrosso
Copy link
Author

Status updates no longer occur when reconcile fails, but will continue once it succeeds again.

Example in failure:

E0913 11:04:09.946830   22408 albingress.go:158] [ALB-INGRESS] [echoserver/echoserver] [ERROR]: Error instantiating target groups: Unable to find the echoserver/echoserver2 service
I0913 11:04:09.946901   22408 controller.go:432] ingress backend successfully reloaded...
W0913 11:04:13.278950   22408 controller.go:869] error obtaining service endpoints: service echoserver/echoserver2 does not exist
I0913 11:04:13.279646   22408 controller.go:423] backend reload required

Once issue is resolved in ingress resource:

I0913 11:05:34.215193   22408 rule.go:127] [ALB-INGRESS] [echoserver/echoserver] [INFO]: Start Rule modification.
I0913 11:05:34.318401   22408 rule.go:134] [ALB-INGRESS] [echoserver/echoserver] [INFO]: Completed Rule modification. Rule Priority: "1" | Condition: [{    Field: "host-header",    Values: ["echo.josh-test-dns.com"]  },{    Field: "path-pattern",    Values: ["/"]  }]
I0913 11:05:34.318546   22408 controller.go:432] ingress backend successfully reloaded...
I0913 11:05:51.757911   22408 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"echoserver", Name:"echoserver", UID:"bee86050-989a-11e7-86eb-06dbb374bd68", APIVersion:"extensions", ResourceVersion:"22530", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress echoserver/echoserver
I0913 11:05:54.301478   22408 status.go:347] updating Ingress echoserver/echoserver status to [{ test-echoserver-echoserver-2ad7-1990379304.us-east-2.elb.amazonaws.com}]
I0913 11:05:54.458244   22408 event.go:218] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"echoserver", Name:"echoserver", UID:"bee86050-989a-11e7-86eb-06dbb374bd68", APIVersion:"extensions", ResourceVersion:"22545", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress echoserver/echoserver

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 30.862% when pulling 98ea294 on continued_stabilization into 98ee1d7 on master.

- DEBUG is a higher int than INFO thus the evaluation of whether DEBUG
should print should be based on whether LOG_LEVEL is greater than INFO.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 30.862% when pulling fcc4c06 on continued_stabilization into 98ee1d7 on master.

- Forcing an OnUpdate to occur ensures things like target groups and
update and generally will be another check that desired and current are
as expected in each ingress.

- Resolves #204
@joshrosso
Copy link
Author

Targetgroups (and other AWS checks) will now update on at least a 3 minute interval. We do this by ensure that if the OnUpdate callback hasn't fired in 180 seconds, it is called from the controller in a separate go routine. Logs are as follows.

[ALB-INGRESS] [controller] [DEBUG]: Forcing ingress update as update hasn't been seen in 3 minutes.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 30.862% when pulling 53e2f8b on continued_stabilization into 98ee1d7 on master.

- Due to issue acquiring lock with multiple controller, we need to
update the ingress library. With this upgrade comes a new interface
method GetEndpoints(). Since we do not utilize a default endpoint, it
just returns an empty Endpoint struct.

- Resolves #203
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.1%) to 30.862% when pulling b30d8d2 on continued_stabilization into 98ee1d7 on master.

@antoniobeyah
Copy link

@joshrosso Excellent work here. Two questions:

  • Are PRs still currently being accepted? Some of the items listed below I think I can tackle but I want to make sure it is worth the effort.
  • Do you have an idea of a when you would expect 1.0 to be ready?

Things I would like to see as an MVP for 1.0 as these are essentially showstoppers from it being used:

Also, I ran into a bug where the controller did not detect changes to annotations and it required a controller restart in order to get it to correct. I haven't tested against these latest commits but will give that a shot and file an issue if it still remains.

@joshrosso
Copy link
Author

Are PRs still currently being accepted? Some of the items listed below I think I can tackle but I want to make sure it is worth the effort.

Definitely. I've pinged you in k8s slack. Let's discuss more granular when you have a moment.

Do you have an idea of a when you would expect 1.0 to be ready?

Before 09/26 is ideal for me as I'll be on vacation after that :)

All issues you've mentioned are reasonable, but I'd ask that you take on #90 and possible #198 if possible.

@joshrosso
Copy link
Author

Automated subnet detect for qualified subnets (based on tags) is now implemented in 4f00ac6.

joshrosso and others added 5 commits September 21, 2017 22:07
- When we assemble for AWS, we should mark the reconciled flag as true
representing that the ingress resource was previously reconciled. If it
fails in a future reconcile, the value will change to false.

- Resolves: #220
- Previously, resources would be deleted from AWS but there was no
guarantee the ALBIngress would be removed from the list of ALBIngresses.
Eventually it would be removed when an AWS sync occured.

- This commit checks after reconcile for any LoadBalancers that have the
deleted flag set to true. When this is the case it's removed from the
ALBIngress list.
@coveralls
Copy link

Coverage Status

Coverage decreased (-9.9%) to 21.085% when pulling 2f7fd30 on continued_stabilization into 98ee1d7 on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-9.9%) to 21.085% when pulling 606a4ba on continued_stabilization into 98ee1d7 on master.

@joshrosso
Copy link
Author

I want to get the rest of the issue burn down list covered then merge into master cutting an 1.0-alpha.3 build.

Bugfix for #225: iterate over all instances in all reservations.
- Move the security group retry logic into one location.

- Do proper error checking to ensure an SG modification should be
retried. Currently this is the case when a DependencyViolation is
returned.
@coveralls
Copy link

Coverage Status

Coverage decreased (-9.9%) to 21.065% when pulling db01a79 on continued_stabilization into 98ee1d7 on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-10.0%) to 20.996% when pulling db01a79 on continued_stabilization into 98ee1d7 on master.

- Reference to the extensions.ingress object is passed when creating a
new ALBIngress object. This ensures it's attached and, as a side-effect,
ensures events are not attached to the appropriate ingress resource.

- Resolves #230
and #216
@coveralls
Copy link

Coverage Status

Coverage decreased (-10.006%) to 20.987% when pulling 0047294 on continued_stabilization into 98ee1d7 on master.

- Previously we looked for the presence of a certificate ARN to
determine whether the protocol should be set to HTTP or HTTPS. Now we
look at the protocol set in the scheme annotation to determine which it
should be.

- Resolves #194
@joshrosso
Copy link
Author

Bringing this PR into master now assuming tests pass. As noted at the top of the README, we'll cut an alpha slice and would all issues should note the version being used.

@joshrosso joshrosso changed the title WIP: Stabilization and move toward 1.0 Stabilization and move toward 1.0 Oct 4, 2017
@coveralls
Copy link

Coverage Status

Coverage decreased (-10.0%) to 21.04% when pulling 68c3335 on continued_stabilization into 98ee1d7 on master.

- Now look for the presence of the security-group annotation key to
see whether it's present. Previously we were getting false negatives and
the controller always believed it needed to manage the SGs.

- Resolves: #232
@coveralls
Copy link

Coverage Status

Coverage decreased (-10.0%) to 21.04% when pulling 287a772 on continued_stabilization into 98ee1d7 on master.

@joshrosso joshrosso merged commit 3235087 into master Oct 5, 2017
bigkraig pushed a commit that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants