Skip to content
This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

Ensure the ServiceAccount IRSA annotation is in place on each reconciliation loop #4

Closed
wants to merge 6 commits into from

Conversation

diranged
Copy link

@diranged diranged commented Jul 2, 2021

What did I want?

The original code only created the ServiceAccount resource and set the eks.amazonaws.com/role-arn annotation once - at creation time. If that failed for any reason, or became out of date, the controller did not fix it for you. The controllers job should be to constantly ensure that the desired state of the world is correct, including the ServiceAccount link.

What did I do?

A few things.. primarily I added in the "create/update service account" into the handler for state=ready. While I was here, I wanted to introduce some logic to get the current state of the ServiceAccount so that we can make a decision as to whether or not we need to apply a change or not. Finally, in order to do that cleanly, I moved some constants around because there were package import issues.

Why the new constants package?

The original constants were stored in internal/config .. but internal/config/config.go depends on k8s/client.go. That made it impossible to import a constant into the k8s package. I have pulled the true global constants out of the internal/config package and put them into a truly isolated constants package.

Whats up with the EnsureServiceAccount function?

The existing CreateOrUpdateServiceAccount function is good in that it is fairly simple. I wanted to add some logic to prevent making create/update calls if they are unnecessary, and I did not want to introduce that complexity into the existing function.

@diranged diranged force-pushed the fix_loop_3 branch 4 times, most recently from 6bbdaa5 to 242393b Compare July 6, 2021 19:54
return successRequeueIt()
}

// If we got here, something is actually wrong and we need to force a full reconciliation with the 'default'
// handler below.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like it should be logged if something is actually wrong?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 ive added a warning message..

log.V(1).Info("No change in the incoming policy compare to state of the world(external AWS IAM) policy")
}

if !shouldReconcile {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this block just move inside the if !validation.CompareRole block above and then get rid of the shouldReconcile boolean? I much prefer to avoid boolean "temporary state" like that if I can, I find it very confusing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - it was there originally, but I actually missed it and had logic after it that was never being executed.. so I thought this was somehow cleaner. It probably isn't though. I'll revert that part.

if err != nil {
r.Recorder.Event(iamRole, v1.EventTypeWarning, string(iammanagerv1alpha1.Error), "Unable to construct iam role name to error "+err.Error())
// It is not clear to me that we want to requeue here - as this is a fairly permanent
// error. Is there a better pattern here?
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}

// Determine whether or not this Iamrole is supposed to be connected to a ServiceAccount or not. This is
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case of "not managing a ServiceAccount" seems weird - like, why use this if you are not managing a ServiceAccount?

we don't have to deal with it here, but it might be a good idea to try and decouple the ServiceAccount management and the IAMRole management even more - e.g. have the flow be something like:

  1. make sure the IAMRole is in the correct state
  2. decide if we need to manage a serviceaccount
  3. if we do, do whatever serviceaccount managment we need to

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. First, to set the stage, the reason they have the "should I manage the SA" flag is because you have to set a specific annotation on the Iamrole resource to tell it to connect the Iamrole to a ServiceAccount. There are many reasons why you might choose to manage an AWS IAMRole and not actually connect it to an SA.

I tried to move the logic for "if I should manage the SA" into the k8s/rbac.go file.. but it created a dependency loop. The k8s/rbac.go code is inside the k8s module. The k8s module is loaded up by the awsapi module so that it can get some information about how it should connect to AWS. So if you are in the k8s module and try to load up the awsapi module, you get a dependency loop. 🤦

I started down the path of trying to separate these ... but I felt like I was unwinding something I shouldn't touch yet :/

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah any sort of config should probably be being passed into the k8s module from here, instead of directly loading it from awsapi.

sounds like a battle for another day

// Determine whether or not this Iamrole is supposed to be connected to a ServiceAccount or not. This is
// used during the 'Ready' and 'default' status case handlers below to ensure the connection between the
// ServiceAccount and IAM Role (if desired).
shouldManageSa, saName := utils.ParseIRSAAnnotation(ctx, iamRole)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move this down to right above where you check shouldManageSa instead of having it up here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could move this down - but then it has to be duplicated (once inside the iammanagerv1alpha1.Ready case and once in the default case. Do you think that's more clear though and worth the duplication?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably would, just to reduce the distance I have to scan to remember how this got set, but that's more of a style thing. I just really don't like these boolean "do something later" style flags, and the less I have to scroll back to figure out how it got set, the better.

@@ -127,7 +127,8 @@ func (r *IamroleReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
func (r *IamroleReconciler) HandleReconcile(ctx context.Context, req ctrl.Request, iamRole *iammanagerv1alpha1.Iamrole) (ctrl.Result, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, this HandleReconcile method is just way too long, but it is doing a lot. Trying to refactor the error handling code to not be so repetitive (e.g. extracting a logAndRecordReconcilationError method) might shorten it and make it easier to follow.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I completely agree. In this PR I would like to not make too many changes to he HandleReconcile method because I am concerned about trying to submit that upstream and have the Keiko team balk at a massive looking rewrite. Any significant objections to me punting on that for now?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no

}
return flag, response
return val, nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great improvement over the previous version of this function

Copy link

@sabw8217 sabw8217 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is definitely an improvement, I did have a couple of suggestions though

@diranged diranged changed the base branch from fix_loop_2 to master August 3, 2021 18:14
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants