Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplified form of IAM Roles for ServiceAccounts #9352

Merged
merged 6 commits into from
Sep 10, 2020

Conversation

justinsb
Copy link
Member

@justinsb justinsb commented Jun 12, 2020

We create a featureflag "PublicJWKS" which runs in a simpler
configuration, where we expose the JWKS directly from the apiserver.

This has some limitations:

  • Only works when we have a public ELB
  • Requires enabling anonymous-auth on the apiserver

But it is a simpler configuration, and likely we can pare down the
code and get this under e2e before tackling some of the more
complicated scenarios.

Edit:

To try this out (but note that this isn't really a production config yet because of anonymous auth):

export KOPS_FEATURE_FLAGS="UseServiceAccountIAM,PublicJWKS"
kops create cluster foo.example.com --zones us-east-2a --api-loadbalancer-type public

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 12, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/api area/nodeup area/provider/aws Issues or PRs related to aws provider approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/openstack Issues or PRs related to openstack provider area/rolling-update cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 12, 2020
@rifelpet
Copy link
Member

rifelpet commented Jun 15, 2020

So we think this will be the PR that we'll land first between this, #9341 and #9271

Based on this comment: #9271 (comment) the remaining items for this PR are:

  • Remove the API changes. pkg/model/awsmodel/oidc_provider.go will only run if the feature flag is enabled. Also remove the creation of the jwks fitasks.ManagedFiles.
  • Add validation if the feature flag is enabled:
    • cluster cannot use gossip, must use public DNS
    • must have an internet-accessible api load balancer
    • must not use an ACM certificate on the load balancer ( i think this is required? or we'd have to reference the ACM CA for thumbprints. I'm not sure we'd want to handle that complexity for initial support of this feature)
    • we currently hardcode anonymous auth to enabled but maybe we go a step further and make sure the cluster spec isn't explicitly disabling anonymous auth just to be clear on what the final behavior is.
    • there may be additional requirement's we'll need to check here...
  • Hardcode issuer URL in AWS OIDC Provider to use the cluster's api domain name and cluster's CA thumbprint
  • Ensure unauthenticated requests can reach the apiserver's jwks endpoints
  • Update pkg/resources/aws/aws.go to look for OIDC Providers with issuer URLs that match the cluster's api domain name

I'm sure theres stuff I'm missing but hopefully that makes sense so far.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 15, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 17, 2020
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 18, 2020
@k8s-ci-robot k8s-ci-robot added area/addons area/provider/alicloud Issues or PRs related to alicloud provider and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 15, 2020
@justinsb justinsb force-pushed the irsa_with_public branch 5 times, most recently from e4fb028 to 2242ab0 Compare August 16, 2020 01:09
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 3, 2020
@justinsb
Copy link
Member Author

justinsb commented Sep 3, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 3, 2020
}

if !kubernetesRangesIntersect(versionRange, semver.MustParseRange(">= 1.19.0")) {
// Skip; this is an older manifest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic needs to be in sync with the iam_builder logic that decides whether to give the privileges to the service account or node

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK; this is saying "only try to patch the latest versions of the manifests". There is a deeper issue though, which is how to enable this on an existing cluster ... I talk about that in the top level comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Kubernetes version is 1.11 but the feature flag is on, then iam_builder will withhold the permissions from the node role, yet this won't inject the projected volume. So dns-controller won't have the permissions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - there is "caveat emptor" with feature flags, I see you've got it with your suggested change though so I'm going to incorporate that!

Copy link
Member

@johngmyers johngmyers Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check can probably be removed now since context.UseServiceAccountIAM() will be false for the older manifest.

return
}

// Legacy IAM permissions for node roles
if b.Cluster.Spec.IAM.Legacy {
if b.Cluster.Spec.IAM.Legacy && b.Role.IsNodeRole() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer need this change. Legacy now implies all roles are node roles.

}

// ServiceAccountForServiceAccountRole returns the kubernetes service account used by pods with the specified role
func ServiceAccountForServiceAccountRole(serviceAccountRole ServiceAccountRole) types.NamespacedName {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a receiver?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

)

// PodOrNodeRole is used to specify generation of permissions for either a pod or a node
type PodOrNodeRole struct {
Copy link
Member

@johngmyers johngmyers Sep 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the relevant buildAWSIAMRolePolicy() logic could possibly be pushed under an interface method. And a ServiceAccount() method would return the name of any service account.

Basically I'm applying the principle that code that does type assertions (or tests for existence of union fields) suggests that the branches under the test might be better as an interface method. But if it doesn't work, it doesn't work.

@justinsb
Copy link
Member Author

justinsb commented Sep 8, 2020

Thanks for explaining the interface comment @johngmyers - I implemented that and it seems to work really well. I think we could go even further with this reorganization, but there's likely a lot of moving around of code that would make this PR unreadably large again.

Your first comment (about keeping the logic in sync) gets at a deeper point - we don't have a great way to enable IRSA on an existing cluster right now. I suspect it'll actually work out, but it's possible we could hit a circular dependency while trying to publish the jwks document.

I think the answer will likely be:

  • The production flow will involve kops CLI publishing the JWKS document to an S3 bucket (i.e. no dependency on apiserver running nor on kops-controller having permissions to publish to an S3 bucket)
  • We'll therefore be able to enable IRSA and use it in one step; we might have some pods that temporarily don't have valid credentials but it should be a matter of a few seconds until everything converges.
  • When we graduate the feature-flag we'll have to include kubernetes-versions selection for whatever version we enable it on.

I don't particularly want to tackle that in this PR though - my goal here is to get test coverage on IRSA so we can get the feature enabled in upstream kubernetes (and demonstrate that kops is a good place to build test coverage of new functionality!).

@johngmyers
Copy link
Member

If the plan is to condition IRSA support on the manifest version, then different service accounts are likely to have different Kubernetes-version cutoffs.

Now when we drop support for k8s 1.11 in kops 1.20 a lot of variant manifests will drop away. But there are a few CNIs that vary over 1.16 and spotinst varies over 1.14.

pkg/model/context.go Outdated Show resolved Hide resolved
@justinsb
Copy link
Member Author

justinsb commented Sep 8, 2020

Merged your suggested change @johngmyers and then passed down UseServiceAccountIAM to avoid duplicating the logic. I guess in some version (e.g. 1.21) we'll enable a "real" flag/field for enabling this; my hope is that we won't have to tweak the manifests at that stage, but if we need to we can cut a new version of the manifests to address any incompatibilities.

I've also been thinking more about your point about functions/templates etc... there's actually a ton of overlap with the idea of operators and their ability to do transforms nicely, but the challenge of the opacity of the resulting manifest. I don't really have a fully formed answer yet, but there's something there I think...

@justinsb
Copy link
Member Author

justinsb commented Sep 8, 2020

(Had to add another commit to workaround AWS IAM vs Golang JSON restrictions on the generated IAM policy)

Copy link
Member

@johngmyers johngmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's only minor stuff left. I think it's really close.

There are some parameters of type iam.Subject that are named role; perhaps they should be named subject.

}
container := &containers[0]

if context.UseServiceAccountIAM() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot be false here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - removed!


// MarshalJSON formats the IAM statement for the AWS IAM restrictions.
// For example, `Resource: []` is not allowed, but golang would force us to use pointers.
func (s *Statement) MarshalJSON() ([]byte, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want unit tests for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - we had a unit test and added test cases to cover the new cases (e.g. no Resources).

)

// Subject represents an IAM identity, to which permissions are granted.
// It is implemented by NodeRole objects and per-ServiceAccount objects.
type Subject interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rename this file subject.go

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - done!

pkg/model/iam/pod_roles.go Outdated Show resolved Hide resolved
@justinsb justinsb force-pushed the irsa_with_public branch 2 times, most recently from c878d85 to 56526ca Compare September 9, 2020 13:01
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Sep 9, 2020

@justinsb: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kops-e2e-cni-weave f5149ec link /test pull-kops-e2e-cni-weave
pull-kops-e2e-cni-flannel f5149ec link /test pull-kops-e2e-cni-flannel
pull-kops-e2e-cni-kuberouter f5149ec link /test pull-kops-e2e-cni-kuberouter

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

justinsb and others added 6 commits September 9, 2020 09:57
This should be much easier to start and to get under testing; it only
works with a load balancer, it sets the apiserver into anonymous-auth
allowed, it grants the anonymous auth user permission to read our jwks
tokens.  But it shouldn't need a second bucket or anything of that
nature.

Co-authored-by: John Gardiner Myers <[email protected]>
This avoids the hacky search through the list of tasks.
Hat-tip to johngmyers for the idea!
AWS IAM is very strict and doesn't support `Resource: []` for example.
We implement a custom MarshalJSON method to work around that.
@johngmyers
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 10, 2020
@k8s-ci-robot k8s-ci-robot merged commit 036ea69 into kubernetes:master Sep 10, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Sep 10, 2020
@olemarkus olemarkus mentioned this pull request Dec 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/addons area/api area/nodeup area/provider/alicloud Issues or PRs related to alicloud provider area/provider/aws Issues or PRs related to aws provider area/provider/openstack Issues or PRs related to openstack provider area/rolling-update cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants