Authorization Controller for NodeClass #7571

edibble21 · 2025-01-07T21:08:44Z

Fixes #N/A

Description
An authorization check that verifies the proper permissions are in place before marking the NodeClass ready.
How was this change tested?
make presubmit
Does this change impact docs?

Yes, PR includes docs updates
Yes, issue opened: #
No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

netlify · 2025-01-07T21:09:05Z

✅ Deploy Preview for karpenter-docs-prod ready!

Name	Link
🔨 Latest commit	`309e10a`
🔍 Latest deploy log	https://app.netlify.com/sites/karpenter-docs-prod/deploys/67916fcabc74a50008b61a09
😎 Deploy Preview	https://deploy-preview-7571--karpenter-docs-prod.netlify.app

coveralls · 2025-01-07T21:14:13Z

Pull Request Test Coverage Report for Build 12773797859

Details

54 of 58 (93.1%) changed or added relevant lines in 8 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.3%) to 65.222%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pkg/controllers/controllers.go	0	2	0.0%
pkg/errors/errors.go	7	9	77.78%

Totals
Change from base Build 12754714725:	0.3%
Covered Lines:	5840
Relevant Lines:	8954

💛 - Coveralls

engedaam

/karpenter snapshot

pkg/apis/v1/ec2nodeclass_status.go

engedaam · 2025-01-07T21:42:46Z

pkg/cloudprovider/cloudprovider.go

@@ -93,7 +94,7 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *karpv1.NodeClaim)
 	if nodeClassReady.IsFalse() {
 		return nil, cloudprovider.NewNodeClassNotReadyError(stderrors.New(nodeClassReady.Message))
 	}
-	if nodeClassReady.IsUnknown() {
+	if nodeClassReady.IsUnknown() && ctx.Value("DryRun") == nil {


Why are we checking dry-run here?

Because during the initial reconciliation the nodeclass readiness is unknown, so the call will fail without this check.

Why do we care about the dry-run value in that case, if the nodeclass has not been checked yet?

+1, this check feels odd to me too -- it's leaking details into the implementation and I'm wondering if there are ways to inject details into the providers themselves or into the API layer so that we don't have to leak this detail here

pkg/controllers/nodeclass/status/authorization.go

pkg/controllers/nodeclass/status/suite_test.go

engedaam · 2025-01-07T22:01:21Z

pkg/controllers/nodeclass/status/authorization_test.go

@@ -0,0 +1,70 @@
+/*


Can we add an integration test for permission related failures?

github-actions · 2025-01-07T22:03:30Z

Snapshot successfully published to oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:0-19af5b92e7738bd9ff5086809661cc81f0ade773.
To install you must login to the ECR repo with an AWS account:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com

helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-19af5b92e7738bd9ff5086809661cc81f0ade773" --namespace "kube-system" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

engedaam · 2025-01-08T00:25:41Z

pkg/controllers/nodeclass/status/validation.go

+	var errs []error
+	//create checks createfleet, and describelaunchtemplate
+	if _, err := n.cloudProvider.Create(ctx, nodeClaim); err != nil {
+		errs = append(errs, fmt.Errorf("create: %w", err))


Consider using "go.uber.org/multierr" here

engedaam · 2025-01-08T00:29:01Z

pkg/controllers/nodeclass/status/validation.go

+	if err := n.instanceProvider.CreateTags(ctx, "mock-id", map[string]string{"mock-tag": "mock-tag-value"}); err != nil {
+		errs = append(errs, fmt.Errorf("create tags: %w", err))
+	}
+	if corecloudprovider.IsNodeClassNotReadyError(errors.Join(errs...)) {


Why do we only set the condition to false if the node class is not ready?

engedaam · 2025-01-08T00:31:32Z

pkg/cloudprovider/cloudprovider.go

@@ -93,7 +94,7 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *karpv1.NodeClaim)
 	if nodeClassReady.IsFalse() {
 		return nil, cloudprovider.NewNodeClassNotReadyError(stderrors.New(nodeClassReady.Message))
 	}
-	if nodeClassReady.IsUnknown() {
+	if nodeClassReady.IsUnknown() && ctx.Value("DryRun") == nil {


Why do we care about the dry-run value in that case, if the nodeclass has not been checked yet?

jonathan-innis · 2025-01-08T06:25:49Z

pkg/cloudprovider/cloudprovider.go

@@ -93,7 +94,7 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *karpv1.NodeClaim)
 	if nodeClassReady.IsFalse() {
 		return nil, cloudprovider.NewNodeClassNotReadyError(stderrors.New(nodeClassReady.Message))
 	}
-	if nodeClassReady.IsUnknown() {
+	if nodeClassReady.IsUnknown() && ctx.Value("DryRun") == nil {


+1, this check feels odd to me too -- it's leaking details into the implementation and I'm wondering if there are ways to inject details into the providers themselves or into the API layer so that we don't have to leak this detail here

jonathan-innis · 2025-01-08T06:27:50Z

pkg/cloudprovider/cloudprovider.go

@@ -109,6 +110,9 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *karpv1.NodeClaim)
 	}
 	instance, err := c.instanceProvider.Create(ctx, nodeClass, nodeClaim, tags, instanceTypes)
 	if err != nil {
+		if awserrors.IsUnauthorizedError(err) {


Just checking: is this here to handle the case where we get an unauthorized error and that causes us to fail the launch? Is the idea that the controller will eventually reconcile and block further launches (similar to what we do with other validation checks, where we expect the reconciliation to eventually catch up and mark the NodeClass as NotReady)?

jonathan-innis · 2025-01-08T06:30:32Z

pkg/controllers/nodeclass/status/validation.go

+		errs = append(errs, fmt.Errorf("create: %w", err))
+	}
+
+	if err := n.instanceProvider.Delete(ctx, "mock-id"); err != nil {


Why are we checking delete when it comes to determining whether we can launch an instance?

jonathan-innis · 2025-01-08T06:34:26Z

pkg/controllers/nodeclass/status/validation.go

+		errs = append(errs, fmt.Errorf("create: %w", err))
+	}
+
+	if err := n.instanceProvider.Delete(ctx, "mock-id"); err != nil {


We're injecting dry-run into the context, which feels a little bit "hacky" to me. Stuffing values into context is always going to make the code handling that context a little awkward. I wonder what happens if you wrap the EC2API with something like EC2DryRunAPI that implements the EC2API but automatically mutates the call to pass the dry-run flag. Then you can inject this API into the providers and you don't have to mess with the context at all.


edibble21 added 2 commits January 7, 2025 12:28

Authorization controller for nodeclass

727b11b

fix presubmit

9f6cd84

edibble21 requested a review from a team as a code owner January 7, 2025 21:08

edibble21 requested a review from jmdeal January 7, 2025 21:08

edibble21 changed the title ~~Authorizationcontroller~~ Authorization Controller for NodeClass Jan 7, 2025

Merge branch 'main' into Authorizationcontroller

19af5b9

engedaam reviewed Jan 7, 2025


engedaam self-assigned this Jan 7, 2025

Feedback Addressed

47c0488

engedaam reviewed Jan 8, 2025


jonathan-innis reviewed Jan 8, 2025


edibble21 and others added 14 commits January 14, 2025 10:13

Merge branch 'aws:main' into Authorizationcontroller

5499554

fix: unify nodeclass status and termination controllers to prevent ra… (

1de3a8a

aws#7597)

docs: fix graceful-node-shutdown url reference (aws#7605)

fb94371

Co-authored-by: Andrii Omelianenko <[email protected]>

docs: Add notes about instanceStorePolicy that can help prevent deadl…

a6e7273

…ock (aws#7566)

chore: Use security group policy to manage pod-eni resource (aws#7607)

75cf5be

chore(deps): bump actions/upload-artifact from 4.4.3 to 4.6.0 in the …

6d3dd81

…actions-deps group (aws#7612) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

chore(deps): bump the k8s-go-deps group with 5 updates (aws#7610)

e78bf63

Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

chore(deps): bump the go-deps group with 12 updates (aws#7611)

eb8a7a1

Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

chore: Update data from AWS APIs (aws#7614)

a4bcd27

Co-authored-by: APICodeGen <[email protected]>

chore: Add unregistered taint when nodes register (aws#7616)

a30a57d

Authorization controller for nodeclass

52539a9

fix presubmit

0ecf8f6

Feedback Addressed

4137382

authorization check with mocked calls

309e10a

edibble21 closed this Jan 22, 2025
