Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Authorization Controller for NodeClass #7571

Closed
wants to merge 18 commits into from

Conversation

edibble21
Copy link
Contributor

Fixes #N/A

Description
A authorization check that checks for the proper permissions before marking the nodeclass ready
How was this change tested?
make presubmit
Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@edibble21 edibble21 requested a review from a team as a code owner January 7, 2025 21:08
@edibble21 edibble21 requested a review from jmdeal January 7, 2025 21:08
@edibble21 edibble21 changed the title Authorizationcontroller Authorization Controller for NodeClass Jan 7, 2025
Copy link

netlify bot commented Jan 7, 2025

Deploy Preview for karpenter-docs-prod ready!

Name Link
🔨 Latest commit 309e10a
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/67916fcabc74a50008b61a09
😎 Deploy Preview https://deploy-preview-7571--karpenter-docs-prod.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@coveralls
Copy link

coveralls commented Jan 7, 2025

Pull Request Test Coverage Report for Build 12773797859

Details

  • 54 of 58 (93.1%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 65.222%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/controllers/controllers.go 0 2 0.0%
pkg/errors/errors.go 7 9 77.78%
Totals Coverage Status
Change from base Build 12754714725: 0.3%
Covered Lines: 5840
Relevant Lines: 8954

💛 - Coveralls

Copy link
Contributor

@engedaam engedaam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/karpenter snapshot

pkg/apis/v1/ec2nodeclass_status.go Outdated Show resolved Hide resolved
@@ -93,7 +94,7 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *karpv1.NodeClaim)
if nodeClassReady.IsFalse() {
return nil, cloudprovider.NewNodeClassNotReadyError(stderrors.New(nodeClassReady.Message))
}
if nodeClassReady.IsUnknown() {
if nodeClassReady.IsUnknown() && ctx.Value("DryRun") == nil {
Copy link
Contributor

@engedaam engedaam Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we checking dry-run here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because during the initial reconciliation nodeclass readiness is unknown so it will fail without a check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we care about the dry-run value in that case? If the nodeclass has not been checked yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this check feels odd to me too -- it's leaking details into the implementation and I'm wondering if there are ways to inject details into the providers themselves or into the API layer so that we don't have to leak this detail here

pkg/controllers/nodeclass/status/authorization.go Outdated Show resolved Hide resolved
pkg/controllers/nodeclass/status/authorization.go Outdated Show resolved Hide resolved
pkg/controllers/nodeclass/status/authorization.go Outdated Show resolved Hide resolved
pkg/controllers/nodeclass/status/authorization.go Outdated Show resolved Hide resolved
pkg/controllers/nodeclass/status/authorization.go Outdated Show resolved Hide resolved
pkg/controllers/nodeclass/status/suite_test.go Outdated Show resolved Hide resolved
pkg/controllers/nodeclass/status/suite_test.go Outdated Show resolved Hide resolved
@@ -0,0 +1,70 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an integration test for permission related failures?

Copy link
Contributor

github-actions bot commented Jan 7, 2025

Snapshot successfully published to oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:0-19af5b92e7738bd9ff5086809661cc81f0ade773.
To install you must login to the ECR repo with an AWS account:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com

helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-19af5b92e7738bd9ff5086809661cc81f0ade773" --namespace "kube-system" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

@engedaam engedaam self-assigned this Jan 7, 2025
var errs []error
//create checks createfleet, and describelaunchtemplate
if _, err := n.cloudProvider.Create(ctx, nodeClaim); err != nil {
errs = append(errs, fmt.Errorf("create: %w", err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using "go.uber.org/multierr" here

if err := n.instanceProvider.CreateTags(ctx, "mock-id", map[string]string{"mock-tag": "mock-tag-value"}); err != nil {
errs = append(errs, fmt.Errorf("create tags: %w", err))
}
if corecloudprovider.IsNodeClassNotReadyError(errors.Join(errs...)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only set the condition to false if the node class is not ready?

@@ -93,7 +94,7 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *karpv1.NodeClaim)
if nodeClassReady.IsFalse() {
return nil, cloudprovider.NewNodeClassNotReadyError(stderrors.New(nodeClassReady.Message))
}
if nodeClassReady.IsUnknown() {
if nodeClassReady.IsUnknown() && ctx.Value("DryRun") == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we care about the dry-run value in that case? If the nodeclass has not been checked yet

@@ -93,7 +94,7 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *karpv1.NodeClaim)
if nodeClassReady.IsFalse() {
return nil, cloudprovider.NewNodeClassNotReadyError(stderrors.New(nodeClassReady.Message))
}
if nodeClassReady.IsUnknown() {
if nodeClassReady.IsUnknown() && ctx.Value("DryRun") == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this check feels odd to me too -- it's leaking details into the implementation and I'm wondering if there are ways to inject details into the providers themselves or into the API layer so that we don't have to leak this detail here

@@ -109,6 +110,9 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *karpv1.NodeClaim)
}
instance, err := c.instanceProvider.Create(ctx, nodeClass, nodeClaim, tags, instanceTypes)
if err != nil {
if awserrors.IsUnauthorizedError(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking: Is this here to handle the case where we get unauthorized and that causes us to fail the launch -- is the idea here that eventually the controller will reconcile and should block further launches (similar to what we do with other validation checks, where we expect the reconciliation to eventually catch-up and mark the NodeClass as NotReady)?

errs = append(errs, fmt.Errorf("create: %w", err))
}

if err := n.instanceProvider.Delete(ctx, "mock-id"); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we checking delete when it comes to determining whether we can launch an instance?

errs = append(errs, fmt.Errorf("create: %w", err))
}

if err := n.instanceProvider.Delete(ctx, "mock-id"); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're injecting dry-run into the context, which feels a little bit "hacky" to me -- stuffing values into context is always going to make the code handling that context a little awkward -- I wonder what happens if you wrap the EC2API with something like EC2DryRunAPI and it implements the EC2API but automatically mutates the call to pass the dry-run flag -- now you can inject this API into the providers and you don't have to mess with the context at all

edibble21 and others added 14 commits January 14, 2025 10:13
…actions-deps group (aws#7612)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@edibble21 edibble21 closed this Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants