
eks.AlbController - helm error "UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress" #27641

Closed
jeremychone opened this issue Oct 22, 2023 · 5 comments
Labels
@aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service), bug (This issue is a bug.)

Comments


jeremychone commented Oct 22, 2023

Describe the bug

When creating a simple eks.Cluster with a simple eks.AlbController, I get a CREATE_FAILED | Custom::AWSCDK-EKS-HelmChart ... UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress error.

Expected Behavior

AlbController created.

Current Behavior

I am getting this error:

JcCdkXpEksStack: creating CloudFormation changeset...
[██████████████████████████▎·······························] (10/22)

12:26:06 PM | CREATE_FAILED        | Custom::AWSCDK-EKS-HelmChart          | JcCdkXpAlbControll...e/Resource/Default
Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress\n'

Logs: /aws/lambda/JcCdkXpEksStack-awscdkawseksKubect-Handler886CB40B-8EAQbaRd9uON

at invokeUserFunction (/var/task/framework.js:2:6)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async onEvent (/var/task/framework.js:1:369)
at async Runtime.handler (/var/task/cfn-response.js:1:1573) (RequestId: 3d2c880c-ce89-4e07-a73b-910a5a760a75)

Reproduction Steps

    const cluster = new eks.Cluster(this, 'jc-cdk-xp-cluster', {
      version: eks.KubernetesVersion.V1_27,
      defaultCapacity: 1,
      defaultCapacityInstance: new ec2.InstanceType('t3.micro'),
      kubectlLayer: new KubectlV27Layer(this, 'kubectl'),
    });
    
    const albController = new eks.AlbController(this, 'JcCdkXpAlbController', {
      cluster: cluster,
      version: eks.AlbControllerVersion.V2_5_1,
    });

Even when passing the albController property directly to the cluster (see the sketch below), I get the same error.
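
For reference, this is the inline variant I mean (a sketch reusing the same versions as above; the construct ids are just placeholders):

    const cluster = new eks.Cluster(this, 'jc-cdk-xp-cluster', {
      version: eks.KubernetesVersion.V1_27,
      defaultCapacity: 1,
      defaultCapacityInstance: new ec2.InstanceType('t3.micro'),
      kubectlLayer: new KubectlV27Layer(this, 'kubectl'),
      // ALB controller configured inline via the albController prop
      // instead of a separate eks.AlbController construct
      albController: {
        version: eks.AlbControllerVersion.V2_5_1,
      },
    });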

I tried several variations, with and without the default capacity, and with or without the kubectlLayer, but still got the same error.

Possible Solution

No response

Additional Information/Context

This appears to mirror a previously closed issue/discussion: #19705.

Additionally, I encountered a failure when attempting to create the cluster in one go with the .albController property.

I then set up the cluster without the ALB controller first, and that succeeded. However, once I added the const albController... block, I hit the same error again.

CDK CLI Version

2.102.0 (build 2abc59a)

Framework Version

No response

Node.js Version

v20.8.1

OS

Mac

Language

TypeScript

Language Version

TypeScript 5.2.2

Other information

eks: 1.27

@jeremychone added the bug and needs-triage labels on Oct 22, 2023
@github-actions bot added the @aws-cdk/aws-eks label on Oct 22, 2023
indrora (Contributor) commented Oct 23, 2023

This is an underlying Kubernetes issue rearing its head: Your Helm chart is never finishing deployment.

This description lines up very nicely with this third party discussion: https://medium.com/nerd-for-tech/kubernetes-helm-error-upgrade-failed-another-operation-install-upgrade-rollback-is-in-progress-52ea2c6fcda9

StackOverflow shows that the error is from somewhere deep in Kubernetes/Helm, as the issue appears on Azure as well: https://stackoverflow.com/questions/71599858/upgrade-failed-another-operation-install-upgrade-rollback-is-in-progress

@indrora added the response-requested label (waiting on additional info and feedback) on Oct 23, 2023
jeremychone (Author) commented

@indrora Thanks, that does seem to be the problem.

The catch is that in a CDK environment, Helm charts are executed by the CDK-managed kubectl handler (via the layer, e.g., KubectlV27Layer), so asking the user to install Helm locally somewhat defeats the purpose of CDK and the cdk deploy ... workflow.

I'm attempting to circumvent this Helm issue by installing the ALBController using the kubectl method, but it's rather cumbersome.

I'm wondering if there's a way to use eks.HelmChart or eks.KubernetesManifest for cleanup or something similar. I'm not exactly sure how this would integrate with the CDK workflow, though.
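
For instance, something along these lines is what I mean by the kubectl method. This is only a rough sketch; the real LBC install needs the full manifest set (service account, RBAC, CRDs, deployment), which is what makes it cumbersome:

    // Rough sketch only: apply raw manifests through the CDK kubectl handler
    // instead of the Helm chart. The deployment spec below is a placeholder,
    // not the actual aws-load-balancer-controller manifest.
    new eks.KubernetesManifest(this, 'AlbControllerManifest', {
      cluster,
      manifest: [
        {
          apiVersion: 'apps/v1',
          kind: 'Deployment',
          metadata: { name: 'aws-load-balancer-controller', namespace: 'kube-system' },
          spec: { /* ...full LBC deployment spec... */ },
        },
      ],
    });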

By the way, am I the only one encountering this issue on AWS?

I've recreated numerous stacks and clusters, but I consistently run into this problem. I'm curious about how others are managing to create their ALBControllers with CDK. I must be overlooking something.

@github-actions bot removed the response-requested label on Oct 24, 2023

jeremychone commented Oct 24, 2023

Good news - I've identified the problem. It turns out that the node resources were insufficient. The t3.micro was simply too small, and it seems this limitation prevented Helm from completing its task, resulting in the error.

I've made the adjustments below, and now everything is working perfectly.

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as eks from 'aws-cdk-lib/aws-eks';
import { KubectlV27Layer } from '@aws-cdk/lambda-layer-kubectl-v27';

// CLUSTER_NAME is a string constant defined elsewhere in my stack file.

export class CdkEksXp05Stack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const stack = this;

    const cluster = new eks.Cluster(stack, CLUSTER_NAME, {
      clusterName: CLUSTER_NAME,
      version: eks.KubernetesVersion.V1_27,
      defaultCapacity: 2,
      defaultCapacityInstance: new ec2.InstanceType('t3.large'),
      kubectlLayer: new KubectlV27Layer(stack, 'kubectl'),
    });

    // #region    --- ALB
    const albController = new eks.AlbController(stack, 'AlbController', {
      cluster,
      version: eks.AlbControllerVersion.V2_5_1,
    });
    // #endregion --- ALB
  }
}

Note: As a precaution, I initially deployed with the ALB section commented out, then uncommented it and deployed a second time. This was to prevent the entire cluster from rolling back in case of an issue. However, I anticipate that everything should work in a single deploy.
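
If anyone wants to script that two-step deploy instead of commenting code out, a CDK context flag also works. This is just a sketch, and the deployAlb flag name is something I made up:

    // Sketch: gate the ALB controller behind a context flag.
    // First deploy:   cdk deploy
    // Second deploy:  cdk deploy -c deployAlb=true
    const deployAlb = this.node.tryGetContext('deployAlb') === 'true';
    if (deployAlb) {
      new eks.AlbController(stack, 'AlbController', {
        cluster,
        version: eks.AlbControllerVersion.V2_5_1,
      });
    }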

Additional Note: I realized it was a resource issue when I switched the LBC installation to the plain Kubernetes manifest method via CDK. That deployment stalled because of insufficient resources, which pointed to a capacity limitation rather than a Helm problem.

This seems to be either a documentation gap or a case for more precise error messaging (though the latter is probably not an easy fix).

From my side, we can close this issue. (Not sure if I should be the one doing it).

Thanks, @indrora, for your input.

@khushail removed the needs-triage label on Oct 24, 2023
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.


phelian commented Nov 12, 2023

I still get this error even with large capacity :(
EKS 1.27, ALB controller 2.5.1
