
eks.AlbController - helm error "UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress" #27641

Closed
jeremychone opened this issue Oct 22, 2023 · 5 comments
Labels
@aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service), bug (This issue is a bug.)

Comments


jeremychone commented Oct 22, 2023

Describe the bug

When creating a simple eks.Cluster with a simple eks.AlbController, I get a CREATE_FAILED | Custom::AWSCDK-EKS-HelmChart ... UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress error.

Expected Behavior

AlbController created.

Current Behavior

I am getting this error:

JcCdkXpEksStack: creating CloudFormation changeset...
[██████████████████████████▎·······························] (10/22)

12:26:06 PM | CREATE_FAILED        | Custom::AWSCDK-EKS-HelmChart          | JcCdkXpAlbControll...e/Resource/Default
Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress\n'

Logs: /aws/lambda/JcCdkXpEksStack-awscdkawseksKubect-Handler886CB40B-8EAQbaRd9uON

at invokeUserFunction (/var/task/framework.js:2:6)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async onEvent (/var/task/framework.js:1:369)
at async Runtime.handler (/var/task/cfn-response.js:1:1573) (RequestId: 3d2c880c-ce89-4e07-a73b-910a5a760a75)

Reproduction Steps

    const cluster = new eks.Cluster(this, 'jc-cdk-xp-cluster', {
      version: eks.KubernetesVersion.V1_27,
      defaultCapacity: 1,
      defaultCapacityInstance: new ec2.InstanceType('t3.micro'),
      kubectlLayer: new KubectlV27Layer(this, 'kubectl'),
    });
    
    const albController = new eks.AlbController(this, 'JcCdkXpAlbController', {
      cluster: cluster,
      version: eks.AlbControllerVersion.V2_5_1,
    });

Even when passing the albController property directly to the cluster (see the sketch below), I get the same error.
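
For reference, this is the inline variant I mean (a sketch reusing the same versions as above; the construct ids are just placeholders):

    const cluster = new eks.Cluster(this, 'jc-cdk-xp-cluster', {
      version: eks.KubernetesVersion.V1_27,
      defaultCapacity: 1,
      defaultCapacityInstance: new ec2.InstanceType('t3.micro'),
      kubectlLayer: new KubectlV27Layer(this, 'kubectl'),
      // ALB controller configured inline via the albController prop
      // instead of a separate eks.AlbController construct
      albController: {
        version: eks.AlbControllerVersion.V2_5_1,
      },
    });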

I tried several variations, with and without the default capacity, and with or without the kubectlLayer, but still got the same error.

Possible Solution

No response

Additional Information/Context

This appears to mirror a previously closed issue/discussion: #19705.

Additionally, I encountered a failure when attempting to create the cluster in one go with the .albController property.

I then set up the cluster without the ALB controller first, and that succeeded. However, once I added the const albController... block, I hit the same error again.

CDK CLI Version

2.102.0 (build 2abc59a)

Framework Version

No response

Node.js Version

v20.8.1

OS

Mac

Language

TypeScript

Language Version

TypeScript 5.2.2

Other information

eks: 1.27

@jeremychone added the bug and needs-triage labels on Oct 22, 2023
@github-actions bot added the @aws-cdk/aws-eks label on Oct 22, 2023
indrora (Contributor) commented Oct 23, 2023

This is an underlying Kubernetes issue rearing its head: Your Helm chart is never finishing deployment.

This description lines up very nicely with this third party discussion: https://medium.com/nerd-for-tech/kubernetes-helm-error-upgrade-failed-another-operation-install-upgrade-rollback-is-in-progress-52ea2c6fcda9

StackOverflow shows that the error is from somewhere deep in Kubernetes/Helm, as the issue appears on Azure as well: https://stackoverflow.com/questions/71599858/upgrade-failed-another-operation-install-upgrade-rollback-is-in-progress

@indrora added the response-requested label (waiting on additional info and feedback) on Oct 23, 2023
jeremychone (Author) commented

@indrora Thanks, that does seem to be the problem.

The catch is that in a CDK environment, Helm charts are executed by the CDK-managed kubectl handler (via the layer, e.g., KubectlV27Layer), so asking the user to install Helm locally somewhat defeats the purpose of CDK and the cdk deploy ... workflow.

I'm attempting to circumvent this Helm issue by installing the ALBController using the kubectl method, but it's rather cumbersome.

I'm wondering if there's a way to use eks.HelmChart or eks.KubernetesManifest for cleanup or something similar. I'm not exactly sure how this would integrate with the CDK workflow, though.
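
For instance, something along these lines is what I mean by the kubectl method. This is only a rough sketch; the real LBC install needs the full manifest set (service account, RBAC, CRDs, deployment), which is what makes it cumbersome:

    // Rough sketch only: apply raw manifests through the CDK kubectl handler
    // instead of the Helm chart. The deployment spec below is a placeholder,
    // not the actual aws-load-balancer-controller manifest.
    new eks.KubernetesManifest(this, 'AlbControllerManifest', {
      cluster,
      manifest: [
        {
          apiVersion: 'apps/v1',
          kind: 'Deployment',
          metadata: { name: 'aws-load-balancer-controller', namespace: 'kube-system' },
          spec: { /* ...full LBC deployment spec... */ },
        },
      ],
    });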

By the way, am I the only one encountering this issue on AWS?

I've recreated numerous stacks and clusters, but I consistently run into this problem. I'm curious about how others are managing to create their ALBControllers with CDK. I must be overlooking something.

@github-actions bot removed the response-requested label on Oct 24, 2023

jeremychone commented Oct 24, 2023

Good news - I've identified the problem. It turns out that the node resources were insufficient. The t3.micro was simply too small, and it seems this limitation prevented Helm from completing its task, resulting in the error.

I've made the adjustments below, and now everything is working perfectly.

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as eks from 'aws-cdk-lib/aws-eks';
import { KubectlV27Layer } from '@aws-cdk/lambda-layer-kubectl-v27';

// CLUSTER_NAME is a string constant defined elsewhere in my stack file.

export class CdkEksXp05Stack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const stack = this;

    const cluster = new eks.Cluster(stack, CLUSTER_NAME, {
      clusterName: CLUSTER_NAME,
      version: eks.KubernetesVersion.V1_27,
      defaultCapacity: 2,
      defaultCapacityInstance: new ec2.InstanceType('t3.large'),
      kubectlLayer: new KubectlV27Layer(stack, 'kubectl'),
    });

    // #region    --- ALB
    const albController = new eks.AlbController(stack, 'AlbController', {
      cluster,
      version: eks.AlbControllerVersion.V2_5_1,
    });
    // #endregion --- ALB
  }
}

Note: As a precaution, I initially deployed with the ALB section commented out, then uncommented it and deployed a second time. This was to prevent the entire cluster from rolling back in case of an issue. However, I anticipate that everything should work in a single deploy.
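
If anyone wants to script that two-step deploy instead of commenting code out, a CDK context flag also works. This is just a sketch, and the deployAlb flag name is something I made up:

    // Sketch: gate the ALB controller behind a context flag.
    // First deploy:   cdk deploy
    // Second deploy:  cdk deploy -c deployAlb=true
    const deployAlb = this.node.tryGetContext('deployAlb') === 'true';
    if (deployAlb) {
      new eks.AlbController(stack, 'AlbController', {
        cluster,
        version: eks.AlbControllerVersion.V2_5_1,
      });
    }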

Additional Note: I realized it was a resource issue when I switched the LBC installation to the plain Kubernetes manifest method via CDK. That deployment stalled because of insufficient resources, which pointed to a capacity limitation rather than a Helm problem.

This seems to be either a documentation gap or a case for more precise error messaging (though the latter is probably not an easy fix).

From my side, we can close this issue. (Not sure if I should be the one doing it).

Thanks, @indrora, for your input.

@khushail removed the needs-triage label on Oct 24, 2023
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.


phelian commented Nov 12, 2023

I still get this error even with large capacity :(
EKS 1.27, ALB controller 2.5.1
