aws_eks: Cluster creation with AlbControllerOptions is running into error #22005
Comments
Related to #19705.
@mrlikl I was able to deploy it with CDK 2.46.0, Kubernetes 1.21, and ALB controller 2.4.1. Are you still having the issue?
I get the same error when default_capacity=0. The code in the description now reproduces the error.
@mrlikl I am running the following code to reproduce this error and will let you know when the deployment completes.

import { KubectlV23Layer } from '@aws-cdk/lambda-layer-kubectl-v23';
import {
  App, Stack,
  aws_eks as eks,
  aws_ec2 as ec2,
} from 'aws-cdk-lib';

const devEnv = {
  account: process.env.CDK_DEFAULT_ACCOUNT,
  region: process.env.CDK_DEFAULT_REGION,
};

const app = new App();
const stack = new Stack(app, 'triage-dev5', { env: devEnv });

new eks.Cluster(stack, 'Cluster', {
  vpc: ec2.Vpc.fromLookup(stack, 'Vpc', { isDefault: true }),
  albController: {
    version: eks.AlbControllerVersion.V2_4_1,
  },
  version: eks.KubernetesVersion.V1_23,
  kubectlLayer: new KubectlV23Layer(stack, 'LayerVersion'),
  clusterLogging: [
    eks.ClusterLoggingTypes.API,
    eks.ClusterLoggingTypes.AUTHENTICATOR,
    eks.ClusterLoggingTypes.SCHEDULER,
  ],
  endpointAccess: eks.EndpointAccess.PUBLIC,
  placeClusterHandlerInVpc: true,
  clusterName: 'baking-k8s',
  outputClusterName: true,
  outputMastersRoleArn: true,
  defaultCapacity: 0,
  kubectlEnvironment: { MINIMUM_IP_TARGET: '100', WARM_IP_TARGET: '100' },
});
I am getting an error with the CDK code provided above. Lambda log:
I am making this a P2 for now and will investigate further next week. If you have a possible solution, please let me know. Pull requests would be highly appreciated as well.
I think this issue should be prioritized, because a lot of other folks are running into trouble when developing in a sandbox. I have seen many issues in this repo that set default capacity to 0 without realizing they were hitting a bug. It really hurts development productivity, since the CloudFormation stack takes hours to roll back and clean up its resources.
I have the same issue:
The error from CloudFormation is:
Hey @pahud. Thank you so much for looking into this.
Hi @pahud, I still face the same issue. I deployed the CDK app in the cn-north-1 region.
Hi @pahud, I think I found the root cause in my scenario. It may be caused by the image not being pullable in the cn-north-1 region. Please check:
Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": rpc error: code = Unknown desc = failed to pull and unpack image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": failed to resolve reference "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": pulling from host 602401143452.dkr.ecr.us-west-2.amazonaws.com failed with status code [manifests v2.4.1]: 401 Unauthorized
This seems related to #22520.
013241004608.dkr.ecr.us-gov-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
I found a solution in kubernetes-sigs/aws-load-balancer-controller#1694: you can manually replace the ECR image URL in the CloudFormation template. https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases?page=2
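The workaround above amounts to choosing the controller image registry by region instead of always using the us-west-2 one. Below is a minimal sketch of that choice as a hypothetical helper, not a CDK API. It builds the repository URI from a region-to-account mapping. The mapping holds only the two registry accounts quoted in this thread, so any other region, including cn-north-1, must be added from the AWS documentation before it will resolve. Some CDK versions may accept a repository value on `AlbControllerOptions`; check the API reference for your version before relying on that.

```python
# Hypothetical helper (not a CDK API): pick the ALB controller image registry by region.
# Only the registry accounts quoted in this thread are listed; add other regions from
# the AWS documentation before relying on them.
REGISTRY_ACCOUNTS = {
    "us-west-2": "602401143452",
    "us-gov-west-1": "013241004608",
}


def alb_controller_repository(region: str) -> str:
    """Return the ECR repository URI (without tag) for the ALB controller image."""
    account = REGISTRY_ACCOUNTS.get(region)
    if account is None:
        raise ValueError(f"no registry account recorded for region {region!r}")
    # ECR hostnames in the China regions end in amazonaws.com.cn.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account}.dkr.ecr.{region}.{suffix}/amazon/aws-load-balancer-controller"


def alb_controller_image(region: str, tag: str = "v2.4.1") -> str:
    """Return the full image reference, e.g. to compare against the synthesized template."""
    return f"{alb_controller_repository(region)}:{tag}"
```

For example, `alb_controller_image("us-gov-west-1")` yields the GovCloud image quoted above. An unknown region raises an error instead of silently falling back to the us-west-2 registry, which is the fallback that fails in cn-north-1.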
The issue is that when the cluster is deployed with … While this is expected, since there are no nodes, I tested adding a check to kubectl-handler that looks at whether the node count is 0 when the error is thrown, and that let me handle the error. However, I am not sure this is the right approach to solving the issue.
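The node check described above can be sketched as follows. This is a hypothetical illustration, not the code in CDK's kubectl handler. It parses the output of `kubectl get nodes -o json`, a `List` object whose `items` are Node resources. When no node reports a `Ready` condition of `True`, it appends a hint to the Helm error, so the failure points at the missing capacity rather than at Helm.

```python
import json


def ready_node_count(nodes_json: str) -> int:
    """Count nodes whose Ready condition is "True" in `kubectl get nodes -o json` output."""
    items = json.loads(nodes_json).get("items", [])
    ready = 0
    for node in items:
        conditions = node.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            ready += 1
    return ready


def annotate_helm_error(error: str, nodes_json: str) -> str:
    """Hypothetical: explain a Helm failure when the cluster has no ready nodes."""
    if ready_node_count(nodes_json) == 0:
        return (
            f"{error} (hint: the cluster has no ready nodes, e.g. defaultCapacity=0, "
            "so the chart's pods cannot be scheduled)"
        )
    return error
```

Annotating the error rather than swallowing it keeps the deployment failing. That seems safer than reporting success for a controller whose pods never started.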
@pahud, out of interest, is this still on the backlog, or has it been deprioritized? Calling
I have been stuck creating a FargateCluster because of this issue since 06/22 (#22005 (comment)). Did setting defaultCapacity work for you? It is not an option for Fargate. I just tried the latest version of CDK today and still have this issue. Is it possible to escalate it, please?
Could someone help me? I have the same issue. Here is my repo: https://github.com/PavanMudigondaTR/install-karpenter-with-cdk
It's been a while, and I am now testing the following code with the latest CDK:

import { Stack, StackProps, aws_eks as eks, aws_ec2 as ec2 } from 'aws-cdk-lib';
// Assumed: KubectlLayer refers to the kubectl v1.27 layer package.
import { KubectlV27Layer as KubectlLayer } from '@aws-cdk/lambda-layer-kubectl-v27';
import { Construct } from 'constructs';

// getDefaultVpc is not shown in the original comment; this default-VPC lookup is an
// assumed stand-in (Vpc.fromLookup requires the stack to have an explicit env).
const getDefaultVpc = (scope: Construct) =>
  ec2.Vpc.fromLookup(scope, 'Vpc', { isDefault: true });

export class EksStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // use my default VPC
    const vpc = getDefaultVpc(this);

    new eks.Cluster(this, 'Cluster', {
      vpc,
      albController: {
        version: eks.AlbControllerVersion.V2_6_2,
      },
      version: eks.KubernetesVersion.V1_27,
      kubectlLayer: new KubectlLayer(this, 'LayerVersion'),
      clusterLogging: [
        eks.ClusterLoggingTypes.API,
        eks.ClusterLoggingTypes.AUTHENTICATOR,
        eks.ClusterLoggingTypes.SCHEDULER,
      ],
      endpointAccess: eks.EndpointAccess.PUBLIC,
      placeClusterHandlerInVpc: true,
      clusterName: 'baking-k8s',
      outputClusterName: true,
      outputMastersRoleArn: true,
      defaultCapacity: 0,
      kubectlEnvironment: { MINIMUM_IP_TARGET: '100', WARM_IP_TARGET: '100' },
    });
  }
}

@mrlikl @Karatakos @smislam @PavanMudigondaTR: I am not sure your issues are related to this one, which seems to be specific to the AlbController. If your issue occurs without the AlbController, please open a new issue and link it to this one. @YikaiHu, EKS in China is a bit more complicated; please open a separate issue for your China case and link it to this one. Thanks.
Unfortunately, my first attempt to deploy the same code as in my previous comment failed. I am making this a P1 for now and will try to simplify the code to figure out the root cause.
This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.
The issue still persists. Bot, please don't close the ticket.
Hey @pahud, thank you so much for looking into this. I concur that the issue still persists. Here is the error:
Node: v20.10.0
When I add your suggestion
I was able to resolve the issue. What I found is that to create the ALB controller, the code fetches Helm files from Kubernetes SIGs. To access those files, you must have egress enabled. In my case, I was creating my cluster in a private subnet without egress. You need to create your cluster in a subnet with egress. Please update your cluster and VPC configurations and see whether that resolves the issue for you. My stack completed successfully.
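One way to check a subnet for this egress before deploying is to inspect its route table. The hypothetical helper below takes one entry of the `RouteTables` list returned by the EC2 `DescribeRouteTables` API (for example from boto3's `describe_route_tables`). It reports what the active default IPv4 route (`0.0.0.0/0`) points at.

`placeClusterHandlerInVpc: true` places the cluster handler functions in the VPC, and VPC-attached Lambda functions do not get public IP addresses. An internet-gateway route alone is therefore not enough. The sketch counts NAT gateways as egress. It also counts transit gateways, but only under the assumption that the network account forwards that traffic to the internet.

```python
def default_route_targets(route_table: dict) -> set[str]:
    """Kinds of targets ("nat", "tgw", "igw") of active 0.0.0.0/0 routes in a route table."""
    kinds = set()
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0" or route.get("State") == "blackhole":
            continue
        if route.get("NatGatewayId"):
            kinds.add("nat")
        elif route.get("TransitGatewayId"):
            kinds.add("tgw")
        elif str(route.get("GatewayId", "")).startswith("igw-"):
            kinds.add("igw")
    return kinds


def vpc_lambda_has_egress(route_table: dict) -> bool:
    # VPC-attached Lambda functions get no public IP, so an internet-gateway route alone
    # does not help; traffic must go through NAT or (assumed) a transit gateway that the
    # network account routes to the internet.
    return bool(default_route_targets(route_table) & {"nat", "tgw"})
```

A route table that only has the `local` route yields an empty set, which corresponds to an isolated subnet.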
Thank you, @smislam, for the insights.
@smislam @pahud, I'm still getting the same error with my Python code, even with
For some reason it will fail if vpc_subnets selection is
This means CDK can't find any "private with egress" subnets in your VPC. Can you make sure you have private subnets with egress (typically through a NAT gateway)?
@andreprawira, it looks like you are using a VPC (already created in another stack) that doesn't have a private subnet with egress, and that is why you are getting that error:
vpc = ec2.Vpc.from_lookup(self, "VPCLookup", vpc_id=props.vpc_id)
You will not be able to use CDK to create your stack with such a configuration, for the reason I mentioned in my earlier comment. So either update your VPC to add a new private subnet with egress, or create an entirely new VPC with
@pahud @smislam, we have a product in our Service Catalog that deploys a VPC and an IGW to all of our accounts. That product doesn't use a NAT gateway; instead, we use a TGW in our network account (meaning all traffic goes in and out through the network account, even for the VPCs in other accounts). That is why I used Vpc.from_lookup: the VPC has already been created. That being said, is there another way for me to use the … Furthermore, using … How do I use …
@andreprawira, your setup should work. There was a bug in older versions of CDK related to Transit Gateway; I ran into it a while back. Any chance you are using an older version of CDK?
@smislam, I just updated my CDK from version
but I am still getting the same
That is strange, @andreprawira; I am not sure what is happening. We will need @pahud and the AWS CDK team to look deeper into this. Happy coding and a happy New Year!
I think you can still use private isolated subnets for vpc_subnets, as below:
But if you look at the synthesized template, there could be a chance
Technically, it is possible to deploy an EKS cluster in isolated subnets, but there are a lot of requirements to consider. We don't have a working sample yet, and we will need more feedback from the community before we know how to do it and can add it to the documentation. We have a P1 tracking issue for EKS clusters in isolated subnets at #12171. We will need to close that first, but it should not be related to the ALB controller.
Describe the bug
When creating an EKS cluster with eks.AlbControllerOptions, creation of the custom resource Custom::AWSCDK-EKS-HelmChart fails with the following error:
"Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress' "
Expected Behavior
The custom resource Custom::AWSCDK-EKS-HelmChart is created successfully.
Current Behavior
Custom::AWSCDK-EKS-HelmChart fails with the error "Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress' "
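This message is Helm's generic lock error. Helm refuses to install or upgrade a release whose last operation is still recorded as pending (`pending-install`, `pending-upgrade`, or `pending-rollback`). That can happen because an earlier attempt was interrupted, or because it is still waiting for pods that cannot be scheduled.

The hypothetical helper below is a minimal sketch, not part of the CDK handler. It recognizes the error in Helm's stderr, which arrives as bytes (as the `b'...'` in the message suggests). It also reads the release state from the `info.status` field of `helm status <release> -o json`.

```python
import json

# Helm 3 release states that hold the lock and trigger the error above.
PENDING_STATUSES = {"pending-install", "pending-upgrade", "pending-rollback"}
LOCK_MESSAGE = "another operation (install/upgrade/rollback) is in progress"


def is_helm_lock_error(stderr: bytes) -> bool:
    """True if Helm's stderr contains the 'operation in progress' lock error."""
    return LOCK_MESSAGE in stderr.decode("utf-8", errors="replace")


def release_state(status_json: str) -> str:
    """Extract info.status from `helm status <release> -o json` output."""
    return json.loads(status_json).get("info", {}).get("status", "unknown")


def is_stuck(status_json: str) -> bool:
    """True if the release is in a pending state that blocks new operations."""
    return release_state(status_json) in PENDING_STATUSES
```

Retrying alone is unlikely to help if the release is stuck because its pods can never start (for example with `default_capacity=0`). In that case the pending release typically has to be rolled back or uninstalled before a new install can proceed.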
Reproduction Steps
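from aws_cdk import Stack, aws_eks as eks
from constructs import Construct


# Stack wrapper added so the snippet is self-contained; the class name is illustrative.
class EksStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        cluster = eks.Cluster(
            scope=self,
            id=construct_id,
            tags={"env": "production"},
            alb_controller=eks.AlbControllerOptions(
                version=eks.AlbControllerVersion.V2_4_1
            ),
            version=eks.KubernetesVersion.V1_21,
            cluster_logging=[
                eks.ClusterLoggingTypes.API,
                eks.ClusterLoggingTypes.AUTHENTICATOR,
                eks.ClusterLoggingTypes.SCHEDULER,
            ],
            endpoint_access=eks.EndpointAccess.PUBLIC,
            place_cluster_handler_in_vpc=True,
            cluster_name="basking-k8s",
            output_masters_role_arn=True,
            output_cluster_name=True,
            default_capacity=0,
            kubectl_environment={"MINIMUM_IP_TARGET": "100", "WARM_IP_TARGET": "100"},
        )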
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.40.0
Framework Version
No response
Node.js Version
16.17.0
OS
macos 12.5.1
Language
Python
Language Version
3.10.6
Other information
No response