Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ec2: CloudFormation attempts to trigger a replacement when adding a tag to NACL #27476

Closed
juinquok opened this issue Oct 10, 2023 · 7 comments
Closed
Labels
@aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud bug This issue is a bug. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. effort/medium Medium work item – several days of effort p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments

@juinquok
Copy link
Contributor

juinquok commented Oct 10, 2023

Describe the bug

Related to changes made in #26898 from the original issue raised in #26897

After updating aws-cdk with this change inside and with the original construct not having networkAclName specified, aws-cdk deploy resulted in an Internal Failure being recorded for the deployment of the stack. This was when the networkAclName was using the this.node.path variable in the Name tag of the NACL.

Further investigation into the resources being created/destroyed seems to point to the NACL being deleted and recreated during the deployment process. The CloudFormation logs provide the following information Requested update requires the replacement of the existing resource; deleting existing resource, then creating a new one. for the NACL that is being tagged.

I'm not sure if this forum is the right place to be raising CloudFormation specific issues, if it's not, will be happy to post this in the correct forum.

Expected Behavior

  1. The Name tag should be added to the resource without having to re-create the resource again.
  2. Even if the resource is deleted, CloudFormation should be able to automatically associate the default NACL of the VPC with the affected subnet since the custom NACL is being deleted. After it is recreated and the NACL is associated to the subnet again, all will be okay.

Current Behavior

  1. Tag seems to have been added per the logs.
  2. CDKMetadata is updated
  3. CFN attempts to update the default association
  4. CFN provides the message: Requested update requires the replacement of the existing resource; deleting existing resource, then creating a new one.
  5. Resource creation is initiated for the default association
  6. Resource errors out with UPDATE_FAILED and a status reason of Internal Failure

Reproduction Steps

  1. Create a VPC with subnets and associate the NACL accordingly before the commit fix(ec2): networkAclName property for NetworkAcl does not work #26898 was released
const publicNacl = new NetworkAcl(this, 'PublicNacl', {
      vpc: this.vpc,
      subnetSelection: {
        subnetType: SubnetType.PUBLIC,
      },
    });
publicNacl.addEntry(`AllowSshRule-1`, {
        cidr: AclCidr.ipv4(<SOME_IP_HERE>),
        direction: TrafficDirection.INGRESS,
        ruleNumber: 1 + index,
        traffic: AclTraffic.tcpPort(22),
        ruleAction: Action.ALLOW,
      });
  1. Add networkAclName: 'PublicNacl', to the publicNacl and use the latest aws-cdk which includes commit fix(ec2): networkAclName property for NetworkAcl does not work #26898
  2. aws-cdk deploy
  3. Error will be thrown

Possible Solution

Wondering if the NACL's or the association's logical resource ID is somehow tied to the name that is used resulting in people having earlier versions of the construct to face this issue when the construct is deployed. However, aws-cdk diff does not show any difference in logical ID only that the Name tag needs to be added

Additional Information/Context

Previous CDK version was 2.84.0

CDK CLI Version

2.99.1 (build b2a895e)

Framework Version

No response

Node.js Version

18.16.0

OS

Mac w/ Apple Silicon

Language

TypeScript

Language Version

4.9.5

Other information

No response

@juinquok juinquok added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 10, 2023
@github-actions github-actions bot added the @aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud label Oct 10, 2023
@juinquok juinquok changed the title ec2: CloudFormation attempts to trigger a replacement when adding a tag ec2: CloudFormation attempts to trigger a replacement when adding a tag to NACL Oct 10, 2023
@peterwoodworth peterwoodworth added p1 needs-review and removed needs-triage This issue or PR still needs to be triaged. labels Oct 10, 2023
@peterwoodworth
Copy link
Contributor

Thanks for reporting, we'll look into this then get back to you

@juinquok
Copy link
Contributor Author

Thanks for reporting, we'll look into this then get back to you

Thank you! Will be happy to share the CloudFormation logs privately if needed :)

@mikewrighton mikewrighton added needs-reproduction This issue needs reproduction. and removed needs-review labels Oct 11, 2023
@bea-mo
Copy link

bea-mo commented Oct 13, 2023

This is breaking our ability to deploy / update to any existing stack that has NetworkAcl constructs to the newer aws-cdk-lib version. The update results in a create/replace of all of our AWS::EC2::SubnetNetworkAclAssociation which fails with AWS::EC2::SubnetNetworkAclAssociation' with identifier '...' already exists

CDK Diff

[~] AWS::EC2::NetworkAcl ... ...
 └─ [+] Tags
     └─ [{"Key":"Name","Value":"..."}]

CFN Error

Resource handler returned message: "Resource of type 'AWS::EC2::SubnetNetworkAclAssociation' with identifier 'aclassoc-...' already exists." (RequestToken: ..., HandlerErrorCode: AlreadyExists)

Previous working CDK version 2.91.0
Updating to 2.96.2 introduced issue (from changelog appears to be 2.94.0 when the bug was introduced).

@juinquok
Copy link
Contributor Author

To reproduce -

  1. Create and deploy a stack with public, private with egress and private isolated subnets with a NACL for each of the subnets that is customised with some rules using a cdk version earlier than 2.94.0, I used 2.84.0. The NACL name should be empty
  2. Update the aws-cdk version to latest and run cdk diff, you'll get a diff as mentioned by @bea-mo above which only indicates that tags are to be added to the resource.
  3. Run cdk deploy and you'll get a stack update error after the tags have been added as cloudformation is trying to create / replace all the AWS::EC2::SubnetNetworkAclAssociation.

@Mexflubber
Copy link

I'm experiencing the same thing, can't get my stack deployed.

@pahud pahud self-assigned this Apr 17, 2024
@pahud
Copy link
Contributor

pahud commented Apr 24, 2024

This PR you mentioned was first introduced in 2.94.0
7f31da8

  1. I deployed this with 2.93.0 first
export class DummyStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    // my default vpc
    const vpc = getDefaultVpc(this);

    const publicNacl = new ec2.NetworkAcl(this, 'PublicNacl', {
      vpc,
      subnetSelection: {
        subnetType: SubnetType.PUBLIC,
      },
    });

    publicNacl.addEntry('AllowSshRule-1', {
      cidr: ec2.AclCidr.ipv4('8.8.8.8'),
      direction: ec2.TrafficDirection.INGRESS,
      ruleNumber: 1,
      traffic: ec2.AclTraffic.tcpPort(22),
      ruleAction: ec2.Action.ALLOW,
    });

  }
}
  1. upgraded to 2.138.0,
  2. update and add thenetworkAckEntryName prop:
    publicNacl.addEntry('AllowSshRule-1', {
      cidr: ec2.AclCidr.ipv4('8.8.8.8/32'),
      direction: ec2.TrafficDirection.INGRESS,
      ruleNumber: 1,
      traffic: ec2.AclTraffic.tcpPort(22),
      ruleAction: ec2.Action.ALLOW,
      networkAclEntryName: 'PublicNacl'
    });
  1. run npx cdk diff first
 npx cdk diff
Stack dummy-stack
Hold on while we create a read-only change set to get a diff with accurate replacement information (use --no-change-set to use a less accurate but faster template-only diff)
Resources
[~] AWS::EC2::NetworkAcl PublicNacl PublicNacl8B0AAE87 
 └─ [+] Tags
     └─ [{"Key":"Name","Value":"dummy-stack/PublicNacl"}]


✨  Number of stacks with differences: 1
  1. run cdk deploy and it deploys with no error.
 ✅  dummy-stack

✨  Deployment time: 31.99s

Stack ARN:
arn:aws:cloudformation:us-east-1:XXXXXXXXXXXX:stack/dummy-stack/2dbb9230-023e-11ef-b55e-0e8537de8c81

✨  Total time: 34.48s

Now I destroyed the stack and repeated it again using 2.84.0 and upgraded to 2.138.0. It deploys with no error.

 ✅  dummy-stack

✨  Deployment time: 32.09s

Stack ARN:
arn:aws:cloudformation:us-east-1:XXXXXXXXXX:stack/dummy-stack/00c77990-0240-11ef-9791-0affd7dd11e9

✨  Total time: 34.6s

As we can't reproduce this issue, I am downgrading this to p2 and if you believe this issue does exists please share your detailed reproduction steps with code snippets that we can copy/paste and reproduce in our environment. Thanks.

@pahud pahud added p2 and removed p1 labels Apr 24, 2024
@pahud pahud removed their assignment Apr 24, 2024
@pahud pahud added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. effort/medium Medium work item – several days of effort and removed needs-reproduction This issue needs reproduction. labels Apr 24, 2024
Copy link

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Apr 26, 2024
@github-actions github-actions bot closed this as completed May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud bug This issue is a bug. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. effort/medium Medium work item – several days of effort p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.
Projects
None yet
Development

No branches or pull requests

6 participants