
[Bug]: CloudFormation Stack fails to sync or get ready when it takes too long to deploy #1505

Open
karloscarrijo opened this issue Sep 30, 2024 · 4 comments
Labels
bug Something isn't working needs:triage

Comments


karloscarrijo commented Sep 30, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Affected Resource(s)

  • cloudformation.aws.upbound.io/v1beta1 - Stack

Resource MRs required to reproduce the bug

apiVersion: cloudformation.aws.upbound.io/v1beta1
kind: Stack
metadata:
  name: ct-stack
spec:
  forProvider:
    name: controltower-stack
    parameters:
      AuditAccountId: "xxxxxxxxxxxxxx"
      LogArchiveAccountId: "xxxxxxxxxxxxxx"
    region: us-east-1
    templateBody: |
      {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "AWS Control Tower Setup",
        "Parameters": {
          "AuditAccountId": {
            "Type": "String",
            "Description": "The ID of the Audit Account"
          },
          "LogArchiveAccountId": {
            "Type": "String",
            "Description": "The ID of the Log Archive Account"
          }
        },
        "Resources": {
          "ControlTowerLandingZone": {
            "Type": "AWS::ControlTower::LandingZone",
            "Properties": {
              "Manifest": {
                "governedRegions": ["us-east-1"],
                "organizationStructure": {
                  "security": { "name": "security" }
                },
                "centralizedLogging": {
                  "accountId": { "Ref": "LogArchiveAccountId" },
                  "configurations": {
                    "loggingBucket": { "retentionDays": 60 },
                    "accessLoggingBucket": { "retentionDays": 60 }
                  },
                  "enabled": true
                },
                "securityRoles": {
                  "accountId": { "Ref": "AuditAccountId" }
                },
                "accessManagement": { "enabled": true }
              },
              "Tags": [
                { "Key": "Name", "Value": "ControlTowerLandingZone" }
              ],
              "Version": "3.3"
            }
          }
        },
        "Outputs": {
          "LandingZoneId": {
            "Description": "The ID of the Control Tower Landing Zone",
            "Value": { "Ref": "ControlTowerLandingZone" }
          }
        }
      }

Steps to Reproduce

  1. Create a CloudFormation Stack resource with the manifest above.
  2. Wait for the stack to be provisioned on AWS (around 30 minutes).
  3. Check the resource in Crossplane: it is neither Synced nor Ready, and the provider keeps trying to recreate the stack.

What happened?

With a simple CloudFormation template (for instance, one that creates a Parameter Store parameter), everything works and the resource becomes Ready and Synced. But if the template is complex and takes a long time to complete (like enabling Control Tower on a master account), the resource never becomes Synced or Ready, and the provider keeps trying to recreate the stack, even though the stack was created successfully on AWS.

Relevant Error Output Snippet

Warning CannotCreateExternalResource 2m57s (x215 over 3h33m) managed/cloudformation.aws.upbound.io/v1beta1, kind=stack (combined from similar events): async create failed: failed to create the resource: [{0 creating CloudFormation Stack (FoundationControlTowerStack): operation error CloudFormation: CreateStack, https response error StatusCode: 400, RequestID: ec977b01-3963-49da-8069-1f6f134b055a, AlreadyExistsException: Stack [FoundationControlTowerStack] already exists []}]

Crossplane Version

1.17.0

Provider Version

1.14.0

Kubernetes Version

No response

Kubernetes Distribution

EKS

Additional Info

I tried creating an Observe-only resource to import the CloudFormation stack that had been created, and it works, but only if I set the external-name annotation to the stack ID (the last part of the stack ARN), not to the full ARN or the stack name. I'm not sure whether this is related to the bug.
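A minimal sketch of the Observe-only import described above, assuming Crossplane management policies are enabled for the provider and using the standard `crossplane.io/external-name` annotation. The metadata name `ct-stack-observed` is hypothetical. The external name and region are taken from the stack ARN in the provider log below (`.../stack/controltower-stack/23403010-80e5-11ef-b07e-0a6a2c5bb2b9` in `sa-east-1`); substitute your own stack's ID and region.

```yaml
apiVersion: cloudformation.aws.upbound.io/v1beta1
kind: Stack
metadata:
  # Hypothetical name for the observed (imported) resource
  name: ct-stack-observed
  annotations:
    # Per this report, only the stack ID (last segment of the stack ARN) works here;
    # the full ARN and the stack name did not.
    crossplane.io/external-name: "23403010-80e5-11ef-b07e-0a6a2c5bb2b9"
spec:
  # Observe only: the provider reads the existing stack but never creates, updates, or deletes it
  managementPolicies: ["Observe"]
  forProvider:
    # Region of the existing stack (sa-east-1 in the logged ARN); adjust to yours
    region: sa-east-1
```

Because the policy is Observe only, deleting this managed resource should leave the CloudFormation stack itself untouched.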

karloscarrijo added the bug (Something isn't working) and needs:triage labels on Sep 30, 2024

karloscarrijozup commented Oct 2, 2024

Just to add to this: after exactly 15 minutes I get an expired-token error in the provider logs ("ExpiredToken: The security token included in the request is expired"), and after that the provider tries to recreate the stack, producing the AlreadyExistsException over and over again.

2024-10-02T18:10:11Z    DEBUG   provider-aws    Cannot create external resource {"controller": "managed/cloudformation.aws.upbound.io/v1beta1, kind=stack", "request": {"name":"ct-stack"}, "uid": "dd85052a-1483-438b-8d93-6e33aa384315", "version": "19203025", "external-name": "", "error": "async create failed: failed to create the resource: [{0 waiting for CloudFormation Stack (arn:aws:cloudformation:sa-east-1:xxxxxxxxxxxx:stack/controltower-stack/23403010-80e5-11ef-b07e-0a6a2c5bb2b9) create: operation error CloudFormation: DescribeStacks, https response error StatusCode: 403, RequestID: aa00246f-187c-4c34-b175-695ea3a91163, api error ExpiredToken: The security token included in the request is expired  []}]"}
2024-10-02T18:10:11Z    DEBUG   provider-aws    Async create starting...        {"trackerUID": "dd85052a-1483-438b-8d93-6e33aa384315", "resourceName": "ct-stack", "gvk": "cloudformation.aws.upbound.io/v1beta1, Kind=Stack", "tfID": ""}
2024-10-02T18:10:11Z    DEBUG   provider-aws    Creating the external resource  {"uid": "dd85052a-1483-438b-8d93-6e33aa384315", "name": "ct-stack", "gvk": "cloudformation.aws.upbound.io/v1beta1, Kind=Stack"}
2024-10-02T18:10:11Z    DEBUG   provider-aws    Async create ended.     {"trackerUID": "dd85052a-1483-438b-8d93-6e33aa384315", "resourceName": "ct-stack", "gvk": "cloudformation.aws.upbound.io/v1beta1, Kind=Stack", "error": "async create failed: failed to create the resource: [{0 creating CloudFormation Stack (controltower-stack): operation error CloudFormation: CreateStack, https response error StatusCode: 400, RequestID: 90d2e4b5-94f9-4ffe-85e8-6e764c42604e, AlreadyExistsException: Stack [controltower-stack] already exists  []}]", "tfID": ""}

@karloscarrijozup

Seems related to #1346, #1482 and crossplane/crossplane#5918

@peterfocko

I am facing the same issue when creating CFN stacks that take longer to create. In my case, I am creating a CloudFront distribution in the stack. I also get the "security token included in the request is expired" error.

@karloscarrijozup

/fresh

Projects
None yet
Development

No branches or pull requests

3 participants